Bug?: Loop over file list slow when many files - VisualCron

Jon Tofte-Hansen
Paid support Topic Starter

2019-01-14T11:19:24Z

Hi

We have a supplier that delivers a variable number of small files (2-4 KB each) from time to time. These files are put in a folder, and a list file task lists them. A loop then iterates over the list with a read file task, an insert database task, and then a copy file task that copies the file to a processed folder. As long as the file count is below a couple of thousand, this works fine. But already at around 5,000 files the copy task (which is also the last task of the loop) takes increasingly long. Last week we got ~70,000 files, and the copy task suddenly took more than 10 seconds per file!

I may be able to work around it by listing only the first 1000 files, then checking after the loop whether files remain in the directory and, if so, using a flow to jump back to the list file task. But this should not be necessary.
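For illustration, the batching workaround can be sketched in Python. This is only a rough sketch of the idea, not how VisualCron implements it: the thread uses VisualCron tasks and flows, and the names `list_batch`, `process_in_batches` and `handle_file` are hypothetical.

```python
import os
from itertools import islice


def list_batch(folder, batch_size=1000):
    """Return up to batch_size file paths from folder.

    os.scandir yields entries lazily, so the scan stops as soon as
    the batch is full instead of listing the whole folder.
    """
    with os.scandir(folder) as entries:
        files = (entry.path for entry in entries if entry.is_file())
        return list(islice(files, batch_size))


def process_in_batches(folder, handle_file, batch_size=1000):
    """List a batch, process it, and repeat until no files remain.

    handle_file (hypothetical callback) must move or delete each file
    it is given; otherwise the same files are listed again forever.
    """
    total = 0
    while True:
        batch = list_batch(folder, batch_size)
        if not batch:
            return total
        for path in batch:
            handle_file(path)
            total += 1
```

Note that batching only shortens the list being iterated. If the slow step depends on how many files are in the source folder, it will not help much on its own.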

The fact that a few thousand files work well must mean that the host/filesystem/antivirus are not the culprits.

VisualCron 8.4.0.

Edit: I tried the hack I suggested, but the result is not much better. The problem, it seems, is the time it takes to copy from a folder with many files. It is strange, though, that the read file task, which reads from the same folder, returns the text of the file in less time than VisualCron can measure (00:00:00,0).

Edited by user 2019-01-14T14:17:18Z | Reason: Not specified

Gary_W
Free support

2019-01-14T14:55:47Z

Just curious, is your trigger to process all these files time-based or event-based? Is it possible that files are still being copied in from the vendor while the job is running? If you are sure no files could be added during the process, maybe try doing the copy as its own task after the main processing loop?
Open the task manager and watch memory usage during this process. It might be interesting. I wouldn't expect memory to be an issue during copying, but maybe it is not being released during the read file part of the loop, and that is having a negative impact?

Try doing the copy after the main loop as an OS command instead of inside of VC one by one? Hopefully this would help identify where the bottleneck is. I'm sure curious to know your findings!

Edited by user 2019-01-14T14:58:31Z | Reason: Not specified

Jon Tofte-Hansen
Paid support Topic Starter

2019-01-14T15:06:30Z

Hi Gary

Thank you for the comments:

We download a zip file, unpack it, and process the files in the above-mentioned loop. The job is triggered by the zip being released at a remote site, but there is only one triggering file = 1 run, so there are no conflicts there.

Memory usage is stable at ~2.5 GB out of 8 GB during all tasks. It is a dual-CPU virtual Windows host, and CPU usage fluctuates between ~3% and ~50%. I am running the job right now, and only very few other jobs run at this hour.

I forgot to mention that I could of course pull the copy outside the loop (and may very well end up doing just that), but since the file content is not always correct, errors do occur. When the processed file is archived inside the loop, it is very easy to remove or correct the faulty file and start the job again. If the files were copied after the loop and the error occurred on, say, file 35,000 out of 70,000, I would have to move the 35,000 processed files manually before restarting the job. Even if I order the output, there would still be a risk of moving an unprocessed file.
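The restart behaviour described here can be sketched in Python. This is an illustration only, not VisualCron's implementation; `process_and_archive` and `handle_file` are hypothetical names, with `handle_file` standing in for the read file and insert database steps.

```python
import os
import shutil


def process_and_archive(inbox, processed, handle_file):
    """Process each file, then move it to the processed folder right away.

    If handle_file raises on a bad file, every file handled before it has
    already left the inbox, so a rerun resumes with the failing file.
    """
    os.makedirs(processed, exist_ok=True)
    done = 0
    for name in sorted(os.listdir(inbox)):
        src = os.path.join(inbox, name)
        if not os.path.isfile(src):
            continue  # skip subfolders
        handle_file(src)  # hypothetical callback: read the file, insert into the database
        shutil.move(src, os.path.join(processed, name))
        done += 1
    return done
```

Because the move happens immediately after each successful file, the inbox always contains exactly the unprocessed files, and nothing needs to be sorted out by hand after a failure.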

Edited by user 2019-01-14T15:27:38Z | Reason: Not specified

thordw
Paid support

2019-01-15T12:11:57Z

Hi

I think it could have something to do with how the copy task works, since it requires a file mask for each copy.
So say you have 70,000 files: each time you run the copy task, it has to filter out the one file you need from among all 70,000.

Try using a simple .NET or PowerShell script that does the file moving for you instead of the copy task.
It should increase performance.
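To make the suspected difference concrete, here is a small Python sketch (the thread suggests .NET or PowerShell; Python is used here only for illustration). Whether the VisualCron copy task really works like `move_by_mask` is an assumption based on the explanation above, and both function names are hypothetical.

```python
import fnmatch
import os
import shutil


def move_by_mask(src_dir, dst_dir, mask):
    """Mask-based move: lists the whole folder and tests every name.

    The cost of each call grows with the number of files in src_dir,
    even when the mask matches only a single file.
    """
    moved = 0
    for name in os.listdir(src_dir):
        if fnmatch.fnmatch(name, mask):
            shutil.move(os.path.join(src_dir, name), os.path.join(dst_dir, name))
            moved += 1
    return moved


def move_by_path(src_path, dst_dir):
    """Direct move: the full path is already known, so no folder scan is needed."""
    return shutil.move(src_path, os.path.join(dst_dir, os.path.basename(src_path)))
```

Inside a loop that already has the full path of the current file, calling something like `move_by_path` once per file avoids the per-file scan of the source folder.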