Bug?: Loop over file list slow when many files - VisualCron - Forum

Jon Tofte-Hansen
2019-01-14T11:19:24Z
Hi

We have a supplier that delivers a variable number of files from time to time (size: 2-4k). These files are put in a folder, and a list file task lists the files. A loop then iterates over the list with a read file task, an insert database task and then a copy file task to a processed folder. As long as the file count is below a couple of thousand this works fine. But already at around 5,000 files the copy task (which is also the last task of the loop) takes increasingly longer. Last week we got ~70,000 files and the copy task suddenly took more than 10 seconds per file!

As a workaround I could list only the first 1,000 files, then after the loop check for remaining files in the directory and jump back to the list file task with a flow, but this should not be necessary.

The fact that a few thousand files work well must mean that the host/filesystem/antivirus are not the culprits.

VisualCron 8.4.0.


Edit: I tried the hack I suggested, but the result is not much better. The problem, it seems, is the time involved in copying from a folder with many files. It is strange, though, that the read file task, which looks into the same folder, can return the text of the file in less time than VisualCron can measure (00:00:00,0).
Gary_W
2019-01-14T14:55:47Z
Just curious: is your trigger to process all these files time-based or event-based? Is it possible files are still being copied in from the vendor when the job is running? If you are sure no files could be added during the process, maybe try copying as its own task after the main processing loop?
Open the task manager and watch memory usage during this process. I wouldn't expect memory to be an issue during copying, but maybe it is not being released during the read file part of the loop, and that is having a negative impact?

Try doing the copy after the main loop as an OS command instead of one by one inside VC? Hopefully this would help identify where the bottleneck is. I'm curious to know your findings!
Jon Tofte-Hansen
2019-01-14T15:06:30Z
Hi Gary

Thank you for the comments:

We download a zip file, unpack it, and process the files in the above-mentioned loop. The job is triggered by the zip being released at a remote site, but there is only one triggering file, so one run and no conflicts there.

Memory usage is stable at ~2.5 GB out of 8 GB during all tasks. It is a dual-CPU virtual Windows host and CPU usage fluctuates between ~3% and ~50%. I am running the job right now and only very few other jobs run at this hour.

Forgot to mention that I could of course pull the copy outside the loop (and may very well end up doing just that), but as the file content is not always correct, errors occur. When the processed file is archived inside the loop, it is very easy to remove or correct the bad file and start the job again. If the files were copied after the loop and, say, the error occurred on file 35,000 out of 70,000, I would have to move the 35,000 processed files manually before restarting the job. Even if I order the output, there would still be a risk that I move an unprocessed file.
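
For illustration, the restart property described here (whatever is still in the source folder is known to be unprocessed) could be sketched in Python roughly as follows. This is not how VisualCron works internally; `process_and_archive` and the `handle` callback are hypothetical names, with `handle` standing in for the read file and insert database tasks.

```python
import shutil
from pathlib import Path


def process_and_archive(inbox, processed, handle):
    """Handle each file, then move it to `processed` right away.

    If `handle` raises, the exception propagates and the loop stops,
    so every file still in `inbox` is unprocessed.
    Returns the number of files archived.
    """
    inbox, processed = Path(inbox), Path(processed)
    processed.mkdir(parents=True, exist_ok=True)
    done = 0
    # Take one snapshot of the listing; sorted() makes the order repeatable.
    for path in sorted(p for p in inbox.iterdir() if p.is_file()):
        handle(path.read_text())  # hypothetical stand-in for read + insert
        # Move by full path: no wildcard or mask lookup in the source folder.
        shutil.move(str(path), str(processed / path.name))
        done += 1
    return done
```

After a failure, the bad file can be corrected or removed and the same call started again without sorting files by hand.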
thordw
2019-01-15T12:11:57Z
Hi

I think it could have something to do with how the copy task works, since it requires a file mask for each copy.
So say you have 70,000 files: each time you run the copy task, it has to pick out the one file you need from among all 70,000.

Try a simple .NET or PowerShell script that does the file moving for you instead of the copy task.
It should increase performance.
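
A rough sketch of the difference, written in Python rather than .NET or PowerShell and only to illustrate the idea. It assumes the copy task behaves like a mask lookup, which is a guess and not confirmed VisualCron behaviour; `move_by_mask` and `move_by_path` are hypothetical names.

```python
import fnmatch
import os
import shutil


def move_by_mask(src_dir, mask, dst_dir):
    """Mask-style lookup: list the whole folder, then filter by pattern.

    Each call walks every name in src_dir, so calling it once per file
    gets slower as the folder grows.
    """
    moved = []
    for name in fnmatch.filter(os.listdir(src_dir), mask):
        shutil.move(os.path.join(src_dir, name), os.path.join(dst_dir, name))
        moved.append(name)
    return moved


def move_by_path(src_dir, name, dst_dir):
    """Direct lookup: build the full path and move it, no listing needed."""
    shutil.move(os.path.join(src_dir, name), os.path.join(dst_dir, name))
    return name
```

Inside a loop that already knows the file name, the second form avoids listing the source folder for every file.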