DE-Concatenate Files? - VisualCron - Forum

I have a situation where I need to break up a file into multiple smaller files. What I'm doing is taking an Excel file and converting that to a CSV file, reading that CSV file and passing the entire contents to an Oracle script via a job variable. Since I'm not a DBA, I don't have access to compile this script as a function in the database and use VC to call the function using the job variable, which Oracle would then interpret as a Bind variable. What I'm doing is passing the CSV string as a simple string variable, so I'm limited to 4000 characters. The problem arises because the Excel file doesn't have a limit on the size - it could be 30k characters, or maybe even more.

I found VC's task "Concatenate Files", which is the exact opposite of what I want to do. I'm wondering if there's an easy way to split a file. E.g., say my initial Excel file has 1000 records - instead of converting that to 1 CSV file with 1000 records (which is easy with VC), what I want to do is convert it to 5 CSV files with 200 records each. Given the structure of each record, limiting the files to 200 records will guarantee that I'll always be under 4000 characters, so then it's a simple matter of looping through these smaller files and processing them one by one.
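For what it's worth, here's a minimal sketch of that split outside of VC - not VisualCron-specific, and the function name, output filename pattern, and chunk size are all my own assumptions for illustration:

```python
# Sketch: split one CSV into files of at most `chunk_size` records each.
# split_csv, chunk_size, and the chunk_{}.csv naming are illustrative only.
from pathlib import Path

def split_csv(src, chunk_size=200, out_pattern="chunk_{}.csv"):
    lines = Path(src).read_text().splitlines()
    paths = []
    for i in range(0, len(lines), chunk_size):
        out = Path(out_pattern.format(i // chunk_size + 1))
        out.write_text("\n".join(lines[i:i + chunk_size]) + "\n")
        paths.append(out)
    return paths
```

With a 1000-record input and the default chunk size of 200, this writes chunk_1.csv through chunk_5.csv, which is exactly the 5-files-of-200 layout described above.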

I don't see a built in task that will do this, but maybe I missed it because it's called something I wasn't expecting. Any ideas? I suppose as a last resort, I could write some code to do it, but I was hoping to avoid that if possible.

There isn't a built in task to do this but you could do it via a Loop although it would be quite slow.
Yeah that's why I'm trying to pass a CSV string to Oracle - if I simply have VC loop through a file with 10000 records (or even just 1000), it would be far too slow. I tested it with a file containing 350 records (manually created, so I knew beforehand that it was less than 4000 characters). Having VC do the loop took about 6 minutes, but having Oracle do the loop took 7 seconds.

I see a bunch of string variables in VC, but I don't see one that says "take the first 200 lines" (or, alternatively, "take everything up to the 200th index of a CrLf"). If I could find something like that, then I could pass the first 200 lines to Oracle to process, use String.Remove to remove the first 200 lines from the original string (it would be highly unlikely that those 200 lines would be repeated elsewhere in the string), then take the first 200 lines again (which at this point would actually be the next 200 lines), and so on until I reached the end of the string. So if the initial file was 1000 records, this would be a VC loop with 5 iterations, with each VC iteration having an Oracle loop with 200 iterations.
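One caveat with the String.Remove idea: it deletes the first occurrence of a substring, so it only works as long as no chunk ever repeats elsewhere in the string. Slicing by line index sidesteps that entirely. A hedged Python sketch of the same take-N-lines loop (the function name is mine):

```python
def chunks_of_lines(text, n=200):
    """Yield the text n lines at a time, slicing by line index
    rather than removing matched substrings."""
    lines = text.splitlines()
    for i in range(0, len(lines), n):
        yield "\n".join(lines[i:i + n])
```

For a 1000-line input this yields 5 chunks of 200 lines each - the outer loop with 5 iterations described above, with each chunk handed off for the inner 200-row loop.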
Can you make use of Oracle's external tables feature? Basically you set up a folder and a file as a table. You drop a file in there and can access it like a table.
2nd choice, use sqlldr to load the file into a staging table and work with it from there in Oracle.
I would love to be able to do that, and technically I probably have access to do it (in the sense that I have write access to the database), but I don't have permission from our DBA team to do it. A lot of our IT policies leave me a bit handcuffed in terms of potential solutions to problems.
Eyeroll (but been there). Turn it over to the "DBAs" then. They'll soon get sick of loading tables and set something up for you. 🙂
Yeah, I've tried that in the past as well. The usual answer is something like "you want to do what? Oh, well that's not the approved way of doing that. You need to do this, this, and this, and that'll require these forms to be filled out and approved before we can assign resources to your request". Quite honestly, it's usually a lot easier and faster for me to figure it out myself by stringing together a combination of methods that I am allowed to do, even though it might not be the most efficient and/or elegant solution to the problem.

I think for this particular problem, what I'll probably end up doing is looping through a list of CSV files, and for each file, passing it to a helper job (since nested loops aren't supported in VC). This helper job would contain some code that would return the file in chunks of 200 records (or whatever number I pass it as a parameter) and write each chunk to some sort of parameter path. Then the original job would loop through these smaller files and load them into Oracle. Not elegant by any stretch of the imagination, but it'd work.
You can split up a file with PowerShell quite easily. I should say I'm no good at PowerShell; I just stole the code from here: (not the top answer)

$i=0; Get-Content C:\Temp\partition.txt -ReadCount 50000 | %{$i++; $_ | Out-File C:\Temp\split_$i.txt}

The input file is C:\Temp\partition.txt. In this example it splits into files with a maximum of 50000 lines each. The result files are named split_1.txt, split_2.txt, and so on. I tested with a file containing 160000 records, and it took a couple of seconds.


If you want the c# way, I have some code lying around that splits up an IEnumerable into equal chunks, if you want it.
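I don't have the C# code the poster mentions, but the general idea - splitting any sequence into equal-sized chunks - looks something like this Python analogue (function name and chunk size are my own choices):

```python
# Sketch of a generic chunker, analogous to splitting an IEnumerable
# into equal chunks: consume any iterable in lists of at most `size` items.
from itertools import islice

def chunked(iterable, size):
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk
```

Used on a list of CSV rows with size=200, each yielded chunk would be one sub-file's worth of records.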
Thanks! That looks like exactly what I need, and it makes it a lot easier than writing a sub-job with C# code. I'll have to give that a try.

EDIT: I just tried it, and it worked absolutely beautifully. 😁