File Task - Count number of lines - VisualCron - Forum

Community forum

Brad C.
2019-01-02T13:10:47Z
Would love a feature that allows a Job to count the number of lines in a specific file.

We use a Powershell script today to accommodate the same need, but it would be nicer for my Operational team to have a built in task for performing such tasks. We like to use our VisualCron jobs as an end-to-end implementation of an entire job, as opposed to just scheduling and running a simple load, for example. This way, our "Definition of Done" or "Definition of Success" can be better documented.

Traditional way:
- Run SSIS job to load file from Internet into database

Our Holistic, End-to-End approach
- Run a SQL command to get a count of the table that will be loaded
- Count the number of lines in the downloaded file
- Run SSIS job to load file into database
- Run a SQL command to get a count of the table that was loaded
- Error if the number of records is the same before and after the file was loaded (shouldn't ever happen in this case)
- Error if the number of records loaded in the database differs from the number of lines in the downloaded file

Either way, we can continue using our Powershell script if need be.. But it would be an easy task that would be a great addition for us.

Thanks for considering! We love VisualCron!
-Brad
shiv.malhotra
2019-01-02T17:18:09Z
Funny how I have the same exact issue. I would like to know the best approach for VisualCron to look at a file and IF number of rows = 1 (which will mean the file only has header row and is basically, empty) ; then DON'T run the task, skip to the next one.
Brad C.
2019-01-02T18:40:29Z
Right now, I use a Powershell task that uses a simple/easy script I called Measure-Lines.ps1.


Param (
    [string] $Path, 
    [switch] $HasHeader
)

$totalLines = (Get-Content $Path | Measure-Object -Line).Lines

if ($HasHeader) {
    $totalLines = $totalLines - 1
}

$totalLines


Use the On Error tab on the task to raise an error if the output doesn't meet your criteria..

Gary_W
2019-01-02T19:32:33Z
In the meantime if your files are not too large try a read file task, followed by: {STRING(RowCount|{TASK(PrevTask|StdOut)})}

Edit! Actually forget my answer above, or using get-content which reads the file into memory before operating op it. I need this functionality also, and do some stream editing using the stream reader and writer and tweaked it into a count lines function. It's Fast!

On my 83MB test file, it only took 4.1 seconds to count the lines. The other methods took more than 30 seconds!

Please give this powershell code a try:

#
# GetLineCount(Source, headers, trailers) - Reads a line from a file using streams for memory effiency on large files.
#
Param(
[Parameter(Mandatory=$true)]
[string] $source, # Source file path

[Parameter(Mandatory=$true)]
[ValidateRange(0, [int]::MaxValue)]
[int] $headers, # Header Rows to skip

[Parameter(Mandatory=$true)]
[ValidateRange(0, [int]::MaxValue)]
[int] $trailers # Trailer Rows to skip
)

##
## Read the source file
##
$file = New-Object System.IO.StreamReader($source, [Text.Encoding]::default, $true, 1MB)
while (($line = $file.Readline()) -ne $null) {
$total_lines = $total_lines + 1
}
if ($headers + $trailers) {
$total_lines = $total_lines - ($headers + $trailers)
}
$total_lines
$file.Close()

Note I am no expert and basically know enough powershell to be dangerous so if you see a glaring mistake or room for improvement please let me and future folks that find this know. Thanks!
Brad C.
2019-01-03T14:36:44Z
The StreamReader is a nice approach Gary!
Gary_W
2019-01-03T15:18:26Z
Thanks for the idea!
I used it in a different task as it does not read the file into memory. I was worried about memory usage when operating on large files. I read a file, applied a change to certain lines, then wrote it to a new file using stream writer. It's super fast too. I was using this to count lines:
Param(
[string] $filename
)
@(Get-Content $filename).Length

But since learned about streamreader being more efficient for large files and applied it to this problem. I will be going back and updating for sure.
bweston
2019-01-03T20:38:51Z
For what it's worth, while this is not memory-efficient like the stream reader as far as I know, it may be useful to Shiv in particular, if the file in question isn't usually very big even when it is non-empty. But it does avoid needing a task, if all you really want is a condition or a flow control.

{STRING(RowCount|{FILE(Content|path\to\file)})}
Scroll to Top