Remove carriage return from CrLf - VisualCron - Forum

Jon Tofte-Hansen
2017-01-13T15:44:53Z
Hi
I have a file where lines end with CrLf. I need to remove the Cr, and I want to use VC because it is part of a job.

What I want:
1: A "Read file" task (reads the file to Standard Output)
2: A "Write file" task that writes the above output to a file replacing Cr with null.

I cannot get step 2 to work with {STRING(Replace|{TASK(PrevTask|StdOut)}|Cr|)} in the [Value] field ("|" is used as the separator because the output contains ","). What am I doing wrong?


I can get the desired result with a "For each" loop on the "Write file" task, with [Line break] = Cr in the loop and = NoLineBreak in the task (this is necessary because any other value adds a new line at the end of the copy, which is not acceptable). This method is not as smooth as the first suggestion, and the iterations are very slow (it takes several minutes for ~800 rows!). I have also tried updating a variable in the loop instead, but that is just as slow, so it is the iterations themselves that take the time.
Gary_W
2017-01-13T16:03:57Z
Are you able to call dos2unix/unix2dos (or a similar tool to convert text files) on the file instead of manually dealing with the end of line character issue? Just something to think about that might save you some hassle and should be fast too.
Jon Tofte-Hansen
2017-01-13T16:06:20Z
Thank you for the comment. Of course I can use external tools, but it should be possible in VC.
Gary_W
2017-01-13T16:47:34Z
Please make sure to read the edits before using the regex.

Ok, here's a regular expression that works. I created a "Read file" task and a "Write file" task that reads the StdOut of the previous task. The value of the "Write file" task is:
{REGEX(Replace|{TASK(PrevTask,StdOut)}|(.*)(\r)(.*)|$1$3)}
The regular expression makes 3 "remembered" groups of the line which are indicated by being surrounded by parentheses. The first is the part of the line before the carriage return, the second is the carriage return, the third the rest of the line after the carriage return. The entire line is then replaced with the first and third remembered groups, effectively deleting the second one, which contains only the carriage return. Note in the write file task I set the line break character to LF.
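
The group logic described above can be checked outside VisualCron. This is a minimal sketch in Python, not the VisualCron implementation: the sample text is made up, and Python writes group references as \1 and \3 where the VisualCron expression uses $1 and $3.

```python
import re

# Made-up sample input with Windows (CRLF) line endings.
text = "a,b\r\nc,d\r\n"

# Same pattern as in the post: group 1 is the text before the CR,
# group 2 is the CR itself, group 3 is whatever follows on the same line.
# Keeping only groups 1 and 3 drops the CR.
result = re.sub(r"(.*)(\r)(.*)", r"\1\3", text)

print(repr(result))  # 'a,b\nc,d\n'
```

The line feeds survive because "." does not match a line feed by default, so no match ever spans two lines.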

You don't want to replace the carriage return with a NULL, but delete it. An embedded NULL will cause unexpected results as who knows how some string functions will handle that. I suspect they would treat the NULL as the end of the string, causing it to not see the line feed maybe. Don't take the chance.

EDIT
This has the same effect and is somewhat simpler. It basically only keeps the part of the line before the carriage return. Make sure to select LF as the line break character.
{REGEX(Replace|{TASK(PrevTask,StdOut)}|(.*)(\r.*)|$1)}

IMPORTANT EDIT 1/17/2017
I just had a realization while trying to figure out what was going on in my post #6 below that is important to understand here for how my answer is applied.
I will chalk it up to being a new user and still learning but I want to make sure it is clear for future searchers that may see this thread.
I had originally assumed the stdout from the previous task was being looked at line-by-line here, but my realization is that when the write file task reads the stdout from the read file task, it looks at the stdout as one big multi-line string, not line-by-line. This affects the logic of the regular expression.

The regex applied still works here, but it is not applied on a line-by-line basis as I originally thought, but for each OCCURRENCE of the pattern within the entire output. It happens to still work for the pattern in this case, but the distinction is important to understand if you apply this technique to other problems. Also consider that effectively the entire file is operated on at once which may have memory usage implications, I don't know.
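
To illustrate the "one match per occurrence" behaviour, here is a small Python sketch that runs both patterns from this post over a whole multi-line string at once and counts the replacements. It assumes the VisualCron REGEX function treats "." the way Python and .NET do by default (any character except a line feed), which this thread does not confirm. The sample text is made up.

```python
import re

# The whole "file" as one string, the way the Write file task sees it.
text = "a,b\r\nc,d\r\ne,f\r\n"

# re.subn returns the new string and the number of replacements made.
three_groups, n1 = re.subn(r"(.*)(\r)(.*)", r"\1\3", text)
two_groups, n2 = re.subn(r"(.*)(\r.*)", r"\1", text)

print(n1, n2)                      # 3 3
print(three_groups == two_groups)  # True
print(repr(three_groups))          # 'a,b\nc,d\ne,f\n'
```

Each carriage return produces exactly one match, and each match stops at the next line feed, which is why the result looks as if the pattern had been applied line by line.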
Gary
Jon Tofte-Hansen
2017-01-16T07:54:26Z
Hi Gary

Thanks a lot for that elegant solution that works! I can certainly use such magic elsewhere. Thank you for spending time on this.

On null: I'm an Oracle guy, and in Oracle null != chr(0); it simply means no value. I will avoid using the word that way outside an Oracle context.

On [Line break]: if you choose "Lf" a line feed is added to the last line which is not acceptable in my situation. But your first solution works with "NoLineBreak".
Gary_W
2017-01-17T16:18:20Z
Glad it worked for you. I realized I could use this to speed up a task I am doing: adding a value to each line in a file. So, starting from my test as described above, which works for stripping out the carriage return on every line, I tested adding a value, and it works for all but the last line.

Here is my regex. With no line break set, the text ",TEST" is added to all lines but the last one. I wonder whether I have found a bug:

{REGEX(Replace|{TASK(PrevTask,StdOut)}|(.*)(\r)(.*)|$1,TEST$3)}

EDIT - 1/17/2017
I realized that when writing a file where the value is the stdout from a read file task (see my post #4 above) that the value is taken as one big multi-line string. Thus any regular expressions or edits applied are NOT on a line-by-line basis but are applied to the chunk as a whole. This is a very important distinction to understand as it could affect how you construct the regular expression.
So, where I was trying to alter my regex in post #4 above to instead add a value to each line, the correct value would be:

{REGEX(Replace|{TASK(PrevTask,StdOut)}|\r|,TEST)},TEST

Replace each occurrence of a carriage return with the string ",TEST", then add another one to the end.
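
A Python sketch of the difference between the two expressions in this post. It assumes the last line of the file has no trailing CrLf, which would explain why the first attempt skips it; the thread does not state this explicitly. The sample text is made up, and Python's \1 and \3 stand in for $1 and $3.

```python
import re

# Made-up input; the last line has no CRLF terminator (assumption).
text = "a,b\r\nc,d\r\ne,f"

# First attempt: only lines that contain a CR can match, so the last
# line is left untouched.
attempt = re.sub(r"(.*)(\r)(.*)", r"\1,TEST\3", text)

# Corrected version: swap every CR for ",TEST", then append one more
# ",TEST" for the final line.
fixed = re.sub(r"\r", ",TEST", text) + ",TEST"

print(repr(attempt))  # 'a,b,TEST\nc,d,TEST\ne,f'
print(repr(fixed))    # 'a,b,TEST\nc,d,TEST\ne,f,TEST'
```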

I hope this clarifies the confusion I may have caused.
thomas
2017-01-17T23:09:34Z
If you don't mind using a .NET task, it's a fairly clean way of doing it also. You can make it into a generic ConvertToUnix task and call it from anywhere.

[Attached image: Capture.PNG]
Gary_W
2017-01-18T14:32:58Z
Thanks Thomas, I will try this out. I can see starting a "utility" job (library) with a collection of "helper" tasks. Perhaps we should start a topic and share tasks like your converter above!
Jon Tofte-Hansen
2017-01-18T14:41:53Z
Gary: thank you for your comments. I found this to be sufficient:
{REGEX(Replace|{TASK(PrevTask,StdOut)}|\r|)}

thomas: Thank you for your solution, I will be able to use that as well.
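
For checking the result outside VisualCron, the effect of removing only the carriage returns can be sketched in Python as below. This is neither the VisualCron task nor thomas's .NET task; the function name and file names are hypothetical. Working on bytes ensures nothing is appended to the last line, matching the "NoLineBreak" requirement.

```python
import tempfile
from pathlib import Path


def strip_carriage_returns(src: Path, dst: Path) -> None:
    """Copy src to dst with every CR byte removed; nothing is appended."""
    dst.write_bytes(src.read_bytes().replace(b"\r", b""))


# Demonstration on a throwaway file (hypothetical names and content).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "in.txt"
    dst = Path(tmp) / "out.txt"
    src.write_bytes(b"a,b\r\nc,d")
    strip_carriage_returns(src, dst)
    converted = dst.read_bytes()

print(converted)  # b'a,b\nc,d'
```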
thomas
2017-01-18T14:54:23Z
Good idea. I have a few of those that are useful (for me at least).

I feel it should be possible with a simple Replace in VC, i.e. without Regex. I couldn't figure out how to do it though. Maybe Henrik/Support knows how to do it.
Support
2017-01-20T09:35:46Z
I do not see what is wrong with Jon's solution. Did you have any problems using it?
Henrik
Support
http://www.visualcron.com 
Please like  VisualCron on facebook!
Jon Tofte-Hansen
2017-01-20T10:10:05Z
Hi Henrik
The question is: why does "{STRING(Replace|{TASK(PrevTask|StdOut)}|\r|)}" not work when "{REGEX(Replace|{TASK(PrevTask,StdOut)}|\r|)}" works?
thomas
2017-01-20T10:16:59Z
I agree. Nothing wrong with Jon's solution. It is more than good enough. But normally you shouldn't have to use regex to do a simple replace. It's not very important though; I'm asking more out of curiosity.

Support
2017-01-26T22:25:56Z
This is because the string Replace does not interpret \r\n as CRLF; it treats them as plain literal text.
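
The distinction can be illustrated in Python. This is an analogy, not VisualCron's code: a plain string replace searches for the two literal characters backslash and "r", while a regex engine reads the same two characters as an escape for the carriage return. The sample text is made up.

```python
import re

text = "a,b\r\nc,d\r\n"

# Plain replace: search for a literal backslash followed by "r".
# The input contains a CR character, not those two characters,
# so nothing changes.
literal = text.replace("\\r", "")

# Regex replace: the engine interprets backslash + "r" as CR.
via_regex = re.sub("\\r", "", text)

print(literal == text)  # True
print(repr(via_regex))  # 'a,b\nc,d\n'
```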
Henrik
Support
Gary_W
2017-01-27T15:29:49Z
If there is not already a published document of the character set supported by the String-Replace function, then it looks like we could use one. Even better would be to have the info available at the point of use in the UI, in the variables dialog when you select the function to use; a button could open a new window showing the supported character set. Do you think String-Replace and Regex-Replace should handle special characters the same way? That was my assumption too. But what if you really need to replace the literal text '\r\n'? In other languages you would use '\\r' to escape the escape character. Just throwing it out there for consideration. It could get ugly.

While I'm requesting documents I would also like to see a spec of your regular expression implementation since there are some non-standard characters used like the '$' to reference a remembered group (typically a backslash is used) and the pipe cannot be used as a logical OR due to it being used as a separator, etc. I wonder if you have other characters to use for those instead.
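
On the two points above: the "$1" form is the group reference that .NET's Regex.Replace uses in replacement strings, which suggests, but does not confirm, that the VisualCron function wraps it. For the pipe, one possible workaround when the alternatives are single characters is a character class, which needs no "|". The Python sketch below only shows that the two patterns are equivalent; whether VisualCron accepts the character class is an assumption that would need testing.

```python
import re

# Made-up input mixing CRLF, LF-only and CR-only line endings.
text = "a\r\nb\nc\rd"

# Alternation with a pipe, which collides with VisualCron's separator.
with_pipe = re.sub(r"\r|\n", "", text)

# Character class: same meaning for single characters, no pipe needed.
with_class = re.sub(r"[\r\n]", "", text)

print(with_pipe == with_class)  # True
print(with_class)               # abcd
```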

Thank you for your consideration,
Gary
Support
2017-02-02T10:20:57Z
The string Replace is limited when it comes to characters that need "escaping". However, you can use Regex.Replace instead:

{REGEX(Replace|your_text_with_linebreaks|\n+|)}
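
Note that the pattern \n+ targets line feeds rather than carriage returns. A Python sketch of its effect on made-up input, assuming the same regex semantics as in VisualCron:

```python
import re

text = "a,b\r\nc,d\n\ne,f"

# Every run of one or more line feeds is deleted; carriage returns stay.
result = re.sub(r"\n+", "", text)

print(repr(result))  # 'a,b\rc,de,f'
```

This joins the lines and leaves any carriage returns in place, so for the original question of stripping only the Cr, the \r pattern from earlier in the thread is the one to use.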

Henrik
Support