How to use regex to get text from filenames. - VisualCron - Forum


Steve Williams
2021-10-21T12:33:22Z
I'm trying to capture some text from a list of file names. When I create a "List Files" task on the following folder
\\o.cartermi.com\inv\daily_print\stage\
with the file mask "*.pdf", I get the following result:

\\o.cartermi.com\inv\daily_print\stage\CNB-160-SB_R815588.pdf
\\o.cartermi.com\inv\daily_print\stage\CNB-22-S_R815594.pdf
\\o.cartermi.com\inv\daily_print\stage\CNB-24-SB_R815591.pdf
\\o.cartermi.com\inv\daily_print\stage\CNB-48-SB_R815585.pdf
\\o.cartermi.com\inv\daily_print\stage\CNB-80-SBEU_R815593.pdf
\\o.cartermi.com\inv\daily_print\stage\CNBH-128_R815589.pdf
\\o.cartermi.com\inv\daily_print\stage\CNBH-36-SB_R815576.pdf
\\o.cartermi.com\inv\daily_print\stage\CNBH-40-S_R815572.pdf
\\o.cartermi.com\inv\daily_print\stage\CRT-32-SB_R815586.pdf
\\o.cartermi.com\inv\daily_print\stage\CRT-40-SB_R815587.pdf
\\o.cartermi.com\inv\daily_print\stage\CSC-28-SB_R815592.pdf
\\o.cartermi.com\inv\daily_print\stage\CSC-56-SB_R815590.pdf
\\o.cartermi.com\inv\daily_print\stage\CYNB-160-S_R815573.pdf
\\o.cartermi.com\inv\daily_print\stage\SFH-56-A_R815569.pdf
\\o.cartermi.com\inv\daily_print\stage\SY-40-S_R815584.pdf
\\o.cartermi.com\inv\daily_print\stage\YNB-32-SC_R815577.pdf
\\o.cartermi.com\inv\daily_print\stage\YNB-36-S_R815583.pdf
\\o.cartermi.com\inv\daily_print\stage\YNB-80-S_R815579.pdf


I need to capture the "R" number from each of the files found. For example, using the first file in the list:
File name: CNB-160-SB_R815588.pdf
Text that needs to be captured: R815588
Steve Williams
2021-10-21T12:39:30Z
I did not realize that hitting Save would publish the post. The rest of the question: I tried the regex tool and used the following pattern to parse the text I need:
(?:.){46,}(?:_)(\w)(\d){6,}(?:\.)(?:pdf)


But that did not return anything when I ran the "show filtered files" test. The regex does work on the regex test website, https://regexr.com/.
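A side note on the pattern above, sketched in Python's `re` module rather than in VisualCron (so engine behavior there is an assumption, and this does not explain why the filter test came back empty). The quantifier sits outside the digit group, so that group holds only the last digit, and no single group holds the full "R815588" token. The alternative pattern at the end is illustrative, not something from the thread.

```python
import re

# Pattern exactly as posted; the sample path is the first file from the listing.
pattern = r"(?:.){46,}(?:_)(\w)(\d){6,}(?:\.)(?:pdf)"
path = r"\\o.cartermi.com\inv\daily_print\stage\CNB-160-SB_R815588.pdf"

m = re.search(pattern, path)
whole_match = m.group(0)   # the entire path, since the pattern spans all of it
letter = m.group(1)        # "R"
last_digit = m.group(2)    # "8": a repeated group keeps only its final iteration

# Illustrative alternative: put the letter and all digits inside one group.
token = re.search(r"_(\w\d{6,})\.pdf", path).group(1)
print(letter, last_digit, token)  # R 8 R815588
```

So even where the pattern matches, the text that needs to be captured has to come from a group that wraps the letter and the digits together.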

Gary_W
2021-10-21T14:37:29Z
Morning. Your question should really be under "general problem solving". Also, there's an Edit button you can use to change your original post.
Anyway, your unexpected results are most likely because you are applying the regex in a task that runs after the List Files task. That means it operates on the previous task's STDOUT, which is the entire list at once, rather than running in a loop that processes the output one line at a time. Which method you want depends on what you are ultimately trying to do, and each needs a regex tailored to it, because the source strings are different.

That is, when operating on the previous task's STDOUT for the entire list, the output is really one large string with embedded \r\n sequences that you have to allow for in the regex.
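To illustrate the point about one large string, here is a minimal sketch in Python's `re` module (not VisualCron itself, so treat engine details as an assumption). The `stdout` variable is a hypothetical stand-in for the previous task's output, built from three of the paths in the listing, and both patterns are illustrative.

```python
import re

# Hypothetical stand-in for the previous task's STDOUT: paths joined by \r\n.
stdout = "\r\n".join([
    r"\\o.cartermi.com\inv\daily_print\stage\CNB-160-SB_R815588.pdf",
    r"\\o.cartermi.com\inv\daily_print\stage\CNB-22-S_R815594.pdf",
    r"\\o.cartermi.com\inv\daily_print\stage\CNB-24-SB_R815591.pdf",
])

# A pattern that must cover the whole input fits one path but not the list,
# because "." does not cross the newline between entries.
one_path = r".*_R\d{6}\.pdf"
single_ok = re.fullmatch(one_path, stdout.split("\r\n")[0]) is not None
whole_ok = re.fullmatch(one_path, stdout) is not None

# Scanning for every occurrence handles the multi-line string directly.
r_numbers = re.findall(r"_(R\d{6})\.pdf", stdout)
print(single_ok, whole_ok, r_numbers)
```

The same pattern can behave differently depending on whether it is handed one path or the whole list, which is why the source string matters when writing the regex.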

Do you really want to operate on one line at a time, or do you want the output to be a list of strings, each taken from the part of the filename that follows the last underscore, starts with an 'R', continues with 6 digits, and ends before the string '.pdf' at the end of the line?

Note that I described what you want returned in painful detail. Before you can come up with a regex, you should be able to describe what you expect it to return in plain language. That way you can catch mistakes like "whoa, the number of digits could be 1 or more, not strictly 6", or "it starts with an 'R' but is followed by any number of any characters, not just digits". See what I mean?

Please provide some more info. 🙂
Gary_W
2021-10-21T15:36:55Z
After all that, maybe I lied or something changed since the version where I had an issue, or maybe I just learned. It can happen!
This seems to work to produce a list of what you want where the previous task's STDOUT is the list:

{REGEX(Replace|{TASK(PrevTask|StdOut)}|(.*_)(R.*)(\.pdf)|$2)}


This form of regex breaks each line into 3 remembered groups, each surrounded by parentheses and referenced left to right by a number preceded by a $.
The 1st group is the first part of the string, up to and including the last underscore. The 2nd group is the set of characters starting with an 'R' and ending before the 3rd group, which is the file extension: a literal dot followed by pdf. The whole match is then replaced with remembered group 2.

Note you could tighten it up by enforcing 6 digits after the 'R' like this:

{REGEX(Replace|{TASK(PrevTask|StdOut)}|(.*_)(R\d{6})(\.pdf)|$2)}
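For comparison, the two patterns can be tried outside VisualCron. This is a rough sketch using Python's `re.sub`, where the group reference is written `\2` instead of `$2`. The `stdout` string is a hypothetical stand-in for the previous task's output, and the "Rework" filename is made up to show what the looser pattern would also accept.

```python
import re

# Hypothetical stand-in for the previous task's STDOUT (two paths from the listing).
stdout = "\r\n".join([
    r"\\o.cartermi.com\inv\daily_print\stage\CNB-160-SB_R815588.pdf",
    r"\\o.cartermi.com\inv\daily_print\stage\CNB-22-S_R815594.pdf",
])

loose = r"(.*_)(R.*)(\.pdf)"       # 'R' followed by anything
tight = r"(.*_)(R\d{6})(\.pdf)"    # 'R' followed by exactly 6 digits

# Replace each matched path with remembered group 2.
loose_result = re.sub(loose, r"\2", stdout)
tight_result = re.sub(tight, r"\2", stdout)

# Made-up filename: the loose pattern accepts it, the tight one does not.
odd = r"\\o.cartermi.com\inv\daily_print\stage\CNB-22-S_Rework.pdf"
loose_only = re.sub(loose, r"\2", odd)
tight_matches_odd = re.search(tight, odd) is not None

print(repr(loose_result), repr(tight_result), loose_only, tight_matches_odd)
```

On the sample data both patterns give the same list; the difference only shows up for names that do not follow the R-plus-6-digits convention.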


Steve Williams
2021-10-21T18:41:20Z
The following is a screenshot of the current structure which generates my list and creates a text file of the list.
[Screenshot: task structure that generates the file list and writes it to a text file]

This is the final variable code that works.
{REGEX(Replace|{LOOP(CurrentValueX)}|(.*_)(R\d{6})(\.pdf)|$2)}
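The per-file loop version can be approximated in Python as below. This is only a sketch of the same idea, not VisualCron code: `r_number` and the "readme.pdf" entry are hypothetical, added to show one thing worth checking. In Python, a replace that finds no match returns the input unchanged, so a stray file would come back as its full path; whether VisualCron's Replace behaves the same way is an assumption to verify.

```python
import re
from typing import Optional

PATTERN = re.compile(r"(.*_)(R\d{6})(\.pdf)")

def r_number(path: str) -> Optional[str]:
    """Return the R number from one path, or None when the pattern does not match."""
    m = PATTERN.search(path)
    return m.group(2) if m else None

# Two paths from the listing, processed one value at a time like the loop.
files = [
    r"\\o.cartermi.com\inv\daily_print\stage\CNB-160-SB_R815588.pdf",
    r"\\o.cartermi.com\inv\daily_print\stage\SY-40-S_R815584.pdf",
]
for f in files:
    print(r_number(f))

# Hypothetical file that does not follow the naming convention.
odd = r"\\o.cartermi.com\inv\daily_print\stage\readme.pdf"
print(r_number(odd))                 # None: no R number to report
print(PATTERN.sub(r"\2", odd) == odd)  # True: replace leaves the path untouched
```

Returning None for a non-matching name makes it easy to skip that file instead of passing a full path on to the next step.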


I appreciate the help getting me this far. Now I need to build a custom .exe in Visual Studio, in C# or JavaScript, to handle the special printer language we will use to print the barcode labels. That will be the easy part. Then we will just run the executable in a new task after the Write File task.