Community forum

jsparks@tri-ad.com
2024-02-28T23:32:53Z
We experienced a server outage that impacted our instance of VisualCron. After rebooting the server and restarting the VC service, we had to check for time triggers that were missed during the outage window. What we found is that when the service is restarted, the next scheduled run date is calculated from the restoration date/time, not from the last run.

For example, say we have a daily trigger set for 5am and the last run was 2/1/2024 at 5am. If we then experience an outage that spans the time of the next run (2/2/2024 at 5am, with, say, an outage from 3am to 6am), then when VC is back online the next trigger time shows as 2/3/2024 at 5am.

This in and of itself is not an issue, as it may not be desirable to have the system automatically backdate and attempt to restart any missed jobs. But we are trying to find a better way to report on missed runs and determine what may need to be restarted. We tried parsing the values out of Jobs.xml, but taking the "DateLastExecution" node value and recreating the next run datetime from that point is pretty challenging given the many variations of timing in the TTime node. We were hoping a secondary value could be added. For example, DateNextExecution under the "Stats" node would remain the time that is truly scheduled (2/3/2024 5am in the example above), and a new "NextExecutionFromLastRun" would be the next run as calculated from the last run time (2/2/2024 5am in the example above).

That value could then be compared to the DateNextExecution and any differences would represent one or more triggers missed in a window.
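
As a rough illustration of that comparison, here is a minimal Python sketch. It assumes a fixed-interval trigger (daily in the example) and does not attempt to parse the TTime node; the function name missed_runs and its parameters are hypothetical and not part of VisualCron.

```python
from datetime import datetime, timedelta

def missed_runs(last_run, reported_next, interval=timedelta(days=1)):
    """Return the scheduled times that fall strictly between the last
    run and the next run the server currently reports.

    Assumes a simple fixed interval; real triggers can be far more varied.
    """
    missed = []
    expected = last_run + interval
    while expected < reported_next:
        missed.append(expected)
        expected += interval
    return missed

# The example from the post: last run 2/1 5am, server now reports 2/3 5am.
last_run = datetime(2024, 2, 1, 5, 0)
reported_next = datetime(2024, 2, 3, 5, 0)
for when in missed_runs(last_run, reported_next):
    print("missed:", when)
```

An empty result means the reported next run matches what would be expected from the last run, so nothing was skipped.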
bweston
2024-02-29T16:29:20Z
For what it's worth, here is how I have dealt with this:

I have a job that runs every five minutes. Its second task is to take a settings backup as {COMPUTER(Name)}.zip.

Its first task is to copy any backup file named PREV-{COMPUTER(Name)}.zip to the name DAY-{COMPUTER(Name)}.zip if its modified date is less than today's date, and then any backup named {COMPUTER(Name)}.zip to PREV-{COMPUTER(Name)}.zip.

This way I always have my last two five-minute backups and my last backup from the previous day.
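
The rotation described above could be sketched in Python roughly as follows. This is only an illustration of the copy order, not the actual VisualCron tasks; the function name rotate_backups and the folder/computer parameters are assumptions, and the settings backup itself would still be taken by VisualCron after this step.

```python
import shutil
from datetime import date, datetime
from pathlib import Path

def rotate_backups(folder: Path, computer: str, today: date) -> None:
    """Rotate backups before a new one is taken (hypothetical helper).

    1. PREV-<computer>.zip -> DAY-<computer>.zip, if PREV was last
       modified before today.
    2. <computer>.zip -> PREV-<computer>.zip.
    """
    current = folder / f"{computer}.zip"
    prev = folder / f"PREV-{computer}.zip"
    day = folder / f"DAY-{computer}.zip"

    if prev.exists():
        prev_date = datetime.fromtimestamp(prev.stat().st_mtime).date()
        if prev_date < today:
            # copy2 keeps the modified time of the original backup
            shutil.copy2(prev, day)

    if current.exists():
        shutil.copy2(current, prev)
```

Using copy2 preserves each file's modified time, so the timestamp on the PREV copy still reflects when that backup was taken.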

I also have a job that runs on a trigger if it detects that either the system is shutting down or restarting, or VisualCron is stopping or being set to OFF. It first sets a user variable "LastStop" to the current time, and then copies the last {COMPUTER(Name)}.zip to a "LastStop"-stamped DOWN filename (except that step seems to run pretty rarely...but that's okay, due to details below).

And then I have a job that runs when the server starts, which is supposed to do the following, although it's broken at the moment due to some server changes and some parts of it may be redundant or overly complex:


  • Records the current time to another "LastStart" user variable
  • Sets the "LastStop" variable to the modified time of the PREV-{COMPUTER(Name)}.zip backup if it is newer than the current "LastStop" time, because that probably means the server crashed or otherwise wasn't able to set the "LastStop" variable when it should have
  • Copies the newest *{COMPUTER(NAME)}.zip backup that is at least nine minutes old to a "LastStop"-stamped DOWN filename if it is newer than any existing file of that name
  • Exports the current settings to a timestamped "UP" backup
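
The two decisions in that startup list (which "LastStop" time to trust, and which backup to treat as the pre-outage copy) can be sketched in Python like this. The function names and the dict-of-timestamps input are assumptions for illustration; the handling of a missing "LastStop" value is also an assumption not described above.

```python
from datetime import datetime, timedelta

def effective_last_stop(recorded_stop, prev_backup_mtime):
    """Prefer the PREV backup's modified time when it is newer than the
    recorded "LastStop" (for example after a crash, when the stop job
    never ran). A recorded_stop of None is treated as "never set"."""
    if prev_backup_mtime is not None and (
        recorded_stop is None or prev_backup_mtime > recorded_stop
    ):
        return prev_backup_mtime
    return recorded_stop

def newest_backup_older_than(backups, now, min_age=timedelta(minutes=9)):
    """backups maps file name -> modified time. Return the name of the
    newest backup that is at least min_age old, or None if there is none."""
    eligible = {name: m for name, m in backups.items() if now - m >= min_age}
    return max(eligible, key=eligible.get) if eligible else None
```

The nine-minute minimum age means a backup taken just after the restart is skipped in favor of one from before the outage.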


...Note that at this point, I've got about as tight a comparison as is available to me of what things looked like immediately before and after the downtime. The rest, when it worked, was the cherry on top.

That last job is then supposed to go on from there to extract the settings from the "DOWN" settings copy and then use a PowerShell script that I wrote to compare the XML against the jobs pulled from the server API. The script builds a "Catchup Job" full of "Execute Job" tasks, which are named and enabled or disabled based on when the "DOWN" copy says jobs were going to run next, when the API says they ran last, when the API says they'll run next, and a job variable that basically says "if it'll run again in less than this amount of time, it probably doesn't need to be caught up."
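
One possible reading of that comparison logic, sketched in Python rather than PowerShell. This is not the original script: the JobTimes class, plan_catchup, and the exact rules (a job counts as missed when its pre-outage next run has passed and it has not run since) are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class JobTimes:
    name: str
    down_next: datetime  # next run according to the "DOWN" snapshot
    api_last: datetime   # last run according to the live server
    api_next: datetime   # next run according to the live server

def plan_catchup(jobs, now, skip_if_next_within):
    """Return (job name, enabled) pairs for jobs that look missed.

    A job is listed if its pre-outage next run is in the past and it has
    not run since then. It is listed but disabled if the server will run
    it again sooner than skip_if_next_within.
    """
    plan = []
    for job in jobs:
        missed = job.down_next <= now and job.api_last < job.down_next
        if not missed:
            continue
        enabled = (job.api_next - now) >= skip_if_next_within
        plan.append((job.name, enabled))
    return plan
```

Listing a frequent job as disabled rather than dropping it keeps it visible for review, which matches the idea of curating the catchup job by hand when needed.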

I also had a job that was meant to be run at the beginning of a planned outage. Its purpose was to stop job processing, take a settings backup, and, if the outage window was indicated to end after the next midnight, go ahead and create (and optionally run) a catchup job for anything that was supposed to run before then...because we had some late-night jobs that needed special handling if they didn't run on the correct day. That job also had a variable for "If the last time it ran was within this interval, either it was run manually in advance or it runs frequently enough that it shouldn't need to be included."

When these jobs were all last in operation, they meant I could configure the pieces that controlled a planned outage and almost sleep through it and let it handle itself (not quite; sometimes the catchup jobs needed to be curated or closely monitored). An unplanned outage would automatically leave me the pieces to review to see if I needed to follow up on it, if it didn't just recover on its own. I found all this vastly more useful than VisualCron's support for Time Exceptions or "Run missed Job once at start." (If I remember correctly, the original motivation to set all this up was that "Run missed job once at start" didn't apply in situations I wanted it to, plus those jobs I mentioned before that needed to be run ahead of the maintenance window if it was expected to cross into the next day.)

The reason they don't currently work has to do with domain changes, Windows server upgrades, and permissions; I just haven't gotten around to fixing it all to work in the new environment yet. But the parts involving what copies of the settings files are kept, while perhaps not ideal, might be a model you could use for what you're trying to accomplish. Specifically, if you have a settings backup preserved from within a few minutes before the outage, the DateNextExecution in the zipped XML will have the information you're looking for.
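
A minimal Python sketch of pulling DateNextExecution out of a zipped settings backup. The XML layout is an assumption: it supposes Jobs.xml sits at the root of the zip, that each job is a "Job" element with a "Name" child, that DateNextExecution sits under a "Stats" child, and that dates are in ISO 8601 form. Check these against an actual backup before relying on them.

```python
import zipfile
import xml.etree.ElementTree as ET
from datetime import datetime

def next_runs_from_backup(zip_path, xml_name="Jobs.xml"):
    """Map job name -> DateNextExecution as stored in a settings backup.

    Element names other than Stats/DateNextExecution are assumed, as is
    the ISO 8601 date format.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(xml_name) as handle:
            root = ET.parse(handle).getroot()

    result = {}
    for job in root.iter("Job"):
        name = job.findtext("Name")
        stamp = job.findtext("Stats/DateNextExecution")
        if name and stamp:
            result[name] = datetime.fromisoformat(stamp)
    return result
```

Running this against the backup preserved from just before the outage gives the "what should have run next" value per job, which can then be compared with what the live server reports.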