SQLChicken.com

SQL Server DBA Tips & Tricks

Identify and Alert for Long-Running Agent Jobs

Being a DBA is like being a train conductor. One of the biggest responsibilities is making sure all jobs are running as expected, or making sure “all the trains are running on time”, so to speak. As my partner-in-crime Devin Knight posted earlier, we have come up with a solution to identify and alert on SQL Agent jobs that are running longer than expected.

The need for this solution came from the fact that, despite my having alerts for failed Agent jobs, we had a process pull a Palin and go rogue on us. The job was supposed to process a cube, but since it never failed, we (admins) weren’t notified. The only way we found out was when a user finally alerted us and said “the cube hasn’t been updated in a couple days, what’s up?” Sad trombone.

As Devin mentioned in his post, the code/solution below is very much a version 1 product, so if you have any modifications/suggestions then have at it. We’ve documented in-line so you can figure out what the code is doing. Some caveats here:

  • This solution has been tested/validated on SQL Server 2005 (SP4) and 2008 R2 (SP1).
  • Code requires a table to be created in a database. I’ve set up a DBAdmin database on all servers here for custom DBA scripts such as this one, Brent Ozar’s Blitz script, Ola Hallengren’s maintenance solution, Adam Machanic’s sp_whoisactive, etc. You can keep your scripts in any database you’d like, but be aware of the USE statement at the top of this particular code.
  • This solution requires that you have Database Mail set up and configured.
  • To set up this solution, create an Agent job that runs every few minutes (we’re using 5) to call this stored procedure.
  • FYI, I set the mail profile name to be the same as the server name. One, it makes it easy for me to standardize naming conventions across servers. Two, it lets me be lazy and code stuff like I did in the line setting the mail profile name. If your mail profile is named differently, make sure you correct it there.
  • Thresholds – This is documented in the code but I’m calling it out anyway. For any job whose average runtime is less than 5 minutes, the threshold is average runtime + 10 minutes (e.g. a job that averages 2 minutes would have an alert threshold of 12 minutes). Anything beyond a 5 minute average runtime is controlled by a variable, with a default value of 150% of average runtime. For example, a job that averages a 10 minute runtime would have an alert threshold of 15 minutes.
  • If a job triggers an alert, that information is inserted into a table. Subsequent runs of the stored procedure then check the table to see if the alert has already been reported. We did this to avoid emailing admins on every subsequent run of the stored procedure.
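
To make the threshold and "alert only once" rules above concrete, here is a minimal sketch of the logic in Python. This is not the stored procedure itself (that is T-SQL, available from the download link), just an illustration. The names `alert_threshold`, `jobs_to_alert`, and `already_alerted` are made up for this example, the in-memory set stands in for the alert table, and keying an alert on job name plus start time is an assumption. The sketch also assumes a job averaging exactly 5 minutes falls under the percentage rule, which the description above doesn't spell out.

```python
from datetime import timedelta

SHORT_JOB_CUTOFF = timedelta(minutes=5)    # jobs averaging less than this use the flat rule
SHORT_JOB_PADDING = timedelta(minutes=10)  # flat allowance added for short jobs
DEFAULT_MULTIPLIER = 1.5                   # 150% of average runtime


def alert_threshold(avg_runtime, multiplier=DEFAULT_MULTIPLIER):
    """Return how long a job may run before it counts as long-running."""
    if avg_runtime < SHORT_JOB_CUTOFF:
        return avg_runtime + SHORT_JOB_PADDING
    # Assumption: an average of exactly 5 minutes uses the multiplier rule.
    return avg_runtime * multiplier


def jobs_to_alert(running_jobs, already_alerted):
    """Pick running jobs past their threshold that have not been reported yet.

    running_jobs: iterable of (job_name, start_time, current_runtime, avg_runtime)
    already_alerted: set of (job_name, start_time) pairs, standing in for
                     the alert table (hypothetical key choice)
    """
    new_alerts = []
    for name, started, runtime, avg in running_jobs:
        key = (name, started)
        if runtime > alert_threshold(avg) and key not in already_alerted:
            already_alerted.add(key)  # remember it so later runs stay quiet
            new_alerts.append(name)
    return new_alerts
```

With these rules, a job averaging 2 minutes gets a 12 minute threshold and a job averaging 10 minutes gets 15 minutes, matching the examples above. Calling `jobs_to_alert` twice with the same running job and the same set returns the job the first time and nothing the second, which is the behavior that keeps admins from getting an email every 5 minutes.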

CODE (WARNING: This code is currently beta and subject to change as we improve it)

Last script update: 7/24/2012

Change log:

7/12/2012 – Updated code to deal with “phantom” jobs that weren’t really running, with improved logic to handle them. Beware: this uses the undocumented extended stored procedure xp_sqlagent_enum_jobs.

7/24/2012 – Updated code to v1.16 to deal with an email alerts error. I’m removing the code from this blog post and asking folks to download it directly from the link below instead. Formatting code on the blog makes it messy and a pain to update.

Download script link – Click here

Got any feedback/comments/criticisms? Let me hear them in the comments!

9 Responses to Identify and Alert for Long-Running Agent Jobs

  1. Pingback: SQLChicken’s New Tool « SQL Swampland

  2. Dylan says:

    Hi Jorge,

    The code got chopped off at the insert into ##RunningJobs.  It appears to chop after the first WHEN clause of the CASE statement.

    • SQLChicken says:

      Thanks for heads up Dylan. Code actually got updated and I’ve included a download link for script as well just in case the copy/paste from post doesn’t work correctly

  3. I’ve been using this kind of intervention system ever since SQL7.0 :-) Nice alternative ! http://www.sqlservercentral.com/scripts/Maintenance+and+Management/30713/

    • SQLChicken says:

      Very cool, thanks for sharing! FYI I’ve updated code to deal with a few bugs we found here

  4. DBAdmin says:

    Sysjobactivity doesn’t have the latest information hence it’s failing to send the accurate alerts.

    Any other alternate solution?

  5. dv says:

    Hi,
    I have looked at using the script; however, there is a DROP TABLE missing in the code. The temp table that is created is not dropped, so when the procedure runs again it errors on creating the temp table.

  6. Michael says:

    Thanks a lot for your posting. Great script and highly useful.
