App下載

使用長(zhǎng)時(shí)間運(yùn)行作業(yè)的警報(bào)監(jiān)控 SQL 代理

養(yǎng)了一個(gè)閑月亮 2021-09-09 10:26:49 瀏覽數(shù) (2314)
反饋

本篇文章將為大家詳細(xì)介紹一種通過(guò)電子郵件技術(shù)發(fā)出的警報(bào),可以應(yīng)用在長(zhǎng)時(shí)間運(yùn)行的 sql 代理作業(yè)時(shí)收到通知。下面是詳細(xì)請(qǐng)內(nèi)容,希望能夠?qū)Υ蠹矣兴鶐椭?/p>

問(wèn)題

當(dāng)你必須在繁忙的生產(chǎn)服務(wù)器上管理數(shù)百個(gè)計(jì)劃作業(yè)時(shí),不可避免地會(huì)出現(xiàn)作業(yè)需要很長(zhǎng)時(shí)間才能完成的情況,從而導(dǎo)致大量等待或影響其他進(jìn)程的性能。在深入調(diào)查性能下降的原因之前,我們想知道一項(xiàng)工作何時(shí)開(kāi)始花費(fèi)太長(zhǎng)時(shí)間。有些工作不僅需要很長(zhǎng)時(shí)間才能完成,而且它們可能需要比通常使用的時(shí)間更長(zhǎng)的時(shí)間才能完成。

哪些工作的表現(xiàn)正在倒退?緩慢的工作現(xiàn)在是否一直成為問(wèn)題,還是一次性發(fā)生?當(dāng)需要調(diào)查一項(xiàng)工作時(shí),通常在時(shí)間限制內(nèi),運(yùn)行時(shí)長(zhǎng)是我們分析中首先要考慮的指標(biāo)。

為了將當(dāng)前運(yùn)行時(shí)間與同一作業(yè)之前執(zhí)行的持續(xù)時(shí)間進(jìn)行比較,我們通常會(huì)查看作業(yè)的前一個(gè)運(yùn)行時(shí)間的歷史記錄,并看到當(dāng)前運(yùn)行時(shí)間比之前的幾個(gè)持續(xù)時(shí)間長(zhǎng)。然而,如果你以 DBA、開(kāi)發(fā)人員或 DevOps 工程師的身份管理 500 份工作,一次調(diào)查一項(xiàng)工作的任務(wù)可能需要半天時(shí)間,但仍然無(wú)法讓你更接近結(jié)論。

你可能需要檢查 20 個(gè)作業(yè),你不想一直單獨(dú)照看每份工作。在這種情況下,發(fā)送到你的郵箱的自動(dòng)警報(bào)會(huì)派上用場(chǎng)。本文中的解決方案是創(chuàng)建一個(gè)將接受參數(shù)的存儲(chǔ)過(guò)程。此參數(shù)是用于計(jì)算過(guò)去作業(yè)平均持續(xù)時(shí)間的天數(shù)。該過(guò)程會(huì)生成一份報(bào)告,顯示當(dāng)前正在運(yùn)行的作業(yè)的性能正在下降(退化)以及可選的電子郵件警報(bào)。

理解這種方法的某些步驟以及我們已經(jīng)在生產(chǎn)中使用的方便的存儲(chǔ)過(guò)程如下所示(在運(yùn)行受監(jiān)控作業(yè)的同一服務(wù)器上安排為單獨(dú)的作業(yè)(有關(guān)此類(lèi)作業(yè)的腳本,請(qǐng)參閱“附錄”) ) 以幫助您收到有關(guān)在輪詢時(shí)正在退化的一個(gè)或多個(gè)作業(yè)的電子郵件警報(bào)。我們?asp_long_running_Regressing_Jobs_Alerts?每分鐘在我們的環(huán)境中執(zhí)行 ?Stored Proc?, ,因?yàn)槲覀兊淖鳂I(yè)持續(xù)時(shí)間從幾秒到幾小時(shí)不等。在給予之前SP DDL,SP 的某些組件被分解為先決條件(小尺寸代碼片段),有助于全面了解警報(bào) SP。

所需工具列表:?SQL Server?(以下代碼在 2012 及更高版本上測(cè)試)

這里的目標(biāo)是將每個(gè)當(dāng)前正在運(yùn)行的作業(yè)的持續(xù)時(shí)間與給定時(shí)間段內(nèi)同一作業(yè)的所有運(yùn)行時(shí)間的平均持續(xù)時(shí)間進(jìn)行比較。

注意:在繼續(xù)編譯下面給出的存儲(chǔ)過(guò)程之前,請(qǐng)確保你在要分析計(jì)劃作業(yè)的環(huán)境中具有高訪問(wèn)權(quán)限。理想情況下,管理員。要檢查你的訪問(wèn)級(jí)別(如果你不是該服務(wù)器的管理員,請(qǐng)運(yùn)行此或類(lèi)似的 T-SQL 語(yǔ)句。如果你沒(méi)有此查詢的結(jié)果,則你沒(méi)有對(duì) MSDB 表的選擇權(quán)限。

USE MSDB
GO
SELECT HAS_PERMS_BY_NAME(QUOTENAME(SCHEMA_NAME(schema_id)) + '.' + 
       QUOTENAME(name), 'OBJECT', 'SELECT') AS have_select, *
FROM   sys.tables
GO

圖 1

圖 1:結(jié)果集顯示你擁有 SELECT 權(quán)限的表。

解決方案

下面存儲(chǔ)過(guò)程的 T-SQL DDL 代碼可以在您選擇的任何數(shù)據(jù)庫(kù)中編譯。我們使用專(zhuān)用?DBA_db?于此類(lèi)管理 SP。

USE [DBA_db]
GO
 
-- <begin stored procedure DDL/>
 
CREATE PROCEDURE asp_long_running_Regressing_Jobs_Alerts
         @history_days int = 7,
         @avg_duration_multiplier float = 1.5,
         @bEmail bit = 0,
         @bSaveToTable bit = 0,
         @RecipientsList Varchar(1000) = 'myName@myCoDomain.com',
         @ignore_zero_durations bit = 0
 
AS
/* example of usage:
   exec DBA_db..asp_long_running_Regressing_Jobs_Alerts
               @history_days = 45,
               @avg_duration_multiplier = 2,
               @bEmail = 0,
               @bSaveToTable = 0,
               @RecipientsList  = 'myName@myCoDomain.com;'   ,
               @ignore_zero_durations = 1
 
AUTHOR(s):
Vladimir Isaev;
-- + V.B., S.L;
-- contact@sqlexperts.org
*/      
 
/*PARAMETERS:
@history_days int          (how many days back we use for AVF run duration)
@avg_duration_multiplier   (how many times longer than AVG will qualify job 
                            for producing an alert)
@bEmail                    (send out Alert Email or just print the msg about Regressing jobs)
                           -- 'REGRESSION' is defined here by Duration only
*/
SET NOCOUNT ON
BEGIN
 
        select sj.name,
               sja.start_execution_date,
               sja.stop_execution_date,
               ajt.min_run_duration,
               ajt.max_run_duration,
               ajt.avg_run_duration,
               datediff(ss, start_execution_date, getdate()) as cur_run_duration
        into #Regressing_Jobs
 
        from msdb..sysjobactivity sja
               left join
               (select job_id,
                       avg(dbo.udf_convert_int_time2ss(run_duration)) as avg_run_duration,
                       min(dbo.udf_convert_int_time2ss(run_duration)) as min_run_duration,
                       max(dbo.udf_convert_int_time2ss(run_duration)) as max_run_duration
                       from msdb..sysjobhistory
                       where step_id=0
                       and run_date >CONVERT(varchar(8),GETDATE() - @history_days,112)
                       and ((run_duration <> 0 or @ignore_zero_durations = 0))
                       and run_duration < 240000
                       group by job_id
               )ajt on sja.job_id=ajt.job_id
        join msdb..sysjobs sj on sj.job_id=sja.job_id
        where
               sja.session_id = (SELECT TOP 1 session_id
                                   FROM msdb.dbo.syssessions
                                  ORDER BY agent_start_date DESC)
               AND start_execution_date is not null
               and stop_execution_date is null
               and datediff(ss, start_execution_date, getdate()) >
                   ajt.avg_run_duration * @avg_duration_multiplier
 
        select name as JobName,
               start_execution_date,
               stop_execution_date,
               dateadd(second, min_run_duration, 0) as min_run_duration,
               dateadd(second, max_run_duration, 0) as max_run_duration,
               dateadd(second, avg_run_duration, 0) as avg_run_duration,
               dateadd(second, cur_run_duration, 0) as cur_run_duration
        into #Regressing_Jobs_DurAsDate
        from #Regressing_Jobs
                                            --  waitfor delay '00:00:10'
        declare @sHtml varchar(max) = ''
 
        declare @tableHTML  nvarchar(max) =
               N'<H1>Job(s) taking longer than recent baseline duration
                (in descending avg duration order):</H1>' + Char(13)
               + N'    <table border="1">'           + Char(13)
               + N'    <tr bgcolor="#ddd">'          + Char(13)
               + N'           <th>Start Time</th>'   + Char(13)
               + N'           <th>Job Name</th>'     + Char(13)
               + N'           <th>Host Name</th>'    + Char(13)
               + N'           <th>History Days</th>' + Char(13)
               + N'           <th>Avg Dur Mul</th>'  + Char(13)
               + N'           <th>Min Dur</th>'      + Char(13)
               + N'           <th>Max Dur</th>'      + Char(13)
               + N'           <th>Avg Dur</th>'      + Char(13)
               + N'           <th>Cur Dur</th>'      + Char(13)
               + N'    </tr>'                        + Char(13)
 
        select @tableHTML =  @tableHTML
               + FORMATMESSAGE(
                       '<tr><td>%s</td>'      _
                          + Char(13) --start_execution_date
                       + '<td>%s</td>'        + Char(13) --name
                       + '<td>%s</td>'        + Char(13) --@@SERVERNAME
                       + '<td style="text-align:center">%i</td>' _
                          + Char(13) --@history_days
                       + '<td style="text-align:center">%s</td>' 
                          + Char(13) --@avg_duration_multiplier
                       + '<td>%s</td>'        + Char(13) --Min Dur
                       + '<td>%s</td>'        + Char(13) --Max Dur
                       + '<td>%s</td>'        + Char(13) --Avg Dur
                       + '<td>%s</td>'        + Char(13),--Cur Dur
                               convert(varchar, start_execution_date, 120),
                               JobName,
                               @@SERVERNAME,
                               @history_days,
                               convert(varchar, @avg_duration_multiplier),
                               format(min_run_duration, N'HH\hmm\mss\s'),
                               format(max_run_duration, N'HH\hmm\mss\s'),
                               format(avg_run_duration, N'HH\hmm\mss\s'),
                               format(cur_run_duration, N'HH\hmm\mss\s')
                       )
          from #Regressing_Jobs_DurAsDate
          order by avg_run_duration desc, JobName
 
               select @tableHTML = @tableHTML + '</tr></table>' + Char(13)
 
        select @sHtml = @tableHTML
        --select @sHtml
 
        declare @DateStr varchar(30) = convert(varchar,getdate(),121)
        IF @bEmail = 1 and (select count(*) from #Regressing_Jobs) > 0
         begin
              
               declare @sSubject varchar(250)
                   = @@SERVERNAME + ' Job(s) taking longer than recent baseline duration: ' _
                     + @DateStr 
 
               EXEC msdb.dbo.sp_send_dbmail  @profile_name='SQL Server Monitoring Account',
                                              @recipients= @RecipientsList,
                                              @subject=@sSubject, 
                                               @body=@sHtml,
                                              @body_format = 'HTML'
 
               print 'email sent: ' + CHAR(13) + @sHtml
        end
 
        IF @bSaveToTable = 1
          begin
               insert into RegressingJobs
         (
 CaptureDateTime,
 JobName,
 start_execution_date,
 HostName,
 history_days,
 avg_duration_multiplier,
                           min_run_duration,
 max_run_duration,
 avg_run_duration,
 cur_run_duration
 )
select         @DateStr,
JobName,
start_execution_date,
@@SERVERNAME,
@history_days,   
@avg_duration_multiplier,
min_run_duration,
max_run_duration,
avg_run_duration,
cur_run_duration
               from #Regressing_Jobs_DurAsDate
          end
 
        begin
 
         SELECT 'JOBS THAT ARE TAKING LONGER THAN USUAL:  '
         select  @DateStr as CaptureDateTime, JobName, _
            start_execution_date, @@SERVERNAME as 'Server',
                 @history_days as '@history_days', _
                    @avg_duration_multiplier as '@avg_duration_multiplier',
                 min_run_duration, max_run_duration, _
                    avg_run_duration, cur_run_duration
         from    #Regressing_Jobs_DurAsDate
 
        end
 
--all currently running jobs:
       begin
               SELECT ' ALL JOBS THAT ARE CURRENTLY RUNNING:  '
               SELECT
                 -- '',  -- CAST (ja.job_id AS VARCHAR(max)),
                       j.name AS job_name,
                  cast ( ja.start_execution_date as varchar) start_execution_time,  
                  cast ( ja.stop_execution_date  as varchar) stop_execution_time,  
                  -- ISNULL(last_executed_step_id,0)+1 AS current_executed_step_id,
                       Js.step_name step_name
               FROM msdb.dbo.sysjobactivity ja
               LEFT JOIN msdb.dbo.sysjobhistory jh
                       ON ja.job_history_id = jh.instance_id
               JOIN msdb.dbo.sysjobs j
                 ON ja.job_id = j.job_id
               JOIN msdb.dbo.sysjobsteps js
                 ON ja.job_id = js.job_id
                AND ISNULL(ja.last_executed_step_id,0)+1 = js.step_id
               WHERE ja.session_id =
                    (SELECT TOP 1 session_id
               FROM msdb.dbo.syssessions
           ORDER BY agent_start_date DESC)
               AND start_execution_date is not null
               AND stop_execution_date  is null;
 
        end
END
 
GO
 
-- <end of stored procedure DDL/>

asp_long_running_Regressing_Jobs_Alerts 使用的 2 個(gè) UDF

-- dependencies of asp_long_running_Regressing_Jobs_Alerts:
 
-- udf_convert_int_time
CREATE   FUNCTION [dbo].[udf_convert_int_time] (@time_in INT)
RETURNS TIME
AS
 BEGIN
        DECLARE @time_out TIME
        DECLARE @time_in_str varchar(6)
        SELECT  @time_in_str = RIGHT('000000' + CAST(@time_in AS VARCHAR(6)), 6)
        SELECT  @time_out    = CAST(STUFF(STUFF(@time_in_str,3,0,':'),6,0,':') AS TIME)
  RETURN @time_out
 END
GO
 
-- udf_convert_int_time2ss
CREATE   FUNCTION [dbo].[udf_convert_int_time2ss] (@time_in INT)
RETURNS int
AS
 BEGIN
        DECLARE @time_out int
        select @time_out = datediff(ss, 0,  dbo.udf_convert_int_time(@time_in))
  RETURN @time_out
 END
GO

除了 SP 標(biāo)頭中列出的調(diào)用示例之外,以下是一個(gè)典型調(diào)用的示例:

exec dba_DB.dbo.asp_long_running_Regressing_Jobs_Alerts
       @history_days = 45,
       @avg_duration_multiplier = 2,
       @bEmail = 1,
       @bSaveToTable = 0,
       @RecipientsList  = 'myName@myCoDomain.com;
                           AssociateName@myCoDomain.com’
       @ignore_zero_durations = 1

SP 的此調(diào)用意味著以下內(nèi)容:

給我一份報(bào)告(或警報(bào)),說(shuō)明在 45 天內(nèi)完成相同作業(yè)的平均運(yùn)行時(shí)間所需時(shí)間的兩倍的所有作業(yè)。通過(guò)電子郵件將此類(lèi)報(bào)告發(fā)送給我 ( ?myName?) 和我的同事 ( ?myAssociateName?)。不要將此數(shù)據(jù)保存到基線表,并且不包括持續(xù)時(shí)間為零的作業(yè)。

https://www.sqlservercentral.com/wp-content/uploads/2021/08/img_6118df89ac710.png

圖 2:SSMS 中對(duì) SP 的另一個(gè)類(lèi)似調(diào)用的示例輸出:在這種情況下,顯示現(xiàn)在花費(fèi)的時(shí)間比過(guò)去 2 天花費(fèi) AVG 多 10% 的作業(yè)。

下圖顯示了由 SP 生成的電子郵件警報(bào)在 HTML 中的外觀示例。

此電子郵件通知僅在 時(shí)通過(guò)調(diào)用此 SP 生成?parameter @bEmail = 1?。

https://www.sqlservercentral.com/wp-content/uploads/2021/08/img_6118e0f061fcd.png

圖 3

電子郵件主題行如下:?<ServerName> ?作業(yè)花費(fèi)的時(shí)間比最近的基線持續(xù)時(shí)間長(zhǎng);?yyyy-mm-dd mm:ss?

將結(jié)果保存到表格以供將來(lái)的歷史分析

如果你決定將報(bào)告保存到表格,請(qǐng)執(zhí)行以下操作:除了通過(guò)電子郵件或在 SSMS 中直接運(yùn)行 SP 收到警報(bào)外,還需要一個(gè)表格。用?@bSaveToTable = 1?呼叫 SP 。(它是 SP 的依賴項(xiàng)之一,因此即使此時(shí)您沒(méi)有將結(jié)果放入表格,也最好創(chuàng)建它)。這是表 DDL:

CREATE TABLE [RegressingJobs](
                            [CaptureDateTime]        [datetime]      NULL,
                            [JobName]                [sysname]       NOT NULL,
                            [start_execution_date]   [datetime]      NULL,
                            [HostName]               [sysname]       NOT NULL,
                            [history_days]           [int]           NULL,
                            [avg_duration_multiplier]    [float]     NULL,
                            [min_run_duration]           [time](7)   NULL,
                            [max_run_duration]           [time](7)   NULL,
                            [avg_run_duration]           [time](7)   NULL,
                            [cur_run_duration]           [time](7)   NULL
 ) ON [PRIMARY]
   GO

此調(diào)用具有完整功能的 SP 的示例,包括保存到表格和電子郵件警報(bào) ( ?bSaveToTable= 1, bEmail=1?):

EXEC DBA_db.dbo.asp_long_running_Regressing_Jobs_Alerts
                             @history_days = 30,
                             @avg_duration_multiplier = 2,
                             @bEmail = 1,
                             @bSaveToTable = 1,
                             @RecipientsList  = 'myName@myCoDomain.com;',
                             @ignore_zero_durations = 1

如何出于各種目的調(diào)用此 SP 的其他示例

將當(dāng)前運(yùn)行的作業(yè)與其過(guò)去 30 天的歷史進(jìn)行比較,并報(bào)告當(dāng)前持續(xù)時(shí)間超過(guò) 30 天平均值 1.5 倍的每個(gè)作業(yè)。不要發(fā)送警報(bào)電子郵件,也不要將此信息保存到基線表:

EXECUTE DBA_db.dbo.asp_long_running_Regressing_Jobs_Alerts 30, 1.5, 0, 0

將當(dāng)前運(yùn)行的作業(yè)與其過(guò)去 60 天的歷史進(jìn)行比較,并報(bào)告當(dāng)前持續(xù)時(shí)間超過(guò) 60 天平均值 2 倍的每個(gè)作業(yè)。向默認(rèn)收件人(列表)發(fā)送電子郵件警報(bào),并且不要將此信息保存到基線表:

EXECUTE DBA_db.dbo.asp_long_running_Regressing_Jobs_Alerts 60, 1.5, 1, 0

注意:強(qiáng)烈建議對(duì)參數(shù)進(jìn)行賦值,明確命名每個(gè)參數(shù)。給出上述示例是為了簡(jiǎn)潔。

結(jié)論

本文描述了管理員在高度自動(dòng)化的工作負(fù)載環(huán)境中管理和分析多個(gè)作業(yè)的性能問(wèn)題時(shí)面臨的問(wèn)題。我在此處共享的存儲(chǔ)過(guò)程允許管理員在某些作業(yè)在給定時(shí)間段內(nèi)花費(fèi)的時(shí)間超過(guò)其過(guò)去平均持續(xù)時(shí)間時(shí)收到警報(bào)。

附錄

下面是用于創(chuàng)建名為 [ ?MyMonitoredServerName_Maintenance- Regressing Jobs?]的計(jì)劃作業(yè)的 DDL,以?asp_long_running_Regressing_Jobs_Alerts?每分鐘執(zhí)行一次并向?BigShotAdminBigShotAdmin@MyCoDomain.com?發(fā)送警報(bào)。

USE [msdb]
GO
 
/****** Object:  Job [MyMonitoredServerName_Maintenance - Regressing_Jobs] ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
IF NOT EXISTS (SELECT name
                 FROM msdb.dbo.syscategories
   WHERE name=N'[BigLoad]'
     AND category_class=1)
BEGIN
 EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[BigLoad]'
 IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
 
END
 
DECLARE @jobId BINARY(16)
 EXEC   @ReturnCode =  msdb.dbo.sp_add_job
        @job_name=N'MyMonitoredServerName_Maintenance - Regressing_Jobs',
                                @enabled=1,
                                @notify_level_eventlog=0,
                                @notify_level_email=0,
                                @notify_level_netsend=0,
                                @notify_level_page=0,
                                @delete_level=0,
                                @description=N'Send email notifications to _
                                @RrecipientsList (last parameter in SP) on CURRENTLY RUNNING _
                                Agent Jobs that regress in performance by duration compared to _
                                baseline (baseline collected during the number of days before _
                                getdate() specified by the first parameter @history_days.',
                                @category_name=N'[BigLoad]',
                                @owner_login_name=N'sa', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object:  Step [asp_long_running_Regressing_Jobs_Alerts] ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, _
   @step_name=N'asp_long_running_Regressing_Jobs_Alerts',
                                @step_id=1,
                                @cmdexec_success_code=0,
                                @on_success_action=1,
                                @on_success_step_id=0,
                                @on_fail_action=2,
                                @on_fail_step_id=0,
                                @retry_attempts=0,
                                @retry_interval=0,
                                @os_run_priority=0, @subsystem=N'TSQL',
                                @command=N'exec _
                                           dbmaint..asp_long_running_Regressing_Jobs_Alerts
                                        @history_days = 45,
                                        @avg_duration_multiplier = 2,
                                        @bEmail = 1,
                                        @bSaveToTable = 1,
                                        @RecipientsList  = ''BigShotAdmin@MyCoDomain.com;'',
                                        @ignore_zero_durations = 1',
                                        @database_name=N'master',
                                        @flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'every 1 min',
                                @enabled=1,
                                @freq_type=4,
                                @freq_interval=1,
                                @freq_subday_type=4,
                                @freq_subday_interval=1,
                                @freq_relative_interval=0,
                                @freq_recurrence_factor=0,
                                @active_start_date=20201222,
                                @active_end_date=99991231,
                                @active_start_time=60000,
                                @active_end_time=235959,
                                @schedule_uid=N'999ac144-4e13-4965-82f2-55555cc37a09'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
    IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO

享受有關(guān)回歸/長(zhǎng)時(shí)間運(yùn)行的 SQL 代理作業(yè)的警報(bào)!


SQL

0 人點(diǎn)贊