Brad Corbin

January 27, 2010

Identifying the High CPU SQL Instance

Filed under: SQL Troubleshooting,SQLServerPedia Syndication — Brad Corbin @ 5:34 pm

Before you can start your normal SQL performance tuning with profiler, DMVs, or whatever your normal routine is, you’ll need to be sure which SQL instance on that box is the cause of the trouble.

With just a couple of SQL instances on a box (maybe two different versions of  SQL), it isn’t too hard to figure out where the trouble lies. But with the increased use of virtualization and consolidation environments like Polyserve, servers with 5 or more  distinct SQL instances aren’t that far-fetched.

This makes troubleshooting a “High CPU on HOST123″ alert a little difficult. And if someone is complaining that a certain application is slow, how do you know if that application is really at fault, or if that slowness is simply a symptom of very high CPU on an entirely different instance?

The key involves knowing the ProcessID for each running instance of SQL. Here are a few ways to find out the ProcessID:

1. Look in your SQL server log, back to when the server was last restarted. It should contain a row that looks something like: “Server Process ID is 6404.”

2. Run a query on each server:

SELECT serverproperty('ServerName') AS Instance, serverproperty('ProcessID') AS PID

3. Use a cool PowerShell script that Allen White describes in this blog post

WARNING: The ProcessID changes every time the SQL service is restarted, so don’t expect these values to stick around.

Select Columns dialog in Task Manager

Once you know what they are, what do you do with these process IDs?

THE FAST WAY

If you need to see what’s using up the most CPU on the server right now, just use the Processes tab in Task Manager.

The trick is to go into View, Select Columns, and check the “PID (Process Identifier)” column. I also check the “CPU Time” column so I can see how much cumulative time each instance has used (although if the instances haven’t been all running the same length of time, this may not be a good comparison).

Now sort the process list by name, and you should be able to tell which SQL server instance is using the most CPU currently:

Task Manager with PID column

THE HARD WAY

Perfmon Process counters

If you need to see CPU usage over a period of time, or in a specific time window, you will have to use Performance Monitor. (See Brent’s excellent Perfmon Tutorial and video here to get you started).

In this case, the trick is to capture the “Process: % Processor Time” counters for each of the “sqlservr” processes. Also make sure you capture the “Process:ID Process” counter, so you can match the ProcessID in the log to the ProcessID of each SQL instance as we’ve described above.

NOTE: Some have reported that the sqlservr processes are always listed in ascending order by ProcessID. I have found this not to be true. Not sure why it worked for him but not for me, but if you include the “ID Process” counter, you won’t have to guess.

In actuality, I save my Perfmon counter logs to CSV files so I can easily open them up in Excel. For the sake of this article, though, I’ll show a sample of the perfmon graph view:

Processor time by instance

So during at least the first part of this monitoring period, sqlservr#2  is the instance using the most CPU.

Normally all you need to determine is which one is highest, but if you are a stickler for accurate numbers, and you noticed that the individual Process:% Processor Time numbers add up to more than the total Processor:% Processor Time, that’s because the Process values are actually out of 100*(#of CPUs). In my case, this means that the max is 400%. (Or, if you want to be really pedantic, its actually a percentage of the “Processor:% User Time” counter value. See this thread for additional formulas and maths and stuff.)

September 30, 2009

Recovering a deleted SQL Job

Filed under: SQLServerPedia Syndication,Scripts — Brad Corbin @ 2:55 pm

A user wanted to recover a SQL job that had been deleted from the agent. We do backups of MSDB, of course, but how would you actually recover the job definition?

Thanks to ServerFault and user squillman, we used the following technique. Restore a copy of MSDB (we used MSDB_old, in this case). Then run the following script:

DECLARE @JobID UNIQUEIDENTIFIER
SELECT @JobID = job_id FROM msdb_old.dbo.sysjobs WHERE NAME='My Lost Job'

INSERT msdb.dbo.sysjobs
SELECT * FROM msdb_old.dbo.sysjobs
WHERE job_id=@JobID

INSERT msdb.dbo.sysjobsteps
SELECT * FROM msdb_old.dbo.sysjobsteps
WHERE job_id=@JobID

SET IDENTITY_INSERT msdb.dbo.sysjobhistory ON
INSERT msdb.dbo.sysjobhistory
    (instance_id,job_id,step_id,step_name,sql_message_id,sql_severity,
     [message],run_status,run_date,run_time,run_duration,operator_id_emailed,
     operator_id_netsent,operator_id_paged,retries_attempted,[server])
SELECT
    instance_id,job_id,step_id,step_name,sql_message_id,sql_severity,
    [message],run_status,run_date,run_time,run_duration,operator_id_emailed,
    operator_id_netsent,operator_id_paged,retries_attempted,[server]
FROM msdb_old.dbo.sysjobhistory
WHERE job_id=@JobID
SET IDENTITY_INSERT msdb.dbo.sysjobhistory OFF

INSERT msdb.dbo.sysjobschedules
SELECT * FROM msdb_old.dbo.sysjobschedules
WHERE job_id=@JobID

Worked like a charm! Thanks, squillman!
Note that you’d have to alter this script some if you were trying to retrieve an old version of a still existing job (because your job_id would conflict).

Monitoring Log File Growth using PerfMon

Filed under: SQL Maintenance,SQLServerPedia Syndication — Brad Corbin @ 2:37 pm

There are lots of ways to monitor data file and log file growth, but my favorite method is to set up a perpetual Counter Log in Windows Perfmon. The SQLServer:Databases object contains the following counters (among others):

  • Data File(s) Size (KB)
  • Log File(s) Size (KB)
  • Log File(s) Used Size (KB)
  • Log Growths
  • Log Shrinks
  • Percent Log Used

No idea why there isn’t a “Data File(s) Used Size (KB)”. You’ll have to use other methods for that.

I prefer to log this to a CSV file, so I can just whip it open in Excel and make a few graphs (ok, nice graphs take a long time, actually…). This lets you see what databases are responsible for using different amounts of log space over time, and helps you determine an ideal “permanent” log size for each database, so you don’t have 100gb set aside for a log that never uses more than 1gb.

Graph Sample

Make sure to capture data more often than you run log backups (if you backup logs every hour, then capture this data every 15 minutes, for example). That way you can see how much log space is actually used between log backups, and possibly make some adjustments to your log backup schedule.

To make sure your perfmon counter restarts automatically when the server gets rebooted, set the START TIME to some time in the past, and set STOP AFTER to 9999 days.

Also, check out Paul Randal’s awesome video attached to his Technet article to see these counters in action!

Book Review: SQL Tuning by Dan Tow

Filed under: SQL Books,SQLServerPedia Syndication — Brad Corbin @ 1:41 pm

DanTowCover

About a year ago, as I was digging deeper into SQL tuning issues, I came across this StackOverflow question about SQL books and resources. One of the recommendations was for Dan Tow’s book, “SQL Tuning”. The Amazon reviews were fairly glowing, so I made the purchase, and have since had the opportunity to read and apply its ideas. Here are my thoughts.

Overview

SQL Tuning takes a very unique approach to the problem of optimizing SQL queries. Most of the time, books and articles on SQL server performance outline a method that looks something like:

  1. Identify slow queries
  2. Read and understand the SQL execution plan
  3. ??
  4. Profit!

Not really, of course, but its very easy to focus on the first two steps of this process, because it much easier to define concrete methods for them. Step 3 frequently ends up being mostly trial-and-error. Update stats. Play with indexes. Update stats WITH FULLSCAN. Change the logic in your WHERE clause. Hey, did you try updating stats yet??

Even if you do have an idea for an execution plan that might be better than the current one, how do you know if it is the best? How do you know you aren’t still leaving huge potential improvements on the table? How do you know when it’s good enough?

Dan Tow’s book teaches a concrete, repeatable methodology for determining the optimal execution plan for a query (or at least a very-near optimal solution), then shows you how to modify your query to use that new plan. The technique is really database agnostic, but he gives good specific methods for applying his ideas in MS SQL, Oracle, and DB2.

Technique

Without giving away too much, Dan’s technique involves creating a visual diagram of the query (based on joins) and with some fairly easy-to-follow rules, helps you determine:

  • Optimal table join order
  • Optimal indexes on each table

Its based primarily on the theory that you want SQL to retrieve the most selective tables first, so that it is handling the minimum amount of data possible during the entire process, as opposed to joining huge tables and only filtering them out at the end. This idea really is common sense, but for those who like to see the theory behind the technique, he gives a lot of good support to his arguments.

My Thoughts

The method made a lot of sense to me, but did it work? The first day I tried to apply his methods, I was able to take a very long running query (90 minutes or more), and get it down to under 2 minutes!! Can’t argue with that!

Even more than that, though, this book has helped me much better understand what is going on during the execution of a query, and how to bend it to my will (evil laugh). The right index for a table, for example, will depend on where in the query the table is used. Is it the first table retrieved? Is it retrieved in a join?

A few caveats:

  • This was published in 2004, so doesn’t contain any of the new features of SQL 2005 or 2008. SQL 2005 Plan Guides, for example, would probably be preferred to using query hints to actually control the plan behavior.
  • Continue monitoring your application. Hard-coding query hints could become a detriment instead of a help if your database contents change radically enough.
  • Some of the methods can get fairly complex (calculating the proper weight of subqueries, etc).

Overall, though, I thought it was an excellent read, and very worthwhile.

Resources

The Phantom Query

Filed under: SQL Queries,SQLServerPedia Syndication — Brad Corbin @ 7:40 am

PhantomSQL Server results should always be clearly-defined, right? Predictable, rational, repeatable…

Yeah, that’d be nice.

I was working with a developer investigating a query on a production server that he said wasn’t returning any results, when it ought to have returned several hundreds rows. Oddly, though, every time I ran it, it seemed to work fine. Same server, same query, same everything.

And then, suddenly, his query was returning correct results, and my query wasn’t!

No changes to the data, it was working one moment, then not working the next. Okaaayyy….

Just to try and quantify the weirdness, I sat and refreshed the query over and over. The query would return the correct result for exactly 30 seconds, then return no results at all for exactly 30 seconds. Whoa.

In most respects, it seemed like a fairly normal query, with about 5 tables in a join. The only part that seemed a little unusual to me was the following portion of the WHERE clause:

WHERE GETDATE() BETWEEN CATEGORY.FROM_DATE
AND ISNULL(CATEGORY.TO_DATE, GETDATE())

Looks a little unusual, but the logic seems sound. Its designed to return all rows that are currently active, ie where the FROM_DATE is in the past, and the TO_DATE is still in the future (or TO_DATE is NULL, indicating no expiration date has been set). In this case, all rows should have been retuned, because nothing had an expiration date yet. I probably wouldn’t have designed the query that way, but this logic seemed to work properly when I tested it against several sample tables.

The final clue was that this logic seemed to work fine on two of the tables in the join, but when it was used on a third table, we’d see the weird “on/off” behavior.

If you haven’t already guessed, the problem with that third table turned out to be that the TO_DATE field was a smalldatetime instead of a normal datetime. This meant that the ISNULL did an explicit conversion of GETDATE() to smalldatetime, which only has accuracy to the nearest minute.

And when you do a Convert(smalldatetime, GetDate()), half the time it will round down (first 30 seconds), and half the time it will round up (last 30 seconds of each minute). Here’s a query you can run that demonstrates this:

SELECT getdate() as Normal,
       convert(smalldatetime,getdate()) as Small,
       Case WHEN (getdate() <= convert(smalldatetime,getdate())) THEN 'True'
            ELSE 'False'
       END as DateIsBetween

So how to fix this? Well, we could of course simply change the datatype of the TO_DATE column in the third table, but I recommended to the developer that he alter the logic to something like:

WHERE (TO_DATE IS NULL AND GETDATE() >= FROM_DATE)
   OR (TO_DATE IS NOT NULL AND GETDATE() BETWEEN FROM_DATE AND TO_DATE)

which avoids the ISNULL function altogether.

Photo courtesy of “picadillywilson” and Creative Commons. See his flickr photostream here.

September 29, 2009

New SQL Server Blog!

Filed under: SQLServerPedia Syndication — Brad Corbin @ 10:06 pm

Hello, SQL World!

I had intended to dive fully into this SQL blogging thing at the beginning of September, so I would have tons of contest-eligible entries at http://sqlserverpedia.com. Unfortunately, life got in the way, and its been a bit of a crazy month. Hopefully a few relevant blog entries will get me at least a chance in the PASS drawing coming up extremely shortly…

I owe a huge debt of gratitude to SQLServerPedia, Quest Software, and especially Brent Ozar for hosting, setting up, and configuring this WordPress Blog. Thanks, Brent!

A little about me: I’ve worked with SQL server for roughly 6 years or so. I’m currently employed as a full-time DBA for a medium-to-large company in the St. Louis, MO area.  I’m privileged to be part of a larger team of talented DBAs, maintaining several thousand databases on several hundred separate SQL servers. We support a mix of SQL 2000, SQL 2005, and SQL 2008, but it seems to be the 2000 servers that require the most attention, so you’ll probably see my postings reflect that.

You can find me as BradDBA on Twitter BradC on StackOverflow.com, and  BradC on ServerFault.com

Starry Eye Avatar

My avatar is a combination of M.C. Escher’s Eye and M45, The Pleiades cluster (slightly less creepy than the original skull).

The avatar originated with my involvement with CloudyNights, a telescope/stargazing/astronomy discussion board, and the Offical Discussion Board for the International Year of Astronomy!

Powered by WordPress