Figure out the social media photo situation (see above, “Up my selfie game”)
If you’re attending Summit, let’s meet up! I’ll be on Twitter, Slack, and Instagram (@alevyinroc across the board) all week and roaming the convention center & various evening events so ping me there to find out where I am.
Because Summit starts on Election Day here in the USA, be sure to either get to your local polling place that morning or follow your state’s process to request & submit an absentee ballot. Every election is more important than the one before it (that’s the most political I’ll get on this blog, I swear).
Triggers can be really useful in your database but you have to be careful with them. We often use them to record history to a separate table (at least, until we can implement temporal tables) or keep track of when a record was last updated. It’s usually implemented via an AFTER UPDATE trigger.
Several times a day, I’m alerted by our monitoring suite that two processes were deadlocked. This is a (mostly) normal event, but one that we’d like to avoid if we can. More than a few times, I observed this pattern:
Insert into table
Update a field in the row that was just inserted.
In my case, the LastModified update is implemented as an AFTER INSERT, UPDATE trigger. Which means that as I’m inserting the record into the table I’m turning around and immediately updating it. We can see this in both the execution plan and the IO stats provided by SET STATISTICS IO ON.
Here’s the setup. Just a pair of basic tables, one looks like like the other except one uses a default constraint and the other an AFTER INSERT trigger. We’ll set up the tables and then insert one row into each.
Great, the data matches what we expect! But how much work did SQL Server do? Let’s check out the execution plan and IO stats. First for the trigger:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'triggertest'. Scan count 0, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server parse and compile time:
CPU time = 13 ms, elapsed time = 13 ms.
Table 'triggertest'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 7 ms.
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 104 ms.
And now with the default constraint:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'defaulttest'. Scan count 0, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 6 ms.
We want to spend our SQL Server licensing dollars wisely, so why ask it to do unnecessary work? More CPU time, more IO, and our client is waiting longer for the process to complete (I don’t like making anyone wait).
There’s a second “gotcha” with the AFTER INSERT method that applies to only some use cases. Let’s say you’re loading some historical data into the table, and that data includes the LastModified date. Or maybe your application has a very specific date that should be inserted into the table for this field.
And here’s what the tables look like after the inserts.
That AFTER INSERT trigger just clobbered real data!
To alleviate this, I’m migrating these tables to use a default constraint of GETDATE() on the LastModified fields, and changing the trigger to AFTER UPDATE (in the cases where I do need to write to a history table upon insert, I’m removing the LastModified portion from the AFTER INSERT trigger).
The result is the same – LastModified is set to the insertion date when first creating the record, and updated when updating the record. But the inserts are less work for SQL Server, and I won’t clobber my data.
This week I had a user come to me asking about how fields were defined on a few tables he was using in writing some reports. Long story short, he’s been tasked with writing some new reports and updating existing ones, but he doesn’t have visibility to the database itself so he’s left with the “ok, let’s try this” approach and then reading error messages to debug when things go sideways. Very tedious.
I asked him for a list of the tables he was most interested in, and while he worked on that I set to work on (of course) a quick & dirty PowerShell script to collect the data he needed – field names, types, and whether they’re nullable.
With the help of dbatools and Doug Finke’s ImportExcel module, the resulting script to retrieve the requisite data and export it to an Excel workbook with one sheet per table is very short. Each table is on its own worksheet, with the header row bold and frozen so it stays in view while scrolling.
Line by line:
Fetch the list of tables from a file provided to me
Gather a collection of SMO objects for the tables from the server
Loop over each table and extracting the relevant data about each
Write the table’s information out to a worksheet named for the table in an Excel workbook
The only tricky part was fetching the field lengths for the nvarchar fields. I had to go spelunking into each field’s Properties property to get that. The result:
But I can tell this isn’t disco enough for you, so I’m gonna put it right here. – Justin Hammer, Iron Man 2
You can do this in a one-liner too. It’s not as pretty, but it’s quite doable and the resulting Excel file is identical.
End to end, I turned this around in about 20 minutes and the the requestor’s life got easier in a big hurry. Best of all, when he comes back with another batch of tables for his next set of reports, I can turn the request around in less than five minutes.
I’ve been a proponent of dbatools for close to a year now and even contributed to the project, but surprisingly haven’t been a heavy user of it. Mostly due to a lack of opportunity. I’m aware of many of the functions by virtue of working on the built-in documentation and following the project and presentations about it.
So when the need arose to move a development/test instance of SQL Server from a VM onto a physical server, I knew exactly what I wanted to do. I was warned that the contents of this instance had been moved once before and it resulted in over a week of work and a bunch of trouble. I can’t speculate on why this was as I wasn’t there to see it, but I wasn’t going to let that happen on my watch. So, with equal parts hubris and stubbornness (and a dash of naïveté), I dove in. We have the technology. We will migrate this thing.
The advertising for Start-DbaMigration makes it look so easy. Source, destination, your method of moving the data, and you’re done. Right? Well, sure – in a small, controlled sandbox. This one was neither. About 150 databases. Two dozen Agent jobs. User account cleanup. Different drive letters and sizes. And when it was all over, the server name, instance name, and IP of the new box had to match the old one so that we didn’t disrupt production or the developers.
Of course we’re going to rehearse this. But with the destination being a physical machine, I didn’t have the luxury of rolling back a snapshot each time, or restarting from a golden image. And I couldn’t do everything because it wasn’t an isolated environment – I couldn’t test all the Agent jobs (don’t want emails going out in error) and couldn’t reconfigure the IP or server name. Which meant that my script had to clean up any artifacts from previous runs before doing the migration. Each time.
I also wanted to bring the new instance up in a controlled fashion as opposed to just moving everything and letting it go, so that I could check things out before letting them break. I also had to work in checkpoints so the network/server admin could do his pieces. Which meant that after the migration, everything on the old server had to be stopped, and Agent jobs on the new one disabled (but with a record of what was enabled/disabled on the source, so I could replicate it).
I rehearsed what I could about a half-dozen times. Each time through took about 4 hours (having multiple tests helps build confidence in your elapsed time estimates), primarily because of the amount of data that had to be moved (about 700GB). Each time, I found another tweak needed. Maybe not entirely necessary, but I was out to prove something. I didn’t want this migration to be “good enough, a little rough around the edges” – this had to work right, right away.
This is truly standing on the shoulders of giants. Without the thousands of person-hours put in by Chrissy and the rest of the team, a short script like this to do a mountain of work simply is not possible. It’s not just having the huge amount of code to build on – it’s the suite of tests they run with every pull request that tells me that I can trust it’ll work right.
Looking back on it, there’s definitely a few things I’d change in this script, and more dbatools functions I could have used. But after successfully testing a couple times, I didn’t want to break what was working.
When the migration was complete, I did a brief checkout and then gave my server admin the green light. He flipped names & IPs around, and then I ran Repair-DbaServerName which I had just discovered a few days earlier. I was expecting to do it manually but I trust the dbatools crew and their test suite more than myself on this one as I’ve never done this before. When that was complete, I had a grand total of three issues (that I could find):
Database owners weren’t set appropriately. I was able to resolve this via Set-DbaDatabaseOwner easily enough.
Outgoing dbmail didn’t work. Turns out the SMTP relay on the new server wasn’t started. Easy fix.
I had a Linked Server on my production instance which was unable to communicate to the new test server. This took me the longest to figure out. We checked everything – SQL Server Configuration Manager, the network itself, and then finally my colleague suggested testing something outside SQL Server – mapping a drive from production to test. This last test succeeded, which pointed us at the SQL Server connection specifically. The root cause: I had two firewall rules on the new server that blocked connections from all but servers on the local subnet. The production server isn’t on the local subnet.
None of these are total showstoppers. I had workarounds (or quick solutions) for them and as this is a test instance we could live with minor inconvenience for a day or two. One or two final tests, and I was satisfied that everything was working properly so I went ahead and enabled my Agent jobs. Some of them still have incorrect owners but I can fix that later – they were wrong on the source instance too.
I consider this migration a huge success. We had 95% functionality by 9am. By 3pm, the last real problems were resolved (and only that late due to a series of meetings keeping me away from my desk). Most importantly, it was achieved with minimal downtime for the development and QA teams. I’m now one week post-migration and everything is still running smoothly on the new instance.
One of the (many) fun things to do at PASS Summit is to check out the ribbons people have attached to their badges. Some are witty or goofy, others informational, others technical, and still more that let you express how you identify with a community within the community.
To celebrate dbatools and the awesome team & community around it, two limited edition badges will be available from/distributed by me and a handful of other folks all week at Summit. Check ’em out:
Be on the lookout for these badges and talk to us about dbatools! What you like, what you’d like to see changed, new feature ideas, questions about how to use functions, anything at all. Even if you’ve never used dbatools, we love talking about it and showing people the awesome things they can do with it so please, introduce yourself!
This month’s T-SQL Tuesday is hosted by Raul Gonzalez and he’s asked everyone to share things we might be a bit embarrassed about:
For this month, I want you peers to write about those important lessons that you learned the hard way, for instance something you did and put your systems down or maybe something you didn’t do and took your systems down. It can be also a bad decision you or someone else took back in the day and you’re still paying for it…
In the stress/performance testing portion of an upgrade of a critical system, we were short on disk space. So, rather than having a separate set of VMs for the performance testing (as we needed to be able to get back to functional testing quickly), we decided to just take VM snapshots of all the servers. Testing was delayed a day or two – but we didn’t switch off the snapshots. Then we started testing and performance was terrific…for about five minutes. Then everything came to a screeching halt. Panicked, we thought we were going to need a pile of new hardware until the VMWare admin realized that our disks were getting hammered and we still had those active snapshots. Lesson learned: If you take VM-level snapshots of your database server and let them “soak” for an extended period, you’re gonna have a bad time. Unless you need to take a snapshot of the host OS or instance configuration itself, use a database snapshot instead of a VM-level snapshot.
A couple of times, I’ve had under-performing VMs running SQL Server. As I hadn’t been involved in the configuration, I thought everything had been provisioned properly. Turns out…not so much. Memory reservations, storage configuration, power profiles, all set up for suboptimal performance. Lesson learned: Ask your VMWare admin if they’ve perused the best practices guide and review things yourself before going down the rabbit hole of SQL Server configuration & query tuning. If the underlying systems aren’t configured well, you’ll spin your wheels for a long time.
In doing a configuration review of a rather large (production) instance, I noted that at least one configuration option was still set to the default value – Cost Threshold for Parallelism was stuck at 5. Running sp_BlitzCache, I found that I had quite a few simple queries going parallel and huge CXPACKET waits. CXPACKET isn’t bad per se, but if you’ve got a low-cost query that’s going parallel and waiting on threads where it could be running faster overall single-threaded (verified this was the case for several of the top offenders), increasing the cost threshold can help. I did some checking, verified that it was a configuration change I could make on the fly, and set the value to 50. And then everything. Slowed. Down. When I made this configuration change on the test instance, it wasn’t much a problem. But that was a much smaller instance, with much less traffic. What I failed to fully comprehend was the impact of this operation. I overlooked that changing this setting (and a number of others I wasn’t aware of) blows out the plan cache. In the case of this instance, about 26Gb of plan cache. Not only was performance impacted while the plan cache was re-filled, we took a hit while all the old plans were being evicted from cache. Lesson learned: Even if it seemed OK in test, those “low impact” changes can have a much larger impact on production unless you can make test mirror production in every way. So plan when you make these changes accordingly.
We learn the most from our mistakes. We can learn almost as much from the mistakes of others. Learn from mine.
A few weeks ago I was looking at a query and got tripped up by the network and my own forgetfulness. It was a pretty simple query with a simple-looking execution plan. It didn’t even do that much work. About 20K logical reads and returned 200K records. For a server as large as the one I was working with, this should have been nothing. Instead, was waiting three minutes to get my results. My first thought was “aha, this query must need tuning, maybe a new index.”
I started experimenting in a non-production copy of the database on the instance. Added a couple indexes, adjusted the query itself, even a few query/index hints just to see what worked. Each change definitely had an impact on the query. The execution plan shape changed a bit, and the costs shifted around between a few operators. But still I was waiting minutes for the complete results.
The Plot Thickens
I tried my query against the development server. Production has 24X as much RAM, 3X as many CPU cores, and faster disks when compared to the dev box, so I expected even worse performance. Nope. Near-instant results, even for the original query with no additional indexes, hints or any other trickery.
OK, something’s definitely up now. I hopped back to production and while running my query, I ran sp_BlitzFirst to peek at what was going on. Lo and behold, the top wait stat was ASYNC_NETWORK_IO. I’ve seen lots of this in the past, attributed to poorly-written applications that just couldn’t consume the data as fast as SQL Server could churn it out. But this is SQL Server Management Studio, it’s absorbing the data as fast as it can receive it.
And that’s when it hit me. SSMS was consuming the data as quickly as it received it. The development server is in the office. The production server is in a datacenter 1/3 of a continent away and the connection between that datacenter and the office sometimes gets bogged down. The problem wasn’t that SSMS couldn’t drink from the firehose – the firehose had a clamp around it!
To test this idea, I made one last change to the query – I selected the data into a temp table instead of sending the results to SSMS. Round-trip time: two seconds. Conclusion: the network itself was causing the slowdown and this query needs no tuning!
As it turns out, SSMS has a way to discard query results so that you don’t have to resort to dumping results to a temp table – Discard query results after execution. It’s similar to a /dev/null for result sets. If you only need an actual execution plan but no query results (like I did here), check it out.
For years I’ve looked at various queries online in sample code, diagnostic queries using DMVs, and the like and seen CROSS APPLY in the FROM clauses. But I’ve never really managed to comprehend what it was for or how it worked because I never saw a case where it was directly applied to something I was doing.
Finally, this week I had a breakthrough. I was working on updating a bunch of data but it was breaking on a small subset of that data. In this case, I was attempting to JOIN two tables on fields that should have been INTs, but in a very small number of cases one side was using a comma-delimited string. The user told me that someone else had done these updates in the past and didn’t encounter the problem I was having (so I knew that it was something i was doing “wrong”), but given that it was only a handful of broken updates she was OK with manually doing the updates (we were scripting it because we were updating potentially tens of thousands of records).
I am not OK with manually fixing this in the future. I wanted to know how the other DBA had done it before. I dug into some history and found CROSS APPLY. My nemesis. I was determined to figure out how to use it this time.
Setting the Stage
Let’s set up three simple tables to keep track of airports and what state each airport is in. But our developer doesn’t totally get database design and in his state-to-airport mapping table, he allows for a comma-separated list of airports associated with each state. [code language=”sql”] CREATE TABLE #States ([Id] INT IDENTITY(1, 1), StateName NVARCHAR(30) NOT NULL ); CREATE TABLE #Airports ([Id] INT IDENTITY(1, 1), IATACode CHAR(3) NOT NULL ); CREATE TABLE #StateAirports (StateId INT PRIMARY KEY NOT NULL, Airports NVARCHAR(50) ) [/code]
This makes getting a list of airports and their associated state names tricky at best if we don’t know about CROSS APPLY. With CROSS APPLY, it’s pretty straightforward.
Here’s the finished query. [code language=”sql”] SELECT s.statename, a.iatacode FROM #StateAirports SA1 CROSS APPLY string_split(SA1.airports, ‘,’) AS SA2 JOIN #Airports A ON A.Id = SA2.value JOIN #states S ON S.Id = SA1.stateid [/code] string_split() is a Table Valued Function which we finally got in SQL Server 2016 after far too many years of having to write (or, let’s face it, copy from someone’s blog post) inefficient string splitting functions. Important note: even if your database engine is SQL Server 2016, the database you’re operating in must be at CompatibilityLevel 130
Breaking it down
If we take CROSS APPLY and break it down into its parts, it finally starts to make sense. * APPLY the string_split() function to the Airports field of the #StateAirports table * Append the each row of string_split()‘s output to the corresponding row on #StateAirports (similar to a CROSS JOIN but not exactly)
So now I have N rows for each StateId in #StateAirports, where N is the number of values in the comma-separated field. And JOINed to each row is one of the rows from the output of string_split(). [code language=”sql”] SELECT * FROM #StateAirports SA1 CROSS APPLY string_split(SA1.airports, ‘,’) AS SA2 [/code]
From there, the query is pretty normal otherwise, JOINing to the other two tables to translate the state & airport ID numbers to their text values.
Hopefully this helps others get a handle on CROSS APPLY and find useful places for it. This had been a head-scratched for me for years, but only because I didn’t have an example that clearly broke down how to use it and what was going on. In hindsight, I probably could have used it in some analysis I did at a previous job but instead resorted to parsing & processing comma-separated data in a PowerShell script.
In a previous post, I expressed some frustration over Invoke-SqlCmd not setting an Application Name for its ODBC connection, leaving us with the generic .NET SqlClient Library when looking at active sessions in sp_who2 and sp_whoisactive (and any other monitoring tool). Unfortunately, I can’t really do anything about Invoke-SqlCmd aside from posting a suggestion on Connect or the Client Tools Trello board, but Invoke-SqlCmd2 has the same issue and that’s on GitHub. So, here we go!
In its current form, if a SqlConnection object isn’t passed into Invoke-SqlCmd2, the cmdlet does the following:
I decided to change this around so that it no longer uses string formatting, but instead a SqlConnectionStringBuilder. I had a couple reasons for this:
It will eliminate redundant code. There are several common elements in each of the ConnectionStrings above. If more complex logic is needed, there are potentially more copies of this ConnectionString kicking around.
It’s prone to copy/paste and other editing errors. If there’s a change that affects both versions of the ConnectionString and the developer just copies the line from one branch of the if statement to the other, code will be lost or invalid values will be substituted because of positioning.
With this in mind, I factored the common elements out to build the base of the ConnectionString, then added the remaining elements conditionally based on the cmdlet inputs.
Because ApplicationName is an optional parameter, I had to account for cases where it’s not specified by the caller. What the code above is doing is looking at the entire call stack and going back up to the very top to get the name of the script file that was run and ultimately called Invoke-SqlCmd2. Finally, I extract the ConnectionString from the SqlConnectionStringBuilder and assign it to the SqlConnection‘s ConnectionString property.
Odds are, you’ve got more than one application or script accessing your database or SQL Server instance at any given time. You’re probably stacking them on a small number of servers in an attempt to cut down on licensing costs and resource usage. All those Powershell scripts running on the central job server are running under a single service account, and you’ve got a lazy vendor who set up both the website and back-end application server to run under the same account, maybe even on a single app/web server.
Then the call comes in. “The database is slow.” Before firing up your monitoring suite, you take a quick pass with sp_who2 or sp_whoisactive and you’re greeted with a dozen sessions, all sporting a program_name of .Net SqlClient Data Provider. Terrific! Which session is coming from which application and causing all the trouble? Not so easy to figure out.
Fortunately, the .NET SqlClient (and other ODBC drivers as well) has a built-in solution. Your application’s connection string has quite a few parameters available to provide configuration and information, and one that seems to get overlooked is Application Name. This one does exactly what it says on the tin – it lets you specify a name that will be displayed to anyone looking for it in SQL Server, including sp_whoisactive. Anyplace you have the ability to write a connection string, you can use this. It costs you nothing!
Invoke-Sqlcmd -ServerInstance .\sql2016 -Database wideworldimporters -Query "select c.CityName from application.cities c join application.stateprovinces s on c.StateProvinceID = s.stateprovinceid join application.countries ct on ct.CountryID = s.CountryID where ct.CountryName = 'United States' and s.StateProvinceName = 'New York'; waitfor delay '00:00:15';"
Here’s the difference between using SqlClient and Invoke-SQLCmd in Powershell when seen from the SQL Server side:
While the .NET Framework lets you specify this fairly easily, as we move up in abstraction layers these options become buried or entirely unavailable. This is the case with the Powershell Invoke-SqlCmd cmdlet. One of the many benefits of using Invoke-SqlCmd2 is that it does construct its own connection string (if you don’t pass in a SqlConnection object) but it doesn’t expose Application Name as an available parameter for it.
In an upcoming post, I’ll show how I added Application Name support to Invoke-SqlCmd2.
Strictly Necessary Cookies
Strictly Necessary Cookie should be enabled at all times so that we can save your preferences for cookie settings.
If you disable this cookie, we will not be able to save your preferences. This means that every time you visit this website you will need to enable or disable cookies again.