One of the (many) fun things to do at PASS Summit is to check out the ribbons people have attached to their badges. Some are witty or goofy, others are informational or technical, and still others let you express how you identify with a community within the community.
To celebrate dbatools and the awesome team & community around it, a handful of other folks and I will be handing out two limited-edition badges all week at Summit. Check 'em out:
Be on the lookout for these badges and talk to us about dbatools! What you like, what you'd like to see changed, new feature ideas, questions about how to use functions, anything at all. Even if you've never used dbatools, we love talking about it and showing people the awesome things they can do with it, so please introduce yourself!
This month's T-SQL Tuesday is hosted by Raul Gonzalez, and he's asked everyone to share things we might be a bit embarrassed about:
For this month, I want you peers to write about those important lessons that you learned the hard way, for instance something you did and put your systems down or maybe something you didn’t do and took your systems down. It can be also a bad decision you or someone else took back in the day and you’re still paying for it…
In the stress/performance testing portion of an upgrade of a critical system, we were short on disk space. So rather than building a separate set of VMs for the performance testing (we needed to be able to get back to functional testing quickly), we decided to just take VM snapshots of all the servers. Testing was delayed a day or two, but we didn't remove the snapshots. Then we started testing, and performance was terrific…for about five minutes. Then everything came to a screeching halt. Panicked, we thought we were going to need a pile of new hardware until the VMware admin realized that our disks were getting hammered because we still had those active snapshots (while a snapshot is active, every write goes to a delta disk instead of the base virtual disk). Lesson learned: If you take VM-level snapshots of your database server and let them "soak" for an extended period, you're gonna have a bad time. Unless you need a snapshot of the guest OS or instance configuration itself, use a database snapshot instead of a VM-level snapshot.
A couple of times, I've had under-performing VMs running SQL Server. As I hadn't been involved in the configuration, I assumed everything had been provisioned properly. Turns out…not so much. Memory reservations, storage configuration, power profiles: all set up for suboptimal performance. Lesson learned: Ask your VMware admin whether they've read the best practices guide, and review things yourself before going down the rabbit hole of SQL Server configuration & query tuning. If the underlying systems aren't configured well, you'll spin your wheels for a long time.
In doing a configuration review of a rather large (production) instance, I noted that at least one configuration option was still set to its default value: Cost Threshold for Parallelism was stuck at 5. Running sp_BlitzCache, I found quite a few simple queries going parallel and huge CXPACKET waits. CXPACKET isn't bad per se, but if you've got a low-cost query that's going parallel and waiting on threads when it could run faster overall single-threaded (I verified this was the case for several of the top offenders), increasing the cost threshold can help. I did some checking, verified that it was a configuration change I could make on the fly, and set the value to 50. And then everything. Slowed. Down. When I made this configuration change on the test instance, it wasn't much of a problem. But that was a much smaller instance with much less traffic. What I failed to fully comprehend was the impact of this operation: changing this setting (and a number of others I wasn't aware of) blows out the plan cache. On this instance, that was about 26 GB of plan cache. Not only was performance impacted while the plan cache refilled, but we also took a hit while all the old plans were being evicted from cache. Lesson learned: Even if it seemed OK in test, those "low impact" changes can have a much larger impact on production unless you can make test mirror production in every way. So plan the timing of these changes accordingly.
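If you do need to make this change on a busy instance, it's worth checking how much plan cache you're about to throw away before you do it. Below is a minimal sketch of that check plus the change itself via sp_configure. The value of 50 is simply the one used above, not a recommendation for every workload, and the RECONFIGURE step is the moment the plan cache gets cleared, so schedule it for a quiet window.

[code language="sql"]
-- How much plan cache is about to be discarded?
-- sys.dm_exec_cached_plans reports size_in_bytes for each cached plan.
SELECT COUNT(*) AS cached_plans,
       CAST(SUM(CAST(size_in_bytes AS bigint)) / 1048576.0 AS decimal(18, 1)) AS plan_cache_mb
FROM sys.dm_exec_cached_plans;

-- Current setting (5 is the out-of-the-box default).
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = N'cost threshold for parallelism';

-- 'cost threshold for parallelism' is an advanced option, so expose those first.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;

-- RECONFIGURE applies the new value immediately and, with it, clears the plan cache.
EXEC sys.sp_configure N'cost threshold for parallelism', 50;
RECONFIGURE;
[/code]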
We learn the most from our mistakes. We can learn almost as much from the mistakes of others. Learn from mine.
A few weeks ago I was looking at a query and got tripped up by the network and my own forgetfulness. It was a pretty simple query with a simple-looking execution plan, and it didn't even do that much work: about 20K logical reads, returning 200K records. For a server as large as the one I was working with, this should have been nothing. Instead, I was waiting three minutes to get my results. My first thought was "aha, this query must need tuning, maybe a new index."
I started experimenting in a non-production copy of the database on the same instance. I added a couple of indexes, adjusted the query itself, and even tried a few query/index hints just to see what worked. Each change definitely had an impact on the query: the execution plan shape changed a bit, and the costs shifted around between a few operators. But I was still waiting minutes for the complete results.
The Plot Thickens
I tried my query against the development server. Production has 24X as much RAM, 3X as many CPU cores, and faster disks than the dev box, so I expected the query to perform even worse there. Nope. Near-instant results, even for the original query with no additional indexes, hints or any other trickery.
OK, something's definitely up now. I hopped back to production and, while my query was running, ran sp_BlitzFirst to peek at what was going on. Lo and behold, the top wait stat was ASYNC_NETWORK_IO. I've seen lots of this in the past, attributed to poorly written applications that just couldn't consume the data as fast as SQL Server could churn it out. But this was SQL Server Management Studio; surely it was absorbing the data as fast as it could receive it.
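For a quick look without a third-party script, the DMVs show the same thing. This is a rough sketch: it only shows requests executing at the moment you run it. A request sitting on ASYNC_NETWORK_IO is one where SQL Server has rows ready and is waiting for the client, or the network in between, to take them.

[code language="sql"]
-- What is each active user request waiting on right now?
SELECT r.session_id,
       s.program_name,
       s.host_name,
       r.status,
       r.wait_type,            -- e.g. ASYNC_NETWORK_IO
       r.wait_time AS wait_time_ms,
       r.last_wait_type
FROM sys.dm_exec_requests AS r
INNER JOIN sys.dm_exec_sessions AS s
    ON s.session_id = r.session_id
WHERE s.is_user_process = 1
  AND r.session_id <> @@SPID;  -- leave out this diagnostic query itself
[/code]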
And that’s when it hit me. SSMS was consuming the data as quickly as it received it. The development server is in the office. The production server is in a datacenter 1/3 of a continent away and the connection between that datacenter and the office sometimes gets bogged down. The problem wasn’t that SSMS couldn’t drink from the firehose – the firehose had a clamp around it!
To test this idea, I made one last change to the query – I selected the data into a temp table instead of sending the results to SSMS. Round-trip time: two seconds. Conclusion: the network itself was causing the slowdown and this query needs no tuning!
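Here's roughly what that test looks like. The SELECT below is a hypothetical stand-in that generates 200K rows from system views, since the original query isn't shown. The point is that SELECT ... INTO keeps the rows on the server, so the timing reflects the query's own work rather than the trip across the network.

[code language="sql"]
DROP TABLE IF EXISTS #Results;

SET STATISTICS TIME ON;

-- Hypothetical stand-in for the real query: 200K rows from system views.
-- SELECT ... INTO writes them to tempdb instead of streaming them to SSMS.
SELECT TOP (200000)
       a.object_id,
       a.name
INTO #Results
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

SET STATISTICS TIME OFF;

-- Only this single-row result crosses the network.
SELECT COUNT(*) AS rows_captured FROM #Results;
[/code]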
As it turns out, SSMS has a way to discard query results so that you don't have to resort to dumping them into a temp table: Discard query results after execution. It's essentially /dev/null for result sets. If you only need an actual execution plan but no query results (as I did here), check it out.
For years I've looked at various queries online in sample code, diagnostic queries using DMVs, and the like, and seen CROSS APPLY in their FROM clauses. But I never really managed to understand what it was for or how it worked, because I never saw a case where it applied directly to something I was doing.
Finally, this week I had a breakthrough. I was working on updating a bunch of data, but the update was breaking on a small subset of that data. I was attempting to JOIN two tables on fields that should have been INTs, but in a very small number of cases one side held a comma-delimited string. The user told me that someone else had done these updates in the past without encountering the problem I was having (so I knew I was doing something "wrong"), but since only a handful of updates were broken, she was OK with doing those manually (we were scripting it because we were potentially updating tens of thousands of records).
I am not OK with manually fixing this in the future. I wanted to know how the other DBA had done it before. I dug into some history and found CROSS APPLY. My nemesis. I was determined to figure out how to use it this time.
Setting the Stage
Let's set up three simple tables to keep track of airports and what state each airport is in. But our developer doesn't totally get database design, and in his state-to-airport mapping table he allows a comma-separated list of airports for each state.
[code language="sql"]
CREATE TABLE #States (
    [Id] INT IDENTITY(1, 1),
    StateName NVARCHAR(30) NOT NULL
);
CREATE TABLE #Airports (
    [Id] INT IDENTITY(1, 1),
    IATACode CHAR(3) NOT NULL
);
-- One row per state; Airports holds a comma-separated list of #Airports.Id values
CREATE TABLE #StateAirports (
    StateId INT PRIMARY KEY NOT NULL,
    Airports NVARCHAR(50)
);
[/code]
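The post doesn't include sample data, so here's a hypothetical set to run the queries against. It assumes the three temp tables above already exist in the same session, and it uses IDENTITY_INSERT so the Id values are explicit rather than depending on insert order.

[code language="sql"]
-- Hypothetical sample data for the #States / #Airports / #StateAirports tables.
SET IDENTITY_INSERT #States ON;
INSERT INTO #States (Id, StateName)
VALUES (1, N'New York'), (2, N'Illinois'), (3, N'Wyoming');
SET IDENTITY_INSERT #States OFF;

SET IDENTITY_INSERT #Airports ON;
INSERT INTO #Airports (Id, IATACode)
VALUES (1, 'JFK'), (2, 'LGA'), (3, 'ORD'), (4, 'MDW'), (5, 'JAC');
SET IDENTITY_INSERT #Airports OFF;

-- The developer's design: airport Ids packed into one string per state.
INSERT INTO #StateAirports (StateId, Airports)
VALUES (1, N'1,2'),  -- New York: JFK, LGA
       (2, N'3,4'),  -- Illinois: ORD, MDW
       (3, N'5');    -- Wyoming: JAC
[/code]

With this data, and a database at compatibility level 130 or higher, the finished query below returns five rows, one per state/airport pair.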
This makes getting a list of airports and their associated state names tricky at best if we don’t know about CROSS APPLY. With CROSS APPLY, it’s pretty straightforward.
Here's the finished query.
[code language="sql"]
SELECT S.StateName, A.IATACode
FROM #StateAirports SA1
CROSS APPLY string_split(SA1.Airports, ',') AS SA2
JOIN #Airports A ON A.Id = SA2.value
JOIN #States S ON S.Id = SA1.StateId;
[/code]
string_split() is a table-valued function that we finally got in SQL Server 2016, after far too many years of having to write (or, let's face it, copy from someone's blog post) inefficient string-splitting functions. Important note: even if your database engine is SQL Server 2016, the database you're operating in must be at compatibility level 130 or higher; otherwise string_split() fails with an "Invalid object name" error.
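To check whether the database you're connected to meets that requirement, a quick look at sys.databases does it. The ALTER is left commented out on purpose: the compatibility level affects query behavior for the whole database, so treat this as a sketch of the option, not a suggestion to flip it casually.

[code language="sql"]
-- Compatibility level of the current database (string_split() needs 130+).
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();

-- Only after testing your workload at the new level:
-- ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 130;
[/code]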
Breaking it down
If we take CROSS APPLY and break it down into its parts, it finally starts to make sense:
* APPLY the string_split() function to the Airports field of each #StateAirports row.
* Attach each row of string_split()'s output to the #StateAirports row it came from. This is similar to a CROSS JOIN, but not exactly: the function is evaluated once per outer row and can use that row's columns, and an outer row for which the function returns no rows drops out of the results.
So now I have N rows for each StateId in #StateAirports, where N is the number of values in the comma-separated field, and joined to each row is one of the rows from string_split()'s output.
[code language="sql"]
SELECT *
FROM #StateAirports SA1
CROSS APPLY string_split(SA1.Airports, ',') AS SA2;
[/code]
From there, the rest of the query is pretty ordinary, JOINing to the other two tables to translate the state & airport ID numbers to their text values.
Hopefully this helps others get a handle on CROSS APPLY and find useful places for it. This had been a head-scratcher for me for years, but only because I didn't have an example that clearly broke down how to use it and what was going on. In hindsight, I probably could have used it in some analysis I did at a previous job, but instead I resorted to parsing & processing comma-separated data in a PowerShell script.
In a previous post, I expressed some frustration over Invoke-SqlCmd not setting an Application Name for its connection, leaving us with the generic .Net SqlClient Data Provider when looking at active sessions in sp_who2 and sp_whoisactive (and any other monitoring tool). Unfortunately, I can't really do anything about Invoke-SqlCmd aside from posting a suggestion on Connect or the Client Tools Trello board, but Invoke-SqlCmd2 has the same issue, and its source is on GitHub. So, here we go!
In its current form, if a SqlConnection object isn’t passed into Invoke-SqlCmd2, the cmdlet does the following:
I decided to change this around so that it no longer uses string formatting, but instead a SqlConnectionStringBuilder. I had a couple of reasons for this:
It will eliminate redundant code. There are several common elements in each of the ConnectionStrings above. If more complex logic is needed, there are potentially more copies of this ConnectionString kicking around.
It's prone to copy/paste and other editing errors. If a change affects both versions of the ConnectionString and the developer just copies the line from one branch of the if statement to the other, code will be lost, or invalid values will be substituted because the format placeholders are positional.
With this in mind, I factored the common elements out to build the base of the ConnectionString, then added the remaining elements conditionally based on the cmdlet inputs.
Because ApplicationName is an optional parameter, I had to account for cases where the caller doesn't specify it. The code above walks the entire call stack back up to the very top to get the name of the script file that was originally run and ultimately called Invoke-SqlCmd2. Finally, I extract the ConnectionString from the SqlConnectionStringBuilder and assign it to the SqlConnection's ConnectionString property.
Odds are, you've got more than one application or script accessing your database or SQL Server instance at any given time. You're probably stacking them on a small number of servers in an attempt to cut down on licensing costs and resource usage. All those PowerShell scripts running on the central job server are running under a single service account, and you've got a lazy vendor who set up both the website and back-end application server to run under the same account, maybe even on a single app/web server.
Then the call comes in. “The database is slow.” Before firing up your monitoring suite, you take a quick pass with sp_who2 or sp_whoisactive and you’re greeted with a dozen sessions, all sporting a program_name of .Net SqlClient Data Provider. Terrific! Which session is coming from which application and causing all the trouble? Not so easy to figure out.
Fortunately, the .NET SqlClient (and other client drivers, such as ODBC) has a built-in solution. Your application's connection string has quite a few parameters available for configuration and information, and one that seems to get overlooked is Application Name. This one does exactly what it says on the tin: it lets you specify a name that will be displayed to anyone looking for it in SQL Server, including in sp_whoisactive. Anywhere you can write a connection string, you can use this. It costs you nothing!
[code language="powershell"]
# Connection setup: server and database match the Invoke-Sqlcmd example below;
# the Application Name value here is illustrative.
$DBConnection = New-Object System.Data.SqlClient.SqlConnection;
$DBConnection.ConnectionString = "Data Source=.\sql2016;Initial Catalog=wideworldimporters;Integrated Security=SSPI;Application Name=My PowerShell Script";
$DBConnection.Open();
$QueryCmd = $DBConnection.CreateCommand();
# WAITFOR DELAY keeps the session busy for 15 seconds so it can be spotted in sp_whoisactive
$QueryCmd.CommandText = "select c.CityName from application.cities c join application.stateprovinces s on c.StateProvinceID = s.stateprovinceid join application.countries ct on ct.CountryID = s.CountryID where ct.CountryName = 'United States' and s.StateProvinceName = 'New York'; waitfor delay '00:00:15';";
$SqlAdapter = New-Object System.Data.SqlClient.SqlDataAdapter;
$QueryCmd.Connection = $DBConnection;
$SqlAdapter.SelectCommand = $QueryCmd;
$DataSet = New-Object System.Data.DataSet;
$SqlAdapter.Fill($DataSet);
$DataSet.Tables;
$DBConnection.Close();
[/code]
[code language="powershell"]
Import-Module SqlServer;
Invoke-Sqlcmd -ServerInstance .\sql2016 -Database wideworldimporters -Query "select c.CityName from application.cities c join application.stateprovinces s on c.StateProvinceID = s.stateprovinceid join application.countries ct on ct.CountryID = s.CountryID where ct.CountryName = 'United States' and s.StateProvinceName = 'New York'; waitfor delay '00:00:15';";
Remove-Module SqlServer;
[/code]
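On the SQL Server side, the value shows up as program_name in sys.dm_exec_sessions, which is where sp_who2 and sp_whoisactive get it. A minimal sketch for checking it yourself; the connection string in the comment is a hypothetical example, and seeing sessions other than your own requires the VIEW SERVER STATE permission.

[code language="sql"]
-- Hypothetical client connection string using the keyword:
--   Server=myserver;Database=mydb;Integrated Security=SSPI;Application Name=Nightly Inventory Load

-- program_name shows the Application Name, or the driver's default
-- (e.g. .Net SqlClient Data Provider) when none was supplied.
SELECT s.session_id,
       s.program_name,
       s.login_name,
       s.host_name,
       s.status
FROM sys.dm_exec_sessions AS s
WHERE s.is_user_process = 1
ORDER BY s.program_name, s.session_id;

-- Inside a session, APP_NAME() returns the same value for the current connection.
SELECT APP_NAME() AS current_application_name;
[/code]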
Here's the difference between using SqlClient and Invoke-Sqlcmd in PowerShell, as seen from the SQL Server side:
While the .NET Framework lets you specify this fairly easily, as we move up through abstraction layers these options become buried or entirely unavailable. This is the case with the PowerShell Invoke-SqlCmd cmdlet. One of the many benefits of Invoke-SqlCmd2 is that it constructs its own connection string (if you don't pass in a SqlConnection object), but it doesn't expose Application Name as a parameter either.
In an upcoming post, I’ll show how I added Application Name support to Invoke-SqlCmd2.
Something in SQL Server that isn’t always obvious to beginners is that when you create a default value for a column on a table, SQL Server creates a constraint (much like a primary or foreign key). All constraints must have a name, and if one isn’t specified SQL Server will generate one for you. For example:
[code language="sql"]
-- The sequence referenced by the typeid DEFAULT must already exist
CREATE SEQUENCE [dbo].[pointtypeid] AS INT START WITH 1;
GO
CREATE TABLE [dbo].[point_types] (
    [typeid] [int] NOT NULL DEFAULT(NEXT VALUE FOR [pointtypeid])
    ,[typename] [nvarchar](30) NOT NULL DEFAULT 'Unspecified'
    ,CONSTRAINT [PK_PointType] PRIMARY KEY CLUSTERED ([typeid] ASC)
);
GO
[/code]
We've got a simple table here, and both columns have a default value set (the primary key's value is generated from a sequence object, pointtypeid). Time goes on, and a policy change requires that I change the default value of typename to Unknown. To do this, I have to drop the constraint and re-create it. To find the name of the constraint, I can either ask sp_help or run this query:
[code language="sql"]
SELECT all_columns.name AS column_name
    ,default_constraints.name AS constraint_name
    ,default_constraints.definition
FROM sys.all_columns
INNER JOIN sys.tables ON all_columns.object_id = tables.object_id
INNER JOIN sys.schemas ON tables.schema_id = schemas.schema_id
INNER JOIN sys.default_constraints ON all_columns.default_object_id = default_constraints.object_id
WHERE schemas.name = 'dbo'
    AND tables.name = 'point_types';
[/code]
I've got my constraint name now, so I can drop it & re-create it:
[code language="sql"]
-- DF__point_typ__typen__21B6055D is the name SQL Server generated here; yours will differ
ALTER TABLE [dbo].[point_types]
DROP CONSTRAINT DF__point_typ__typen__21B6055D;
GO

ALTER TABLE [dbo].[point_types] ADD DEFAULT('Unknown') FOR [typename];
GO
[/code]
And if I re-run the query above, I can see that the new constraint has a different system-generated name.
This means that everywhere I need to change this constraint (development, test and production), I'll need to look up the constraint name in that particular database and drop it before re-creating it. That makes a deployment script a bit messier, as it needs more code to find those constraint names.
[code language="sql"]
ALTER TABLE [dbo].[point_types] ADD DEFAULT('Unknown') FOR [typename];
GO
[/code]
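One way to write that lookup is sketched below: it finds whatever default constraint currently sits on typename in the database it runs in and drops it with dynamic SQL, so the ADD DEFAULT statement can follow. It assumes the dbo.point_types table from above, and it's one possible approach rather than the only one.

[code language="sql"]
DECLARE @constraint_name sysname, @sql nvarchar(max);

-- Find the default constraint on dbo.point_types.typename, whatever it's named here.
SELECT @constraint_name = dc.name
FROM sys.default_constraints AS dc
INNER JOIN sys.columns AS c
    ON c.object_id = dc.parent_object_id
   AND c.column_id = dc.parent_column_id
WHERE dc.parent_object_id = OBJECT_ID(N'dbo.point_types')
  AND c.name = N'typename';

IF @constraint_name IS NOT NULL
BEGIN
    -- QUOTENAME guards against unusual characters in the generated name.
    SET @sql = N'ALTER TABLE [dbo].[point_types] DROP CONSTRAINT '
             + QUOTENAME(@constraint_name) + N';';
    EXEC sys.sp_executesql @sql;
END;
[/code]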
But this doesn’t really solve my problem, it just works around it. It’s still messy and fragile. If I need to do other operations on the default constraint, I need to go through the same exercise to find its name.
Fortunately, SQL Server lets us name default constraints just like any other constraint, and by doing so we avoid this trouble. By setting my own name for the constraint, I know what it’ll be in every database, without having to query system tables. The name can be set in both the CREATE TABLE statement and an independent ALTER TABLE.
[code language="sql"]
CREATE TABLE [dbo].[point_types] (
    [typeid] [int] NOT NULL DEFAULT(NEXT VALUE FOR [pointtypeid])
    ,[typename] [nvarchar](30) NOT NULL CONSTRAINT [DF_PT_TypeName] DEFAULT 'Unspecified'
    ,CONSTRAINT [PK_PointType] PRIMARY KEY CLUSTERED ([typeid] ASC)
);
GO

ALTER TABLE [dbo].[point_types]
DROP CONSTRAINT [DF_PT_TypeName];
GO

ALTER TABLE [dbo].[point_types] ADD CONSTRAINT [DF_PT_TypeName] DEFAULT('Unknown') FOR [typename];
GO
[/code]
I can also combine these in the next deployment that requires a change to the default constraint, dropping the system-generated name and establishing my own static name to make things simpler in the future.
Is explicitly naming default (or any other) constraints necessary? No, but doing so helps your database document itself, and it makes future deployment/promotion scripts simpler and less prone to breakage. SQL Server needs a name for the constraint regardless; it’s worth specifying it yourself.
If you need to move data from one table into a new table, or even tables in a database into another database, the Import/Export Wizard in SQL Server Management Studio looks pretty tempting. Set up a source & destination, click a few buttons, kick back with a cup of tea and watch the progress bars, right?
It turns out that the wizard just isn’t as smart as it may seem. If you’re not careful, you won’t get what you’re expecting. Let’s check it out.
We'll start by creating a really simple table in a database, containing a primary key and a computed column.
[code language="sql"]
create table sample.dbo.SourceTable (
    RowId int identity(1,1) not null primary key,
    Num1 int not null,
    Num2 int not null,
    Total as (Num1 + Num2)
);
[/code]
Let’s populate it with a few rows of data, then update some of that data to make sure the computed column is working. Remember, this is just to demonstrate the idea.
[code language="sql"]
insert into sample.dbo.SourceTable (Num1, Num2) values (1,2);
go 100
select top 5 * from sample.dbo.SourceTable order by RowId;
update sample.dbo.SourceTable set Num1 = Num1 * RowId where RowId <= 3;
select top 5 * from sample.dbo.SourceTable order by RowId;
[/code]
Great! We've got data and the computed column is working, so let's copy it over to a new table in another database. We're just going to click Next, Next, Next through the wizard this time around.
Success! Our table has been copied and the data’s all there.
[code language="sql"]
select top 5 * from Sample2.dbo.SourceTable order by RowId;
[/code]
Let’s do some work on our new table and check out the results.
[code language="sql"]
select top 5 * from Sample2.dbo.SourceTable order by RowId;
update Sample2.dbo.SourceTable set Num2 = Num2 * RowId where RowId < 3;
select top 5 * from Sample2.dbo.SourceTable order by RowId;
[/code]
Whoa! That's not right. That Total column is supposed to be Num1 + Num2, and last time I checked, 2 + 4 was not 4. Let's keep going and try adding a new record the same way it was done earlier.
[code language="sql"]
insert into Sample2.dbo.SourceTable (Num1, Num2) values (100,200);
[/code]
Cannot insert the value NULL into column 'RowId', table 'Sample2.dbo.SourceTable'; column does not allow nulls. INSERT fails.
Huh. Now that’s really odd, isn’t it? RowId is supposed to be an identity – we shouldn’t have to populate it. What is going on here? Let’s script out the table.
[code language="sql"]
USE [Sample2]
GO

/****** Object: Table [dbo].[SourceTable] Script Date: 2015-11-10 22:36:23 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[SourceTable](
    [RowId] [int] NOT NULL,
    [Num1] [int] NOT NULL,
    [Num2] [int] NOT NULL,
    [Total] [int] NULL
) ON [PRIMARY]
GO
[/code]
This is all kinds of wrong! What happened to the primary key? The identity? The computed column? Well, it turns out that the wizard isn't that smart, and if you just take all the default values, you're going to get burned. Let's go back to the wizard and click the Edit Mappings button on the Select Source Tables and Views screen.
Well…that looks like what we got above, and it's not what we wanted. If we click Edit SQL, this is confirmed: the table being created is not defined the same way as the source table.
Fortunately, we can edit the SQL here and make it match the source table definition, then finish the wizard.
OK, the data's copied. What do we have?
[code language="sql"]
select top 5 * from Sample3.dbo.SourceTable order by RowId;
update Sample3.dbo.SourceTable set Num2 = Num2 * RowId where RowId < 3;
select top 5 * from Sample3.dbo.SourceTable order by RowId;
insert into Sample3.dbo.SourceTable (Num1, Num2) values (100,200);
select * from Sample3.dbo.SourceTable where RowId >= 100 order by RowId;
[/code]
Everything’s there, and it’s working the way it’s supposed to. Lesson learned: don’t blindly trust the defaults, especially the ones in a wizard. Double-check everything, and then verify that your schema works the way you expect it to before doing any business with it.
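One quick way to do that verification is to compare the metadata of the source and the copy. This sketch assumes the sample and Sample3 databases from above; is_identity, is_computed and the primary key check would have exposed the problem in Sample2 before any data was touched.

[code language="sql"]
-- Column properties for the source table and the copy, side by side.
SELECT N'sample' AS database_name, c.name AS column_name,
       c.is_identity, c.is_computed, c.is_nullable
FROM sample.sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'sample.dbo.SourceTable')
UNION ALL
SELECT N'Sample3', c.name, c.is_identity, c.is_computed, c.is_nullable
FROM Sample3.sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'Sample3.dbo.SourceTable')
ORDER BY column_name, database_name;

-- Does each table still have its primary key?
SELECT N'sample' AS database_name, i.name AS pk_name
FROM sample.sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'sample.dbo.SourceTable') AND i.is_primary_key = 1
UNION ALL
SELECT N'Sample3', i.name
FROM Sample3.sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'Sample3.dbo.SourceTable') AND i.is_primary_key = 1;
[/code]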
Fixed-position data formats will seemingly be with us forever. Despite the relative ease of parsing CSV (or other delimited formats), or even XML, many data exchanges require a fixed-position input. Characters 1-10 are X, characters 11-15 are Y and if the source data is fewer than 5 characters, we have to left-pad with a filler character, etc. When you’re accustomed to working with data that says what it means and means what it says, having to add “extra fluff” like left-padding your integers with a half-dozen zeroes can be a hassle.
I received a draft of a stored procedure recently which had to do exactly this. The intent is for the procedure to output the data almost entirely formatted as required, one record per line in the output file, and dump the result set to a file on disk. As it was given to me, the procedure was peppered with CASE expressions like this (only more complex) in the SELECT clause:
[code language="sql"]
-- Method 1
select case len(cast(logid as varchar))
        when 9 then '0' + cast(logid as varchar)
        when 8 then '00' + cast(logid as varchar)
        when 7 then '000' + cast(logid as varchar)
        when 6 then '0000' + cast(logid as varchar)
        when 5 then '00000' + cast(logid as varchar)
        when 4 then '000000' + cast(logid as varchar)
        when 3 then '0000000' + cast(logid as varchar)
        when 2 then '00000000' + cast(logid as varchar)
        when 1 then '000000000' + cast(logid as varchar)
        when 0 then '0000000000' + cast(logid as varchar)
        -- Without an ELSE, any length not listed (e.g. a 10-digit logid) returns NULL
        else cast(logid as varchar)
    end as logid
    ,logtext
from cachedb.dbo.logs;
[/code]
It’s perfectly valid, it works, and there’s nothing inherently wrong with it. But I find it a bit tough to read, and it could become trouble if the format changes later, as additional (or fewer) cases will have to be accounted for. Fortunately, the day I received this procedure was right around the day I learned about the REPLICATE() T-SQL function. Maybe we can make this simpler:
[code language="sql"]
select replicate('0', 10 - len(cast(logid as varchar))) + cast(logid as varchar) as logid
    ,logtext
from cachedb.dbo.logs;
[/code]
Not bad. But it leaves us with a magic number, and as with the previous example, if the file format changes we have to seek out these magic numbers and fix them. This is easily remedied by defining these field lengths at the beginning of the procedure, so that they're all in one place if anything needs to change.
[code language="sql"]
-- Method 2
declare @paddedlength int = 10;

select replicate('0', @paddedlength - len(cast(logid as varchar))) + cast(logid as varchar) as logid
    ,logtext
from cachedb.dbo.logs;
[/code]
Yet another approach would be to pad out the value beyond what we need, then trim the resulting string back to the required length. Again, we have to be careful to not leave ourselves with magic numbers; the solution is the same as when using REPLICATE():
[code language="sql"]
-- Method 3
select right('0000000000' + cast(logid as varchar), 10) as logid
    ,logtext
from cachedb.dbo.logs;

-- Or, with more flexibility/fewer magic numbers
-- Method 4
declare @paddedlength int = 10;

select right(replicate('0', @paddedlength) + cast(logid as varchar), @paddedlength) as logid
    ,logtext
from cachedb.dbo.logs;
[/code]
All four methods yield the same results, as far as the data itself is concerned. But what about performance? For a table with 523,732 records, execution times were:
2,000ms CPU time, 261,785ms elapsed
2,265ms CPU time, 294,399ms elapsed
2,000ms CPU time, 297,593ms elapsed
2,078ms CPU time, 302,045ms elapsed
Each method had an identical execution plan, so I’m probably going to opt for the code that’s more readable and maintainable – method 2 or 4.
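One difference between methods 2 and 4 is worth knowing before choosing: they only agree while every value fits the field. This small sketch uses hypothetical IDs, including one that's too long for a 10-character field. REPLICATE() returns NULL for a negative count, so method 2 yields NULL for that value, while RIGHT() in method 4 silently drops the leading digit.

[code language="sql"]
DECLARE @paddedlength int = 10;
DECLARE @logs TABLE (logid bigint NOT NULL);
-- Hypothetical IDs; 12345678901 is one digit too wide for the field.
INSERT INTO @logs (logid) VALUES (7), (42), (123456), (2147483647), (12345678901);

DROP TABLE IF EXISTS #padded;

SELECT logid,
       -- Method 2: pad by the shortfall; a negative shortfall makes the result NULL.
       REPLICATE('0', @paddedlength - LEN(CAST(logid AS varchar(20))))
           + CAST(logid AS varchar(20)) AS method2,
       -- Method 4: over-pad, then keep the rightmost @paddedlength characters.
       RIGHT(REPLICATE('0', @paddedlength) + CAST(logid AS varchar(20)), @paddedlength) AS method4
INTO #padded
FROM @logs;

SELECT logid, method2, method4 FROM #padded ORDER BY logid;
[/code]

Which failure mode is preferable depends on whoever consumes the file; a NULL is at least easy to catch before the file goes out.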
As with any tuning, be sure to test with your own data & queries.
I’m working with a number of SQLite databases as extra data sources in addition to the SQL Server database I’m primarily using for a project. Brian Davis (b|t) wrote a blog post a few years ago that covers setting up the connection quite well. In my case, I’ve got nine SQLite databases to connect to, and that gets tedious. PowerShell to the rescue!