Modernizing Your T-SQL: Splitting Strings

2020-02-03

Page content

This is the first of several posts on modernizing T-SQL code with new features and functionality available in SQL Server.

Last year, you finally retired the last of your SQL Server 2008R2 instances. Congratulations! But are you taking advantage of everything that your new instances have to offer? Unless you did a review of all of the T-SQL in your applications, I’m guessing not.

At one time or another, we all find ourselves having to do some string parsing, especially splitting strings on a delimiter. Nearly all of us have one (or two or a dozen) functions for doing this somewhere on every instance of SQL Server. But since SQL Server 2016, we’ve had an official way to do it - the STRING_SPLIT() function.

What Used To Be

People have written string splitting functions in a variety of ways (here’s a few; here’s a few more) over the years. Some use XML functions, other use CLR, others are plain T-SQL. The trouble is…none of them are great. They all have their weaknesses, most notably they can be slow. And UDF performance impacts are hidden in actual execution plans - you need to get the estimated plan from Management Studio if you want to see the approximate costs resulting from calling your function (yes, even though Actual plans are Estimated plans).

Microsoft Does It Better

Of course we expect native functions to perform faster than a UDF. But we need to test it to be sure. I won’t attempt to repeat the testing Aaron Bertrand did in the post that I linked above; suffice to say, there’s huge potential for performance gains here by using STRING_SPLIT().

Caveats

There are a couple caveats to using STRING_SPLIT():

Your database has to be using COMPATIBILITY_LEVEL 130 or higher
Ordering of the results is not guaranteed to match the input
You don’t get any indication of what position in the source string each value came from (see above)

If you’re still early in your modernization journey, you may not have the application validated against this compatibility level. And you may have occasions where you need to preserve the ordering, or get an ordinal position for each value (or both).

What Next?

This is a tricky one. If you have multiple string-splitting UDFs, you may be able to get a partial win here. For example, if you have individual functions which return:

A one-column table, no consideration for ordering
A one-column table, with ordering preserved
A two-column table, with ordinal position and the corresponding values

You may be able to replace that first function with STRING_SPLIT() and get away with it. If that works for you, you don’t have to update every query right away; replace the body of the current function with STRING_SPLIT() (for an immediate improvement everywhere), make sure that any new or revised code uses STRING_SPLIT() directly (code reviews FTW!), and then as time allows, take care of any code via the time you’ve allocated to paying down technical debt.

If you have requirements that push you into either of the other two functions, you’re stuck for now. Head over to the Microsoft Feedback site and vote on the two items (one, two) posted there asking for this feature.