This is the first of several posts on modernizing T-SQL code with new features and functionality available in SQL Server.
Last year, you finally retired the last of your SQL Server 2008R2 instances. Congratulations! But are you taking advantage of everything that your new instances have to offer? Unless you did a review of all of the T-SQL in your applications, I’m guessing not.
At one time or another, we all find ourselves having to do some string parsing, especially splitting strings on a delimiter. Nearly all of us have one (or two or a dozen) functions for doing this somewhere on every instance of SQL Server. But since SQL Server 2016, we’ve had an official way to do it – the STRING_SPLIT()
function.
What Used To Be
People have written string splitting functions in a variety of ways (here’s a few; here’s a few more) over the years. Some use XML functions, other use CLR, others are plain T-SQL. The trouble is…none of them are great. They all have their weaknesses, most notably they can be slow. And UDF performance impacts are hidden in actual execution plans – you need to get the estimated plan from Management Studio if you want to see the approximate costs resulting from calling your function (yes, even though Actual plans are Estimated plans).
Microsoft Does It Better
Of course we expect native functions to perform faster than a UDF. But we need to test it to be sure. I won’t attempt to repeat the testing Aaron Bertrand did in the post that I linked above; suffice to say, there’s huge potential for performance gains here by using STRING_SPLIT()
.
Caveats
There are a couple caveats to using STRING_SPLIT()
:
- Your database has to be using
COMPATIBILITY_LEVEL
130
or higher - Ordering of the results is not guaranteed to match the input
- You don’t get any indication of what position in the source string each value came from (see above)
If you’re still early in your modernization journey, you may not have the application validated against this compatibility level. And you may have occasions where you need to preserve the ordering, or get an ordinal position for each value (or both).
What Next?
This is a tricky one. If you have multiple string-splitting UDFs, you may be able to get a partial win here. For example, if you have individual functions which return:
- A one-column table, no consideration for ordering
- A one-column table, with ordering preserved
- A two-column table, with ordinal position and the corresponding values
You may be able to replace that first function with STRING_SPLIT()
and get away with it. If that works for you, you don’t have to update every query right away; replace the body of the current function with STRING_SPLIT()
(for an immediate improvement everywhere), make sure that any new or revised code uses STRING_SPLIT()
directly (code reviews FTW!), and then as time allows, take care of any code via the time you’ve allocated to paying down technical debt.
If you have requirements that push you into either of the other two functions, you’re stuck for now. Head over to the Microsoft Feedback site and vote on the two items (one, two) posted there asking for this feature.
What I don’t understand is why this function only takes 1 character as a separator. I often use multiple characters as a separator just to make sure that the chances of that combination to exists in any string value is minimal to nonexistent.
That sounds like good feedback to send to the team at Microsoft via the links at the end of the post.
Same here – glad I’m not the only multi-delimiter hopeful out there!
It’s easy enough to put a wrapper around this that gets multi-character delimiters and element sequence added in. Performance is still very good for this, but it does have some slight overhead compared to the raw function, of course:
CREATE OR ALTER FUNCTION dbo.StringSplitWrapper
(@String_in VARCHAR(MAX),
@Delimiter_in VARCHAR(50))
RETURNS TABLE
AS
RETURN
(SELECT
SS.value AS Sub,
ROW_NUMBER() OVER (ORDER BY CHARINDEX(SS.value, T.V)) AS Seq
FROM
(VALUES
(REPLACE(@String_in, @Delimiter_in, CHAR(1)))) AS T (V)
CROSS APPLY STRING_SPLIT(T.V, CHAR(1)) AS SS);
Of course, that could be simplified even further by eliminating the TVC, but I leave that in as a shell for even more complex multi-line functions.