T-SQL Tuesday #136: Your Favorite (or Least Favorite) Data Type
T-SQL Tuesday is a monthly blog party hosted by a different community member each month, and this month Brent Ozar (blog | twitter) asks us to talk about data types.
Your mission: write a blog post about your favorite data type, and schedule it for next Tuesday, March 9.
Mad about money? Feverish about float? Tell us what you use your favorite data type for, and what people need to know before they get started using it. Data types have so many interesting edge cases, like when Aaron Bertrand taught us to use lower case for our data types.
In his original invitation title, Brent did open the possibility for us to write about our least-favorite type, so that’s the route I’m going.
Not Just One
My least favorite data type? It’s user-defined data types.
In this post I’m talking about scalar types, not CLR or table types. At first, these seem like a great idea.
- I can create my own data type names!
- I can standardize on the length and precision of various pieces of data in one place!
- No piece of software is gonna tell me what I can do, I’m telling it what to do!
But there’s a catch. There are a few catches.
Prix Fixe
Here’s the thing - these types are really just aliases for native types in SQL Server, but more constrained. Constrain yourself to UDTs and you’ll have trouble right-sizing your fields. Let’s say you’ve got three data types for text data:
- myShortString (
varchar(10)
) - myString (
varchar(256)
) - myBigString (
varchar(8000)
)
These lengths are not helping anyone. You can’t store email addresses or names in myShortString
. But myString
is probably way too much for that data. You’re going to waste memory because of how SQL Server estimates memory grants and your indexes will be bloated. Maybe you just need to create more UDTs to cover these situations. But that just compounds the other problems, doesn’t it?
What’s My Name Again?
It’s long been said that naming things is one of the two hard things in computer science. When you create a UDT, you have to name it. If you want a meaningful name, that’s really hard! How much data can I store in a column of type myString
? I have to look it up. What’s the difference between myShortString
and myString
? 🤷♂️
Set in Stone
Once you create a UDT, you can’t ALTER
it. Let’s say, for argument’s sake, you’ve created a UDT for your alphanumeric customer IDs. Business is booming, and a couple years down the road, you realize you need to add a few characters to the length of the field to keep up. You can’t just change the definition of your UDT to accommodate. Instead, you have to do one of two things:
- Create a new UDT, then run an
ALTER TABLE ALTER COLUMN
everywhere the old UDT was used to use the new one, thenDROP
the old UDT if it’s no longer needed ALTER
all of the columns using the UDT to be the standard type equivalent, then drop the UDT, re-create it with the same name, thenALTER
all the columns back to the UDT (which appears to be how SQL Server Data Tools/sqlpackage.exe
handles it)
Both of those are extra steps beyond using native types. And depending upon which avenue you choose, may lead to confusion.
Not the Whole Story
sp_help
will happily tell you the name, length, precision, and scale of the columns using UDTs on a given table. But that still doesn’t tell you enough about the base type. It won’t differentiate between varchar
and nvarchar
. Which means you need to work through sys.types
to get to the native type to find out what you’re really working with. Not exactly a chore but sp_help
is just so darn convenient.
Same, But Different
You can create multiple UDTs with the same name in the same database, but with different definitions. How? By defining them in different schemas. But a given UDT isn’t restricted to being used in tables that schema. sp_help
doesn’t give an indication of which schema the UDT on the table is coming from, so again you’re left scratching your head unless you query sys.types
(gosh, this post is starting to sound like an indictment of sp_help
, isn’t it?).
This is probably not a scenario that will come up often, but if it’s done it can cause a lot of confusion.
Crossing Databases
If you select * Into NewTable from TableWithUdt
, the new table will use your UDT because they’re both in the same database. But what if you need to do a quick copy of a table into another database? select * into OtherDB.dbo.NewTable from TableWithUdt
will create that new table using the UDT’s base type, whether a UDT of the same name/definition exists in the destination database or not. Whatever measure of clarity or control we thought we were getting from using a UDT has now disappeared.
Icing on the Cake
And then, on top of all of this - UDTs aren’t usable with temp tables at all.
I Feel Better Now
Whew. It feels great to get all that off my chest. I guess the main takeaway here is that I have some pretty strong reservations about using user-defined data types. Thanks to Brent for hosting T-SQL Tuesday this month and giving us a great topic to write about!