It’s early September, which means it’s time for T-SQL Tuesday! This month’s topic comes from Kevin Chant (blog | twitter). Our mission, should we choose to accept it (click the image to see the original invite):
Recently I’ve had to submit suggestions to Microsoft about Azure DevOps and SQL Server.
I will admit a couple of the suggestions had certainly been in my head for a while. In fact, I wish I had suggested them sooner.
My invitation to yourselves this month is to write a post about a fantasy SQL Server feature you’ve got in mind.
It can be absolutely anything at all related to SQL Server. For example, it could be about a new SQL Server operator to improve queries or a new service that does something amazing.
I think anyone who has worked with SQL Server for a while has at least one of these. I posted a while back about my desire for Script Update Mode to be run in parallel, to speed startup with new Service Packs and Cumulative Updates (my latest CU took 4 1/2 hours). But I’m not going to recycle that suggestion.
Divide and Conquer
So, here’s my fantasy feature: parallel query execution. But it’s not what you think. SQL Server has had the ability to create a query plan that uses multiple parallel threads for as long as I can remember. We can control how far parallel a query will go and how expensive a query has to be before it’s considered for parallelization with
max degree of parallelism and
cost threshold for parallelism. That’s all well and good.
No, what I want is the the ability to execute the same query against multiple databases on a single instance simultaneously and have the results merged into a single result set.
I’ve…Seen This Before?
You may be saying to yourself “self! I thought I could do this already!” And you’d be correct, from a certain point of view. With Registered Servers in SQL Server Management Studio, we can execute one query against multiple instances, and the results are compiled in to a single result set in SSMS for you to view.
Starting a multi-instance query
Multi-instance query results
We Aren’t All-In with the Cloud Yet
But this isn’t an option today for the on-premises, “boxed” SQL Server product. And probably not with Managed Instances either (I admit that I haven’t looked into beyond a cursory search). What I want is the ability to specify a collection of databases on my instance and the query to be executed. All through T-SQL.
Here’s what I propose it looking like (and if implemented, I’m sure it’ll look completely different).
A few ideas of possibilities this opens up:
- Query tables across many databases and compile the results into a single temp table or report
- Run the same data or schema change query against multiple databases
- Copy data from one database into multiple databases at once
- Query across multiple databases from the application tier (if you trust the everyone to properly secure and not abuse it, that is) in a single batch
Can’t I Do This Already?
Well…sort of. You can write a cursor to loop over a list of database names, and inside that cursor write dynamic SQL for your queries. But it won’t be parallel, and it’s easy to throw a syntax error or even SQL injection into the mix with dynamic SQL.
3rd-party tools like Red Gate’s Multi Script can run a query across multiple databases simultaneously. You could even pull together some PowerShell and run that in threads, either via jobs (
Start-ThreadJob) or rolling your own with Runspaces. But both of these solutions run independent batches and don’t give you a single unified result set without some extra work like writing to a table (temp or otherwise).
Since this proposal is “just” T-SQL, and with Registered Servers we can execute the same T-SQL against multiple instances concurrently, can we take this to the next level and run my proposed query construct against all the selected registered servers? I don’t see why not. The results would look something like this:
What Do You Think?
Let me know what you think in the comments below. Would this be a valuable feature for you? Do other multi-tenant, database-per-tenant environments face the same challenges that would be aided by this feature? Would there be value in allowing the user to set the number of concurrent queries this can execute (maybe up to
MAXDOP as configured on the server)?