Lesson Learned From Multi-Threading with dbatools
Over the summer, I spent some (a lot of) time working on updates to a script at work which runs multiple processes in parallel. Everything seemed to work OK for a while, but then everything broke. It broke right around the time dbatools 1.1 dropped, so I started thinking that something must have changed there. As it turns out, it was entirely my fault and I hope this post will help you avoid the same trap.
The Setup
My original function isn’t anything too wild. As input, it takes the name of a SQL Server instance along with a few other parameters. Then it calls Start-ThreadJob
(I used to use PoshRSJob
, but I moved to ThreadJob
last year) in a loop to spin up multiple concurrent threads, each running Copy-DbaDbTableData
.
And this worked pretty well.
Where I Broke It
While making my revisions to the function, I added one extra line to the begin {}
block, leaving everything else pretty much the same.
My reasoning was that maybe I could get things to run a little faster if I wasn’t constantly opening and closing connections to the server. Instead, it completely broke while throwing out the following errors:
WARNING: [10:43:50][Invoke-QueryRawDatabases] Failure | Error connecting to [MyServer]: There is already an open DataReader associated with this Connection which must be closed first.
WARNING: [10:43:50][Copy-DbaDbTableData] The object [Table3] matches 0 objects. Unable to determine which object to copy
The Problem
It took far longer than I care to admit before I realized what I was doing wrong. I was being so efficient, so stingy, with my database connections, that my threads were stomping all over one another! Every table copy was trying to go through the same connection, and that doesn’t fly because each connection can only have one open DataReader
at a time.
dbatools already leverages connection pooling (because SMO does - scroll down to “Part 2”), so this attempt at extra efficiency is not only pointless, it’s causing more problems.
Why did it take so long for me to work that out? I’ll be honest - I wasn’t reading the error messages very carefully and thinking about what they were telling me.
The Solution
The solution was pretty easy - I reverted that initial change where I established a connection object to pass around, and everything started working properly again. Testing resumed, completed successfully, and I was able to deploy my updates to production.