My First Migration with dbatools

I’ve been a proponent of dbatools for close to a year now and even contributed to the project, but surprisingly haven’t been a heavy user of it. Mostly due to a lack of opportunity. I’m aware of many of the functions by virtue of working on the built-in documentation and following the project and presentations about it.

So when the need arose to move a development/test instance of SQL Server from a VM onto a physical server, I knew exactly what I wanted to do. I was warned that the contents of this instance had been moved once before and it resulted in over a week of work and a bunch of trouble. I can’t speculate on why this was as I wasn’t there to see it, but I wasn’t going to let that happen on my watch. So, with equal parts hubris and stubbornness (and a dash of naïveté), I dove in. We have the technology. We will migrate this thing.

The advertising for Start-DbaMigration makes it look so easy. Source, destination, your method of moving the data, and you’re done. Right? Well, sure - in a small, controlled sandbox. This one was neither. About 150 databases. Two dozen Agent jobs. User account cleanup. Different drive letters and sizes. And when it was all over, the server name, instance name, and IP of the new box had to match the old one so that we didn’t disrupt production or the developers.

Of course we’re going to rehearse this. But with the destination being a physical machine, I didn’t have the luxury of rolling back a snapshot each time, or restarting from a golden image. And I couldn’t do everything because it wasn’t an isolated environment - I couldn’t test all the Agent jobs (don’t want emails going out in error) and couldn’t reconfigure the IP or server name. Which meant that my script had to clean up any artifacts from previous runs before doing the migration. Each time.

I also wanted to bring the new instance up in a controlled fashion as opposed to just moving everything and letting it go, so that I could check things out before letting them break. I also had to work in checkpoints so the network/server admin could do his pieces. Which meant that after the migration, everything on the old server had to be stopped, and Agent jobs on the new one disabled (but with a record of what was enabled/disabled on the source, so I could replicate it).

I rehearsed what I could about a half-dozen times. Each time through took about 4 hours (having multiple tests helps build confidence in your elapsed time estimates), primarily because of the amount of data that had to be moved (about 700GB). Each time, I found another tweak needed. Maybe not entirely necessary, but I was out to prove something. I didn’t want this migration to be “good enough, a little rough around the edges” - this had to work right, right away.

This is truly standing on the shoulders of giants. Without the thousands of person-hours put in by Chrissy and the rest of the team, a short script like this to do a mountain of work simply is not possible. It’s not just having the huge amount of code to build on - it’s the suite of tests they run with every pull request that tells me that I can trust it’ll work right.

Looking back on it, there’s definitely a few things I’d change in this script, and more dbatools functions I could have used. But after successfully testing a couple times, I didn’t want to break what was working.

When the migration was complete, I did a brief checkout and then gave my server admin the green light. He flipped names & IPs around, and then I ran Repair-DbaServerName which I had just discovered a few days earlier. I was expecting to do it manually but I trust the dbatools crew and their test suite more than myself on this one as I’ve never done this before. When that was complete, I had a grand total of three issues (that I could find):

  • Database owners weren’t set appropriately. I was able to resolve this via Set-DbaDatabaseOwner easily enough.
  • Outgoing dbmail didn’t work. Turns out the SMTP relay on the new server wasn’t started. Easy fix.
  • I had a Linked Server on my production instance which was unable to communicate to the new test server. This took me the longest to figure out. We checked everything - SQL Server Configuration Manager, the network itself, and then finally my colleague suggested testing something outside SQL Server - mapping a drive from production to test. This last test succeeded, which pointed us at the SQL Server connection specifically. The root cause: I had two firewall rules on the new server that blocked connections from all but servers on the local subnet. The production server isn’t on the local subnet.

None of these are total showstoppers. I had workarounds (or quick solutions) for them and as this is a test instance we could live with minor inconvenience for a day or two. One or two final tests, and I was satisfied that everything was working properly so I went ahead and enabled my Agent jobs. Some of them still have incorrect owners but I can fix that later - they were wrong on the source instance too.

I consider this migration a huge success. We had 95% functionality by 9am. By 3pm, the last real problems were resolved (and only that late due to a series of meetings keeping me away from my desk). Most importantly, it was achieved with minimal downtime for the development and QA teams. I’m now one week post-migration and everything is still running smoothly on the new instance.