This is Part 2 of a series. Please see Part 1 for the background and more.
This is part one of a three-part series.
I’ve mentioned in various places, including in blog posts on occasion, that my production SQL Server instance hosts several thousand (nearly 9000 as of this writing) databases. People are usually surprised to hear this and it often leads to interesting conversation.
This is my fourth installment in a series responding to Steve Jones’s (blog | twitter) #SQLCareer challenge. I jotted down most of what I did through the day, filling a page and then some in a small notebook with timestamps and short reminders of what happened. For more, check out the #SQLCareer hashtag on Twitter.
This is my third installment in a series responding to Steve Jones’s (b|t) #SQLCareer challenge. I decided to jot down most of what I did through the day, filling a page and a half in a Field Notes notebook with timestamps and short reminders of what happened. For more, check out the #SQLCareer hashtag on Twitter.
I chose to record this day because I was working from home as my car was in the shop and I thought I might get some bigger things done without the distractions of being in the office. But as Mike Tyson famously said, everyone has a plan until they get punched in the mouth.
06:00 – Alarm goes off but I’m already half-awake.
07:00 – Drive my son to school. This year he’s on a much earlier schedule than last year, and while I can’t walk him to school due to the distance and he can take the bus, driving him gives us some time to talk one-to-one and it gets me out the door and into the office earlier to boot. Earlier to work means I leave earlier, giving me more time to spend with my family in the late afternoon/evening.
07:30 – Return home, make a better breakfast for myself than Honey Nut Cheerios. Today it’s scrambled eggs with guacamole. If you haven’t tried it, you’re missing out.
08:00 – Set up camp in our “spare bedroom” which is trying to be a home office now.
08:10 – Log into the VPN and plug in the dongle for the wireless mouse. Windows spends far too long spinning on installing the driver and for the time being I use the built-in trackpad.
08:15 – Finish RDPing into my desktop and discover that SQL Server Management Studio (SSMS) has been restarted due to an update of some kind 45 minutes prior. Spend a bit of time recovering unsaved files and then saving them elsewhere.
08:20 – Give up on the mouse working, tell Windows to stop trying to install the drivers.
08:23 – For reasons I can’t explain, mouse starts working. Decide to take advantage of the interruption to my usual SSMS workflow and install the latest version 17.9. I haven’t seen reports of it blowing anything up in the week or so since it was released.
08:30 – Are we ready to work? I think so. Hop into the queue and take my morning cruise through email and SentryOne to review everything that ran overnight.
08:35 – The VPN seems to be really sensitive to other users of my home WiFi (my wife works from home regularly) so I use this as an excuse to hook up the 5-port switch that arrived from Amazon a couple days prior. We’ve got an Ethernet drop in the room but it’s occupied by the VoIP box hooked up to our printer/scanner/fax machine. Yeah, a fax machine. She needs it for her job. But we’re out of electrical outlets in this corner of the room. Rummage around and find an unused power strip in the closet. It’s got 4 always-on outlets plus 4 on a timer, and due to the size of the wall-warts for the VoIP box and switch, I need to use both sides. Spend more time than I care to admit figuring out how to program the timer (whose brilliant idea was this thing?). After achieving victory over both electricity and Ethernet, I disconnect from WiFi, reconnect the VPN, and get a more stable connection.
08:54 – Pull some data together for a product owner to document the conditions that caused one of their tools to trigger a half-dozen alerts from SentryOne. Turns out that if you attempt to back up one database to the same filename in four separate processes simultaneously, three of them will get blocked!
09:03 – Get food for my daughter. I forgot to mention earlier that she was staying home from school due to illness.
09:10 – Back to the queue.
09:30 – Call into our weekly meeting for an upgrade project. Struggle to hear anything due to abysmal acoustics in the meeting room. They can hear every click of my mechanical keyboard over the phone.
10:00 – Pull some reports (Excel files) for the business side of the house. We need a couple years’ worth of data and our web-based reporting system can’t handle that volume, so I have a PowerShell script that breaks it up into chunks and runs the stored procedure directly. I spend a bunch of time arguing with the script as I try to “improve” it from the previous iteration instead of just using what has worked in the past.
12:00 – Break off to sit in on the PASS Professional Development Virtual Group presentation “Talk Tech To Me – Improving Your Technical Presentation Skills” by Alexander Arvidsson (b|t).
13:15 – Make lunch.
13:35 – Back to the reports I was working on in the morning. Ends up being a half-dozen Excel files, each in the neighborhood of 100MB, and approaching the limits of Excel’s capacity.
14:00 – Pick up my car from the shop.
14:30 – Resume work on another report. I had completed 90% of this one the previous day at work, but one calculation was twisting my brain. As written, the requirements are a little fuzzy and I keep flipping between two possible interpretations of them. I decide to commit to one interpretation and get my head around how to code for it.
16:00 – Check where we are with MinionWare‘s CheckDB. I rolled it out about 6 weeks ago and it’s been working mostly OK, but I’m still working out some issues specific to my environment. Discuss with one of our sysadmins when we’ll install some firmware updates for our servers in the coming weeks.
16:10 – Learn that I picked the wrong interpretation of the requirements for the afternoon report and I need to flip it around. Spend quite a while working that out and validating. Whiteboard it out around a doodle of my daughter’s (can’t erase that!).
17:21 – Realize that I need to get the non-production copy of Minion CheckDB in sync with production. Rather than move individual objects (I’ve been working with Sean for about 2 months on debugging some issues and I don’t have a 100% standard version), I do a backup & restore of the database to the test server. Since production is SQL Server 2008 R2 and test is 2016, I update the compatibility level and rebuild indexes, then decide to compress a couple tables as a test before doing the same in production just to keep things from getting too huge.
17:38 – Receive a text from my neighbor about a snake that’s standing between him and his grill in the backyard. It’s a long story. It should be noted that there are only three species of venomous snakes that call New York home, and none of them live in our neighborhood.
18:00 – Done for the day. Time to get the kids fed while my wife has the cat at the vet getting an unplanned checkup.
While I was working straight through the day, I didn’t really feel like I accomplished much. Usually on a remote work day I can start early and finish a little late, and still end up spending more time “at home” because I’ve eliminated the commute, but thanks to my hardware distractions to start the day, that wasn’t happening.
I finished off two sizable report requests and took care of a few pieces of administrivia, but that’s about it. Quantity vs. quality, I guess? Getting Minion CheckDB updated in my test environment seems to have put an end to the alerts it was triggering, which doesn’t impact others but it’s good for my quality of life.
While we do have a “home office” space, it’s not set up in a way that’s comfortable for me to work. The desk is all wrong, I only have the built-in display on the laptop, it needs a ceiling fan to help circulate air better, and the whiteboard is in an inconvenient place. If I were to be working regularly from home, this day’s experience pretty much seals what I already was feeling – that I need to build out a good workspace in the basement that fits me for both ergonomics and working style. That’s not a small or inexpensive undertaking, so I don’t see it happening in the near future. We have a lot of cleaning, planning and building to make that a reality and the only thing I know for sure is the desk I’ll be getting.
This is my second installment in a series responding to Steve Jones’s (b|t) #SQLCareer challenge. I decided to jot down most of what I did through the day, filling a page and a half in a Field Notes notebook with timestamps and short reminders of what happened. For more, check out the #SQLCareer hashtag on Twitter.
I’m one of two DBAs in my company, and my colleague is (still) on holiday on the opposite side of the planet so I’m juggling everything – on-call, regular operations, consults with developers, you name it. In production, we manage several thousand databases which sit behind about as many websites.
I chose to record this day because it’s a huge departure from the usual routine. In addition to our bi-monthly software release, we had a quarterly event. Let’s check it out. I recommend reading the first installment to get a handle on some of the tasks & terms I might throw around here.
03:45 – Alarm goes off. It’s early, way too early, and compounded by the fact that I got to bed late-ish last night. We had unexpected dinner guests yesterday but it’s friends who we haven’t seen in three years and they were in town for one day only, so we weren’t going to pass up the opportunity.
04:10 – Hop in the car to get to the office.
04:12 – Nearly hit a deer bounding across the road before I even get out of the neighborhood.
04:50 – Arrive at the office, grab an RxBar (thanks to Drew (b|t) for tipping me off to them) and start getting set up for the deploy. This one’s pretty easy, I only have three changes I’m responsible for:
- Push a small data change out to all the databases
- Enable Read Committed Snapshot Isolation on one database
- Put a clustered primary key on one table in the above database.
05:00 – Red Gate Multi-Script is all set up with the database list and I hit the Go button.
05:05 – Multi-Script is done!
05:09 – Enable RCSI & create the clustered PK in that one database.
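For readers who haven’t made these two changes before, here is a minimal sketch of what they can look like in T-SQL. The database, table, column, and constraint names (ExampleDb, dbo.ExampleTable, ExampleId, PK_ExampleTable) are placeholders of mine, not the objects from this deploy, and the ROLLBACK IMMEDIATE option is one possible way to get the exclusive access the change needs, not necessarily what was run here.

```sql
-- Placeholder names throughout: ExampleDb, dbo.ExampleTable, ExampleId are hypothetical.
USE [master];
GO

-- Turning RCSI on requires that no other sessions are active in the database.
-- ROLLBACK IMMEDIATE disconnects them and rolls back their open transactions,
-- which is why this kind of change is usually saved for a maintenance window.
ALTER DATABASE [ExampleDb]
    SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE;
GO

-- Confirm the setting took effect (1 = on).
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = N'ExampleDb';
GO

USE [ExampleDb];
GO

-- Add a clustered primary key. The key column must be NOT NULL and
-- contain no duplicates, or the statement fails.
ALTER TABLE dbo.ExampleTable
    ADD CONSTRAINT PK_ExampleTable PRIMARY KEY CLUSTERED (ExampleId);
GO
```

On a large table the primary key build can take a while and hold locks, so it is worth timing in a test environment first.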
05:50 – Kick off a data change across a couple dozen databases (via cursor this time, not Multi-Script).
06:05 – Kick off another data change across a couple dozen databases (via cursor).
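The “via cursor” approach generally means looping over a list of database names and running the same statement in each one with dynamic SQL. A rough sketch follows, assuming a hypothetical naming convention (databases starting with Client_) and a hypothetical dbo.Settings table; neither comes from my environment, and a real run would usually pull the database list from wherever the request specifies.

```sql
-- Hypothetical example: the Client_ prefix, dbo.Settings table, and the
-- values being written are illustrative assumptions only.
DECLARE @db  sysname;
DECLARE @sql nvarchar(max);

DECLARE db_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name
    FROM sys.databases
    WHERE name LIKE N'Client[_]%'      -- [_] matches a literal underscore
      AND state_desc = N'ONLINE'
      AND database_id > 4;             -- skip the system databases

OPEN db_cursor;
FETCH NEXT FROM db_cursor INTO @db;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- QUOTENAME guards against odd characters in database names.
    SET @sql = N'UPDATE ' + QUOTENAME(@db) + N'.dbo.Settings '
             + N'SET SettingValue = N''new value'' '
             + N'WHERE SettingName = N''ExampleSetting'';';

    EXEC sys.sp_executesql @sql;

    FETCH NEXT FROM db_cursor INTO @db;
END

CLOSE db_cursor;
DEALLOCATE db_cursor;
```

Printing @sql instead of executing it on a first pass is a cheap way to check the generated statements before touching any data.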
06:30 – Notice that my installed copy of Brent Ozar Unlimited’s First Responder Kit is out of date by a good 6 months. Refresh it in production with Install-DbaFirstResponderKit, but I’ve got a half-dozen test instances. Fortunately, they’re all registered with a Central Management Server so dbatools makes it even easier.
Get-DbaRegisteredServer -IncludeSelf -ServerInstance MYCMS | Install-DbaFirstResponderKit -Database master
06:48 – Into the queue. There’s a ticket or two that came in late Monday so I get to work on those.
07:15 – Breakfast has arrived! The company buys us a breakfast pizza and amazing donuts when we have software releases.
07:55 – Get to work trying to sort out an issue with a trigger on a critical table. My colleague and I have been ping-ponging this with our lead QA tester for a few weeks and I really want to get it finished as the trigger has been causing deadlocks.
08:45 – 09:50 – Bounce between the queue and the trigger a few times.
09:00 – Get word that one of the data changes I made earlier in the morning appears wrong. It turns out I did exactly what I was asked to do, but the original requestor transposed a couple digits when submitting the request. Fortunately, the changed records are easily identified and the request was otherwise well-documented, so I’m able to reverse my changes without restoring anything from a backup (my SOP is to make a backup immediately before any data change that isn’t trivial or logged in a history table).
09:20 – Pause to gawk at the crazy weather we’re getting. Not too bad in the city, but down in the Finger Lakes they’re getting rain measured in inches per hour and the flash flooding is intense.
09:50 – Break off to secure a spot in the common area for the quarterly presentation.
10:06 – Leave the presentation to address some blocking issues, and bring my laptop back with me so I can take care of others from there.
11:30 – The presentation is almost over and my ability to concentrate on the material is fading fast. I’ve been awake for 8 hours already after a short night’s sleep.
12:00 – Event wraps up and I take my chair back to my desk. Get registered for PASS Summit.
12:30 – Grab lunch. During the warmer months, the company brings a local food truck for lunch on the day we have this quarterly event but they don’t tell us what it is until the day of. Today’s truck – Tom Wahl’s and they’re dishing up Garbage Plates, a local delicacy. I get my once-every-five-years plate and dig in.
12:43 – Get a note that the reports documenting one of my data changes from 6 hours ago aren’t correct. Either the requirements are unclear to me or my brain is completely fuzzed at this point due to the schedule. I decide to shelve it until Wednesday as I’ll only make things worse at this point. I know the data’s good, I just need to get it documented.
13:00 – Kick off a database copy and subsequent data change across a couple hundred databases (via cursor). We do this at least a dozen times a year to get things set up for our partners and internal users.
13:15 – Poke around at a few databases in search of remaining heap tables and start pondering when I can get that fixed.
13:53 – Roll up my sleeves and get back to working out why Minion CheckDB works fine on my test instances, but throws security exceptions in production. Sean (t) has been working with me for quite a while on this and I can’t thank him enough for it. We’ve narrowed it down to something related to how the commands executed via xp_cmdshell are authenticating to the instance. In this iteration, I’m throwing logging statements (writing to a table) around every call to xp_cmdshell in hopes that I can pinpoint where the error is happening and exactly why.
15:00 – Ordinarily on a release day I’m out of the office by 14:00 but there’s no one else to hold down the fort and I’m making progress on this CheckDB thing. I want to make some good progress and document it before I leave. In the past hour I’ve inserted all my logging into the process and gotten a test run with results logged! Still failing but I’ve got enough information to reproduce the issue outside the confines of xp_cmdshell and I’ve got some really good leads. I document my findings and ship them off to Sean.
15:15 – Just as I’m packing up my bags, a new ticket comes into the queue. It reads “if you have a chance, please run this today” but the deadline is Wednesday 9 AM. I talk to the submitter and tell him I can do it today if necessary, but would prefer Wednesday when my brain’s back online. He agrees to Wednesday.
15:20 – Head home.
16:05 – Arrive at home, say hi to the family. My wife tells me to go upstairs and take a nap. I feel guilty about it, but this is one of those difficult days as a production DBA, she’s very understanding, and I decide to compromise. I lie down in bed and watch most of The Dark Knight instead since it just got put on Netflix (I did try to nod off but it wasn’t happening).
18:00 – Back downstairs for dinner. As I’m still feeling the effects of lunchzilla, I skip the pasta and just have a couple meatballs.
19:00 – Pop into Slack real quick and notice that Friedrich Weinmann (b|t) has a new release of PSFramework. I tripped over a few issues with the logging functions last week and it turned out he was already working on them, so I was awaiting this release. I’ll have to update the module and check it out sometime Wednesday.
19:45 – Get the kids set for wind-down time then bed, and start writing this.
22:00 – After proofreading this, I realize that my phone has been uncharacteristically quiet tonight. Personal emails, texts, work alerts – I’ve received very, very little since 16:30. It’s unnerving, to be honest. I’m considering logging into work just to make sure everything is OK. Although I can see websites are online, so my instance is up and running at least.
Today was quite different from a normal day or even a normal release day. Due to the odd schedule and sleep cycle, I feel like I had a lot more trouble focusing today than what I would consider a “tough” day, to the point where I’d call myself “scatterbrained” today. For a while my colleague & I segmented our days pretty well – before 10 AM and after 2 PM was considered “work the queue” time and the middle of the day was reserved for larger projects and emergencies. I think I need to get back to this model once she returns.
I didn’t accomplish everything I should have today, including a test run or two of my demo for PowerHour which I was going to do with Matt Cushing (b|t). I’ll have to set something up with him later in the week. Wednesday is going to be very much a “write out a full task list and step through, driving each one to 100% completion before moving on” kind of day.
This is my first installment in (I hope) a series responding to Steve Jones’s (b|t) #SQLCareer challenge. I decided to jot down most of what I did through the day, filling a page and a half in a Field Notes notebook with timestamps and short reminders of what happened. For more, check out the #SQLCareer hashtag on Twitter.
I’m one of two DBAs in my company, and my colleague is on holiday on the opposite side of the planet (literally) for a couple weeks so I’m juggling everything – on-call, regular operations, consults with developers, you name it. In production, we manage several thousand databases which sit behind about as many websites.
00:48 – Get a handful of SentryOne alerts telling me that one of my test servers has rebooted. That’s unexpected. Will have to check that out in the morning. (Narrator: He forgot to do that in the morning)
05:34 – Wake up to the daily alert that the feed to one of our partners was successful. I really need to ask the developers to put me on the “only alert if failed” list. Go back to sleep.
06:00 – Alarm goes off, followed by the daily “daily integrity check job ran over 4 hours” email from SentryOne. Yes. I know. I’m working on a replacement for the job to make it run faster.
07:35 – Another report notification email. This one’s also my cue to pack up and get on the road.
08:30 – Arrive at work after a quick stop at Wegmans and plot my day. Decide to track my time today for the #SQLCareer challenge. All other plans are quickly abandoned as requests come in.
09:00 – Internal customer IMs me. I missed one step in the data movement/update script I ran for his team yesterday. Write an additional script to truncate that table across two dozen databases.
09:30 – En route to a meeting, I’m informed that one of our big sites isn’t loading. This hasn’t happened in a long time, but I’m pretty sure I know what it is. Fire up SentryOne while skimming my email – yep, I missed an alert for a blocking query that I should have killed a while ago. Usually these things self-correct within 2 minutes but this is a big database and we aren’t so lucky. Kill that spid (it’s just a SELECT query) and everything clears up.
09:35 – Arrive at the weekly standup meeting fashionably late. Receive a flurry of questions via Slack from a developer about how to fix a problem with a small stored procedure he’s writing. Try to sort it out, but can’t do much with just my phone while I’m trying to concentrate on the conversation in the room.
10:30 – Meeting’s over, return to my desk to review/debug the stored procedure. It’s pretty basic, just needs to insert a record into a tracking table and then return the ID (an IDENTITY) of the new record. I rewrote it to use the OUTPUT clause rather than INSERT and then call select SCOPE_IDENTITY() immediately after the INSERT, then while reviewing my changes with him we came to the conclusion that he can use an INSERT right from the application code and skip the stored procedure altogether.
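To illustrate the difference between the two patterns, here is a small sketch against a made-up tracking table. The table and column names (dbo.TrackingLog, TrackingId, EventName) are hypothetical stand-ins, not the developer’s actual schema.

```sql
-- Hypothetical tracking table; names are illustrative only.
CREATE TABLE dbo.TrackingLog (
    TrackingId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    EventName  NVARCHAR(100)     NOT NULL
);
GO

-- Pattern 1: insert, then ask for the identity value generated in this scope.
INSERT INTO dbo.TrackingLog (EventName)
VALUES (N'example event');

SELECT SCOPE_IDENTITY() AS TrackingId;
GO

-- Pattern 2: a single statement that returns the new ID as its own result set.
INSERT INTO dbo.TrackingLog (EventName)
OUTPUT inserted.TrackingId
VALUES (N'example event');
GO
```

Because the second form returns the ID as an ordinary result set, application code can run it as a plain parameterized INSERT and read the value back, which is what makes the stored procedure unnecessary. One caveat: OUTPUT without an INTO clause is not allowed when the target table has enabled triggers.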
11:10 – Come to the realization that the usage of this new table may present a hotspot for the application and with that database currently using the READ COMMITTED isolation level, we may see some slowdowns due to blocking. Note to self: switch this database to READ COMMITTED SNAPSHOT on the next maintenance window. Switch test environment over to this isolation level now so that it gets exercised before we go live. Also discover that a “scratch” database that only my colleague and I ever use is about 75% empty. We can reclaim double-digit gigabytes so yes, I shrank the database.
11:20 – Look for more lost space in the system and find some transaction logs that got blown out of proportion due to one-off operations. Clean that up and get a few more gigabytes back.
12:00 – Into my ticket queue! This is how we get most of our “tactical” work – data moves, updates spanning multiple databases, report requests, drop databases, etc.
12:30 – Resume the conversation with my developer and remind him that we aren’t going to use READ UNCOMMITTED for his new functionality.
12:45 – Talk to someone from SentryOne about a question I posted to the support forums. Conversation continues for about 90 minutes off and on as he researches how this feature works and how I can make better sense of what it’s telling me.
13:00 – Finally pause to eat. It’s the last of the brisket I smoked on Sunday. OK, not great. I just got a smoker a month ago and I’m still learning the ropes. I’m approaching it slowly, just like the cooking.
13:30 – With one of our system admins, debug the PowerShell script I wrote to pull some error logs out of tables and into flat files for ingestion by a log analysis tool. Turns out my “rolling 24 hours” of log retention didn’t clean up a couple files from last week and it’s still pulling those in over and over again.
13:35 – Back to the queue!
14:45 – Email someone about a potential issue with a service we depend upon to maintain our SLAs. I may have to find a creative way to validate that this service is working properly so that I can start trusting it again.
15:20 – Back to the error logs issue. My script is working in production. It works when I run it manually in test. When I run it via Task Scheduler, it does nothing. Looks like permissions on a directory. I bang my head against the wall for 25 minutes but it’s only a test server – I can get back to it in the morning.
15:45 – Back to the queue! Maybe I can knock out at least the first pass of this report that was requested yesterday.
16:30 – Time to go home, got a couple chores & errands to take care of.
20:15 – Settle down on the porch to sift through some personal email. Turns out I’m first on standby for the inaugural PowerHour so I better start rehearsing my talk. Start writing this blog post.
21:57 – PowerHour update. I’m definitely talking on August 21st.
23:31 – Enough is enough. Stop fretting over this post and just schedule it already!
This was a fun exercise and I thank Steve for proposing it. It made me more conscious of what I do throughout the day, without becoming obsessive over time-tracking. I notice that I didn’t get to do much if any “strategic” work (larger projects, researching system improvements, building automation to simplify future work) and instead did a bunch of tactical, reactive work. My next installment will probably be August 14th, as that’s our bi-weekly release day and is always a change from the usual routine. Hopefully in the course of doing these periodically, I’ll learn more about myself and how I’m spending my time at the office, finding ways to become more productive.
I really enjoy my job. I became a full-time production DBA about 14 months ago and it has been an overwhelmingly positive move. I work for a good company and with a terrific group of people. Many days, I have to force myself to leave the office because I was so engrossed in a task and just didn’t want to set it aside.
But there’s something that not everyone might consider before taking on this job. If you have a partner, children, or both, taking a job as a production DBA is really a family decision.
Being on-call is potentially disruptive to your family schedule. And sleep schedules! My on-call rotation is two weeks on, two weeks off. In those two weeks, I have:
- The usual alerts that can come in anytime day or night, the emergency fixes when someone deletes something that shouldn’t be deleted, etc.
- A software release which requires that I get up at 3:45 AM once per rotation
- Monthly server patching at 2 AM, if it happens during my rotation
Many years ago, I had a job where I carried on-call responsibilities and it was rough. Lots of nights and weekends. Then I got a decade-long break. Before I took my current job, I discussed the on-call requirements with my spouse a bit before accepting. I didn’t want to subject her to that again without making sure that she was OK with it. She is a very light sleeper, so any chirp from the phone is likely to wake her up (by contrast, I once put my phone three inches from my head and slept through multiple personal email alerts).
This job has the potential to impact the whole family, in both small ways and large. Chris Sommer (blog|twitter) said one day in the SQL Community Slack that being a production DBA is kind of a blue-collar job. Shift work, etc. He makes a good point. I’ve adapted to the schedule and it’s not bad…for me.
But I’m not alone in the house and yes, everyone has had to adjust. Sleep has been lost. If an alert comes in overnight, my spouse wakes up too. We’ve scheduled family activities around the on-call schedule. Carried the work laptop all over creation “just in case.” Left the beach to handle urgent tickets. Skipped weekend morning outings. Stayed up late, got up early, missed dinner, or paused a movie to baby-sit a critical job or troubleshoot system issues.
It’s worth it though. After taking on the new role, my job security increased. My career security has increased. My work is more challenging, more interesting, and I have more autonomy than ever before. I look forward to going to work every day. I’m getting more involved in the SQL Server community. On average I’m getting home earlier than I used to, so I’m spending more time with the kids on weekdays. It hurts waking up at 3:45 AM once a month but I’m there to greet them when they get home from school.
Life is full of tradeoffs and compromises, and taking a job with on-call responsibilities involves a lot of those tradeoffs. Overall, it’s been a net win for me. Would I prefer to not have to deal with overnights and weekends? Who wouldn’t? But the positive changes that this job has meant for my career, my family, and myself make it worthwhile.