A Day in the Life (1/?) - August 7, 2018

This is my first installment in (I hope) a series responding to Steve Jones’s (blog|twitter) #SQLCareer challenge. I decided to jot down most of what I did through the day, filling a page and a half in a Field Notes notebook with timestamps and short reminders of what happened. For more, check out the #SQLCareer hashtag on Twitter.

Background

I’m one of two DBAs in my company, and my colleague is on holiday on the opposite side of the planet (literally) for a couple weeks so I’m juggling everything - on-call, regular operations, consults with developers, you name it. In production, we manage several thousand databases which sit behind about as many websites.

My Day

00:48 - Get a handful of SentryOne alerts telling me that one of my test servers has rebooted. That’s unexpected. Will have to check that out in the morning. (Narrator: He forgot to do that in the morning)

05:34 - Wake up to the daily alert that the feed to one of our partners was successful. I really need to ask the developers to put me on the “only alert if failed” list. Go back to sleep.

06:00 - Alarm goes off, followed by the daily “integrity check job ran over 4 hours” email from SentryOne. Yes. I know. I’m working on a replacement for the job to make it run faster.

07:35 - Another report notification email. This one’s also my cue to pack up and get on the road.

08:30 - Arrive at work after a quick stop at Wegmans and plot my day. Decide to track my time today for the #SQLCareer challenge. All other plans are quickly abandoned as requests come in.

09:00 - Internal customer IMs me. I missed one step in the data movement/update script I ran for his team yesterday. Write an additional script to truncate that table across two dozen databases.

09:30 - En route to a meeting, I’m informed that one of our big sites isn’t loading. This hasn’t happened in a long time, but I’m pretty sure I know what it is. Fire up SentryOne while skimming my email - yep, I missed an alert for a blocking query that I should have killed a while ago. Usually these things self-correct within 2 minutes but this is a big database and we aren’t so lucky. Kill that spid (it’s just a SELECT query) and everything clears up.

09:35 - Arrive at the weekly standup meeting fashionably late. Receive a flurry of questions via Slack from a developer about how to fix a problem with a small stored procedure he’s writing. Try to sort it out, but can’t do much with just my phone while I’m trying to concentrate on the conversation in the room.

10:30 - Meeting’s over, return to my desk to review/debug the stored procedure. It’s pretty basic: it just needs to insert a record into a tracking table and then return the ID (an IDENTITY) of the new record. I rewrote it to use the OUTPUT clause on the INSERT rather than running the INSERT and then calling SELECT SCOPE_IDENTITY() immediately after it. Then, while reviewing my changes with him, we came to the conclusion that he can run the INSERT right from the application code and skip the stored procedure altogether.
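
For anyone who hasn’t seen the two approaches side by side, here’s a minimal T-SQL sketch. It assumes a hypothetical dbo.Tracking table with an IDENTITY column; the table and column names are made up for illustration and are not the real schema.

```sql
-- Hypothetical tracking table; names are illustrative only.
CREATE TABLE dbo.Tracking
(
    TrackingId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Note       varchar(100)      NOT NULL
);
GO

-- Pattern 1: INSERT, then read back the identity value generated in this scope.
INSERT INTO dbo.Tracking (Note) VALUES ('first row');
SELECT SCOPE_IDENTITY() AS TrackingId;
GO

-- Pattern 2: a single statement; OUTPUT returns the new ID as a result set.
INSERT INTO dbo.Tracking (Note)
OUTPUT inserted.TrackingId
VALUES ('second row');
GO
```

Because the second pattern is one statement that returns a result set, an application can generally run it directly and read the ID back without a stored procedure wrapper. One caveat: OUTPUT without INTO isn’t allowed if the target table has an enabled trigger for that action.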

11:10 - Come to the realization that the usage of this new table may present a hotspot for the application and with that database currently using the READ COMMITTED isolation level, we may see some slowdowns due to blocking. Note to self: switch this database to READ COMMITTED SNAPSHOT on the next maintenance window. Switch test environment over to this isolation level now so that it gets exercised before we go live. Also discover that a “scratch” database that only my colleague and I ever use is about 75% empty. We can reclaim double-digit gigabytes so yes, I shrank the database.
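
For the isolation-level change, the switch itself is short. This is a sketch only: the database name is a placeholder, and WITH ROLLBACK IMMEDIATE is my assumption about how to handle open connections, not necessarily what will be used in production.

```sql
-- Hypothetical database name; substitute the real one.
-- WITH ROLLBACK IMMEDIATE (an assumption here) rolls back open transactions in
-- other sessions so the change does not sit waiting on them, which is why this
-- kind of change belongs in a maintenance window.
ALTER DATABASE [TrackingDemo]
    SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE;
GO

-- Confirm the setting took effect (1 = on).
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = N'TrackingDemo';
```

With this option on, readers under READ COMMITTED see the last committed row version instead of waiting on writers’ locks, at the cost of extra version-store activity in tempdb.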

11:20 - Look for more lost space in the system and find some transaction logs that got blown out of proportion due to one-off operations. Clean that up and get a few more gigabytes back.

12:00 - Into my ticket queue! This is how we get most of our “tactical” work - data moves, updates spanning multiple databases, report requests, drop databases, etc.

12:30 - Resume the conversation with my developer and remind him that we aren’t going to use READ UNCOMMITTED for his new functionality.

12:45 - Talk to someone from SentryOne about a question I posted to the support forums. Conversation continues for about 90 minutes off and on as he researches how this feature works and how I can make better sense of what it’s telling me.

13:00 - Finally pause to eat. It’s the last of the brisket I smoked on Sunday. OK, not great. I just got a smoker a month ago and I’m still learning the ropes. I’m approaching it slowly, just like the cooking.

13:30 - With one of our system admins, debug the PowerShell script I wrote to pull some error logs out of tables and into flat files for ingestion by a log analysis tool. Turns out my “rolling 24 hours” of log retention didn’t clean up a couple files from last week and it’s still pulling those in over and over again.

13:35 - Back to the queue!

14:45 - Email someone about a potential issue with a service we depend upon to maintain our SLAs. I may have to find a creative way to validate that this service is working properly so that I can start trusting it again.

15:20 - Back to the error logs issue. My script is working in production. It works when I run it manually in test. When I run it via Task Scheduler, it does nothing. Looks like permissions on a directory. I bang my head against the wall for 25 minutes but it’s only a test server - I can get back to it in the morning.

15:45 - Back to the queue! Maybe I can knock out at least the first pass of this report that was requested yesterday.

16:30 - Time to go home, got a couple chores & errands to take care of.

20:15 - Settle down on the porch to sift through some personal email. Turns out I’m first on standby for the inaugural PowerHour so I better start rehearsing my talk. Start writing this blog post.

21:57 - PowerHour update. I’m definitely talking on August 21st.

23:31 - Enough is enough. Stop fretting over this post and just schedule it already!

Thoughts

This was a fun exercise and I thank Steve for proposing it. It made me more conscious of what I do throughout the day, without becoming obsessive over time-tracking. I notice that I didn’t get to do much if any “strategic” work (larger projects, researching system improvements, building automation to simplify future work) and instead did a bunch of tactical, reactive work. My next installment will probably be August 14th, as that’s our bi-weekly release day and is always a change from the usual routine. Hopefully in the course of doing these periodically, I’ll learn more about myself and how I’m spending my time at the office, finding ways to become more productive.