SQL New Blogger Challenge Weekly Digest

Page content

Watching all of the tweets as people posted their first entries in the SQL New Blogger Challenge earlier this week, I quickly realized that keeping up was going to be a challenge of its own. Fortunately, there are ways to reign it in.

My first stop was IFTTT (If This Then That). IFTTT allows you to create simple “recipes” to watch for specific events/conditions, then perform an action. They have over 175 “channels” to choose from, each of which has one or more triggers (events) and actions. I have IFTTT linked to both my Google and Twitter accounts, which allowed me to create a recipe which watches Twitter for the #sqlnewblogger hashtag, and writes any tweets that match it to a spreadsheet on my Google Drive account (I’ll make the spreadsheet public for now, why not?).

The next step is to export the spreadsheet to CSV. I don’t have this automated, and may not be able to (I may have to find another workaround). Once it’s a CSV, I can go to PowerShell to parse my data. I want the end result to be an HTML table showing each post’s author (with a link to their Twitter stream) and a link to the post (using the title of the post itself).

Once I import the CSV file into an object in my PowerShell script, I need to do some filtering. I don’t want to be collecting all the retweets (posts starting with RT), and I should probably exclude any post that doesn’t contain a URL (looking for the string HTTP).

To extract the post URLs, I ran a regular expression against each tweet. Twitter uses their own URL shortener (of course), which makes this pretty easy - I know the hostname is t.co, and after the slash is an alphanumeric string. The regex to match this is fairly simple: [http|https]+://t.co/[a-zA-Z0-9]+

Then, for each URL found in the tweet, I use Invoke-WebRequest to fetch the page. This cmdlet automatically follows any HTTP redirects (I was afraid I’d have to do this myself), so the object returned is the real destination page. Invoke-WebRequest also returns the parsed HTML of the page (assuming you use the right HTTP method), so I can extract the title easily instead of having to parse the content myself. It’ll also give me the “final” URL (the destination reached after all the redirects). Easy!

My full script:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
 #requires -version 3 [cmdletbinding()]
 param ()
 set-strictmode -Version latest;
 Add-Type -AssemblyName System.Web;
 $AllTweets = import-csv -path 'C:\Dropbox\MiscScripts\Sqlnewblogger tweets - Sheet1.csv' | where-object {$_.text -notlike "RT *" -and $_.text -like "*http*"} | select-object -property "Tweet By",Text,Created | Sort-Object -property created -Unique;
 $TweetLinks = @();
 foreach ($Tweet in $AllTweets) {
   $Tweet.text -match '([http|https]+://t.co/[a-zA-Z0-9]+)' | out-null;
   foreach ($URL in $Matches) {
     $MyURL = $URL.Item(0);
     # Invoke-WebRequest automatically follows HTTP Redirects. We can override this with -MaxRedirection 0 but in this case, we want it!
     $URLCheck = Invoke-WebRequest -Method Get -Uri $MyUrl;
     $OrigUrl = $URLCheck.BaseResponse.ResponseUri;
     write-debug $Tweet.'Tweet By';
     Write-debug $URLCheck.ParsedHtml.title;
     write-debug $URLCheck.BaseResponse.ResponseUri;
     $TweetLinks += new-object -TypeName PSObject -Property @{"Author"=$Tweet.'Tweet By';"Title"=$URLCheck.ParsedHtml.title;
     URL"=$URLCheck.BaseResponse.ResponseUri;
     };
    }
  }
  Write-debug $TweetLinks;
  $TableOutput = "<table><thead><tr><td>Author</td><td>Post</td></tr></thead><tbody>";
  foreach ($TweetLink in $TweetLinks) {
    $TableOutput += "<tr><td><a href=""https://twitter.com/$($TweetLink.Author.replace('@',''))"">$($TweetLink.Author)</a></td><td><a href=""$($TweetLink.URL)"">$([System.Web.HttpUtility]::HtmlEncode($TweetLink.Title))</a></td></tr>";
    }
  $TableOutput += "</tbody></table>";
$TableOutput;

And now, my digest of the first week of the SQL New Blogger Challenge. This is not a complete listing because I didn’t think to set up the IFTTT recipe until after things started. I also gave people the benefit of the doubt on the timing (accounting for timezones, etc.) and included a few posted in the early hours of April 8th. For week 2, it will be more complete.

Author Post
@eleightondick Kevin Kline on Twitter: “Advice to New Bloggers http://t.co/o1jfLOR4QI
@BarbiducH Safe exit from WHILE loop using ##global temp tables | One developer’s SQL blog
@sqlmd Mike Donnelly on Twitter: “T-SQL Tuesday #065 - Teach Something New http://t.co/LoyFbhVOpw #tsql2sday”
@GuruArthur Arthur BaanTrace flags in SQL Server - Arthur Baan
@cjsommer Blogging and Intellectual Property Law | legalzoom.com
@ceedubvee A Insider’s View of the Autism Spectrum: Autism and Information Technology: Answering a Blog Challenge (Plus, Why I Like Data)
@arrowdrive Anders On SQL: A bit about me continued. Anders meets SQL
@SQLJudo Experience Is Overated | Russ Thomas - SQL Judo
@SQLBek T-SQL Tuesday #065 – Teach Something New | Mike Donnelly, SQLMD
@MtnDBA #SQLNewBlogger Week 1 “Eye of the Tiger” | DBA With Altitude
@Lance_LT The estimated query plan and the plan cache (Part 1) | Lance Tidwell the Silent DBA
@AalamRangi How to Use Temp Table in SSIS | SQL Erudition
@DwainCSQL An Easter SQL

Limitations

There are a couple limitations and gotchas with this process:

  • The IFTTT recipe only runs every 15 minutes (all IFTTT triggers run on 15 minute intervals) and only fetches 15 tweets each time it runs (again, IFTTT’s configuration). So if there’s a flood of tweets, they won’t all be captured.
  • I don’t really check the final destination of a link. For example, one of the first tweets captured contained a link to another tweet, which then linked to a blog post. Could I detect this & resolve the true final destination? Probably. But it’d involve a lot more logic, and it’s getting late.
  • I also don’t pick up every duplicate link/post. Again, I can probably get by this with some extra logic, but I don’t think it’s necessary right now.
  • It doesn’t post automatically to my blog, or anywhere else. I have to manually paste the HTML into my blog post(s).
  • I had to manually remove one link as it didn’t actually point to a post written for the challenge; it was tweeted with the hashtag, so my IFTTT recipe caught it.
  • I started collecting these tweets mid-day April 7th. If you posted before that, I’ve likely missed your post. You will be picked up for Week Two!