Branden Schwartz

Saving Time with F# Type Providers

One of the ongoing projects my team has been working on at the office is an effort to get A records created for all of the databases on our internal servers to make maintenance easier. Once every DB has a DNS name associated with it and the configurations for all of the applications that use them have been updated to use those names in their connection strings, migrating databases from one server to another will be a non-event. Theoretically at least. Instead of finding every occurrence of a database and updating the connection string in every config file or database or wherever else it lives from server A to server B for a single application, then rinsing and repeating for everything that touches it, a world with a DNS name for each database is a world where all of those connections can be fixed with a single tweak to an A record and an ipconfig /flushdns.
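
Concretely, the goal is to go from connection strings that name a physical box to ones that name the database’s A record, something like this (the server, domain, and database names here are invented for illustration):

Before: Server=SQLBOX07;Database=Orders;Integrated Security=True
After:  Server=OrdersDB.mydomain.local;Database=Orders;Integrated Security=True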

There are a lot of servers in our ecosystem and each of those typically has more than two dozen databases. There are some exceptions that hold only a handful, for reasons ranging from security policies to performance needs and disk space considerations, but the last official count I got of DNS names we will eventually need was in the neighborhood of “a lot”. In the biggest round we’ve created at one time so far, there were something like 160 databases that needed names. The names themselves are uninteresting once the convention is established, a list filled with {{Database Name}}DB.{{domain}}. Each DB typically lives in a development, acceptance, and production environment, so we create three A records for each one. Developers don’t have access to make those kinds of changes to the environment in my world, so we put together a list of what we needed, fired off a change request, and waited. If we had any desire to spot check the list once the work was finished, before releasing it to the rest of the organization to start updating the related applications, we had something of an uphill climb ahead of us. 160 x 3 = much closer to 500 connections than I’d like to test by hand.

That of course raises the obvious question, so I’ll address it in hopes of saving everyone some sleepless nights: Yes, we have interns, but I actually like them.

When I started my first “real” job in the field shortly after finishing college, I remember a coworker saying something to the effect of “good programmers are somewhat lazy by nature. They’ll spend hours and days and sleepless nights on the interesting problems because they think it’s fun, but give them a simple boring task that needs to be repeated a lot of times and they’ll drop everything in search of a way to automate it to make sure they’ll never have to do it again.” It was in that spirit that I found myself too lazy to verify even 10% of the entries by hand. Firing up SQL Management Studio, connecting to each server with the correct credentials or my Active Directory account, confirming that the first entry in the list connects to the database that the DNS name implies, then moving to the next server, disconnecting, moving to the next item in the list…ugh! I briefly considered copying the list of DNS names into a batch script that pings them, but a) that wouldn’t confirm the existence of a SQL Server installation on the machine, just that the machine existed, b) even if we could assume that the servers were all SQL boxes it wouldn’t confirm that the correct database was present, and c) not all of the servers on the network are even pingable. A second after discarding that, I considered throwing together a C# console application to connect to each one in turn and fire off a USE {{DATABASE}} command to check that the expected DB shows up on the server, but before I could consciously talk myself out of that, I realized that the perfect answer involved F# type providers.

Because type providers give compile-time checking that ensures there are no surprises by the time an application runs, we know that any code we write that builds properly is going to work at runtime, at least with respect to the database. In this case there wasn’t a particular object that we needed to touch in each one; we simply needed to confirm that the DNS names were pointing to servers that held the DBs we expected. In hopes of a quick gamble paying off, I threw together a SELECT statement against the table where we had stored all of our database information and used it to generate a type provider declaration for each of the DNS entries we wanted to test. Fortunately all of our developer Active Directory accounts have at least read access to all of the SQL Servers, so all of the declarations take a form similar to the code snippets below.

F# type providers generated from a T-SQL query
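
With a SQL Server type provider such as SqlProgrammabilityProvider from FSharp.Data.SqlClient (one provider that fits, and one that happens to share the pre-2012 incompatibility mentioned below — I’m not claiming it’s the only option), each generated declaration looks something like the following sketch. The server and database names are placeholders:

open FSharp.Data

// Placeholder names; one declaration per DNS entry to verify.
// If the server can't be reached or the database doesn't exist, the build fails.
type OrdersDbCheck =
    SqlProgrammabilityProvider<"Data Source=OrdersDB.mydomain.local;Initial Catalog=Orders;Integrated Security=True">

type InventoryDbCheck =
    SqlProgrammabilityProvider<"Data Source=InventoryDB.mydomain.local;Initial Catalog=Inventory;Integrated Security=True">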

After all of the connections were generated, I copied and pasted them into the F# project and built it. There was no need to include additional code that would actually use the types for anything, because the type providers check that both the server and database are accessible at compile time, so the only other code is the main method that comes with a new console application.

F# console application default main method
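
That is to say, nothing beyond the stock entry point the project template generates:

[<EntryPoint>]
let main argv =
    printfn "%A" argv
    0 // return an integer exit code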

I received a handful of error messages because some of the servers were running versions of SQL Server older than 2012 and the type provider was incompatible with those earlier versions. That gave us a list of names that we could guess were correct, because we knew those DBs were older and likely to live on older SQL boxes, but we needed to exercise a little due diligence to confirm our suspicions. The rest of the errors indicated that the DNS entries either were wrong or simply didn’t exist as expected. In less time than it took to explain what I had done to my coworkers who were unfamiliar with F#, I had tested every single connection and found the subset of entries that needed more attention. The crowning achievement was of course when the most anti-functional developer on the team admitted that it was a “pretty neat approach” to a seriously tedious task.

From here, my next step is just to convince everyone in my ecosystem that we should all be writing F# all the time for basically everything. Timeline TBD.


Stupid Markup Tricks

A coworker asked me to look over a proof of concept project he’d been working on the other day and, after telling him my thoughts, I added “of course it needs more blink tags and marquees” because it’s impossible for me to go for very long without ribbing him about something and markup is as good a target as anything.

I didn’t realize that, along with all of my pop culture references, he also completely missed the era of web pages that made liberal use of those tags. Lucky for him, I suppose. I put together a quick page to demonstrate that neither of those tags was recognized by modern browsers. I was relieved when <blink> didn’t work but actually a little surprised when <marquee> did, because I hadn’t used it in years and then not for anything good. I laughed, threw the browser window over to my third monitor after conceding that yes, marquees could still be useful for “news ticker”-like behavior, and forgot about it as I got lost in something I needed to troubleshoot. I didn’t think about it again until someone else wandered by and said “seriously? why would you do that?”

After a quick joke about introducing blinks and marquees into every one of our web apps, someone wondered if form fields had ever been nested in a marquee. I said that I imagined it would ignore the tags at best, render some unidentifiable HTML abomination at worst. A few keystrokes later and there was a fully functional text field that just happened to be scrolling across the screen. The same person thought that nested marquee tags might be interesting. I was skeptical but did it anyway. I have absolutely no idea when either of those would have any chance of being remotely useful except as some extremely niche developer party trick, but there it is.

We do these things because we can, apparently.

nested marquee

double nested marquee
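
For anyone who wants to recreate the abominations in question, the markup amounts to something like this (a minimal sketch, not the exact page we threw together):

<!-- a fully functional form field riding along inside a marquee -->
<marquee>
  <input type="text" value="catch me if you can">
</marquee>

<!-- marquees inside marquees: the inner text drifts within the outer scroll -->
<marquee>
  <marquee direction="up">
    <marquee>still legal, somehow</marquee>
  </marquee>
</marquee>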

SQL Server IIF: Syntactic Sugar that Doesn’t Rot Your Brain

It wasn’t until recently that I discovered a feature that I had somehow missed since SQL Server 2012. One of my recent projects involved converting a lot of old Visual FoxPro code. That’s not quite as much fun as it sounds, as it turns out. Things like ISNULL and .F. and At() and Atc() behaving differently or having no direct translation at all from one environment to the other hurt my head a little. I expected the same level of annoyance when I saw the original programs peppered throughout with IIF statements.

My mood was seriously soured after puzzling my way through nested Scatters and Gathers to mirror the intent of the original code.  I cringed when I saw the IIFs because I knew that I’d need to rewrite everything as a CASE statement the way one would expect in T-SQL. Copying and pasting the first query into a SQL Management Studio instance, I braced myself for the inevitable pain.

Except that it all worked.

That’s an exaggeration. There were a few FoxPro concepts that didn’t translate over and some convoluted logic that made me question my career choice, but SQL Management Studio detected the IIFs as keywords. It surprised me more than a little when they behaved exactly as they did in the original program. Refactoring those (sometimes nested two and three deep) conditional statements to CASEs while making sure that the original intent was preserved in each case had scared me a little, so not having to was a good thing. I had syntax that was familiar, almost exactly matching VB.NET and close enough to the ?: operator in C# and even IF in Excel that there was very little need to mentally adjust. On the other hand, IIF has developed a reputation for making it easy to write conditions that are impossible to read. I’ve seen some nested IIFs that can challenge a person’s sanity.

I admit to giving my pair programming partner a certain amount of flak for his preference for IIF statements. I doubt he’d been in the field prior to 2012, so he never lived in a world without them. Despite my grumbling, there’s a strong argument in favor of using that approach over CASE in some instances. I appreciate using IIF in situations with a single test, with several simple conditions chained together, or when copying large chunks of code from something like VB.NET or FoxPro, where much of the logic can be preserved for free. Despite my grumpiness, IIF(condition, true value, false value) has a certain elegance at times.
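
As a contrived example (the table and column names are made up), a single test reads about the same in either form, but the IIF version carries over almost verbatim from VB.NET or FoxPro:

-- IIF: available in SQL Server 2012 and later
SELECT OrderID,
       IIF(Quantity >= 100, 'bulk', 'standard') AS OrderSize
FROM dbo.Orders;

-- the equivalent CASE expression
SELECT OrderID,
       CASE WHEN Quantity >= 100 THEN 'bulk' ELSE 'standard' END AS OrderSize
FROM dbo.Orders;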

For a long time I’ve rejected the overuse of language features that smell like syntactic sugar. Shortcuts can prevent developers from understanding what’s really happening, and I live in mortal fear of losing what understanding I already have. With IIF, I’m less likely to draw an absolute line in the sand. It can contribute to sensible code that flows better than the alternatives, is easier to read, and saves brainpower for the really difficult questions.

Named Values: The Discriminated Union Example that I Rarely if Ever See

Two things really attracted me to F# originally: type providers and discriminated unions. As I started working with the language I scoured the Web for all of the content I could find. It wasn’t quite as plentiful as it is now, but both F# for Fun and Profit and TryFSharp existed in some form before I began exploring in earnest, so that was incredibly helpful. I’ve noticed a certain amount of commonality between the various presentations of the discriminated union across the internet.

We often see examples that employ shapes or cards or int*bool unions. Those show off the terse and powerful nature of the feature very well. When discriminated unions grow beyond a small number of values, though, they can become headache-inducing very quickly. With a small number of values of different types involved, those discriminated unions can be memorized or thought through; when we’ve got a Person of string * DateTime * int we’re not going to mix up the name, date of birth, and SAT score or whatever, because the other orderings don’t make sense for those types. In cases where there’s an accepted convention, like with geographic locations or three-dimensional coordinates, the fact that the unions contain the same type multiple times isn’t a problem because we can assume that the order of the values is latitude then longitude, or x, y, and z. If developers follow those conventions consistently throughout the code, anyway.

Where neither of these cases hold, I used to think that my options ranged from “make liberal use of comments” to “don’t use discriminated unions”. Casual Googling fails to turn up any documentation, but I stumbled on a third option that I’m embarrassed that I didn’t think of immediately after writing “Hello World” and am quietly amazed hasn’t gotten more press: just name the values.

open System

type Person =
| Student of string * DateTime * string * int * int * int * int * float

type PersonWithNamedValues =
| Student of Name:string * BirthDate:DateTime * Major:string * GraduatingYear:int * SAT:int * ACT:int * ClassRank:int * GPA:float

It’s a small difference, and of course when we consume one of these types in a match expression we can assign arbitrary names to each of the values, so there’s nothing world-changing about having sensible names from the outset, but there’s a certain amount of payoff during the declaration.
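
For example, a match expression over the named version can bind only the fields it cares about, by name, while the positional version needs a slot (or a wildcard) for every value; a quick sketch using the types above:

// Positional: a slot for every field, in declaration order.
let describe person =
    match person with
    | Person.Student(name, _, major, _, _, _, _, gpa) ->
        sprintf "%s, %s major, GPA %.2f" name major gpa

// Named: bind just the interesting fields, in any order.
let describeNamed person =
    match person with
    | PersonWithNamedValues.Student(Name = name; Major = major; GPA = gpa) ->
        sprintf "%s, %s major, GPA %.2f" name major gpa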

Call it nitpicky, but I tend to think that anything we can do to make our code more readable and easier for new people to understand quickly will be worth the time we invest, particularly if it helps us immediately as we’re writing it.

CROSS APPLY Keeps Me Sane

There hasn’t been a great deal of routine from year to year throughout my career. Microsoft has moved from VB6 to progressively more advanced incarnations of the .NET framework, and versions of SQL Server have added Intellisense and more features than I could have imagined when I first started working with 6.5. CRUD is a constant, though, particularly the “R” when I’ve needed to display information in a UI or generate reports, whether I own the application in question or I’m working with systems developed by third parties. Like many developers, I’ve found myself working with tables that weren’t designed 100% in the way that I’d like or in the way users want to see the values, so there’s always a certain amount of transformation and finagling that happens between the tables and the results that I need. Add requirements to filter and sort query results on arbitrarily-transformed fields and the SQL involved can be more than a little convoluted to write, decipher, and maintain. Enter CROSS APPLY.
I once inherited a report that was intended to calculate the average amount of time that it took for administrative users to complete their assigned tasks in a particular system. The basic structure was more or less sensible, making use of a derived table to get the aggregates for each user (simplified for brevity):
SELECT t.FullName, t.TotalTasks,
---additional fields
FROM
(
SELECT ul.FullName,
COUNT(ut.owner_id) TotalTasks,
SUM(CASE WHEN ut.task_completed = 1 THEN 1 ELSE 0 END) CompletedTasks,
AVG(DATEDIFF(minute, ut.create_timestamp, ut.complete_timestamp)) AverageMinutesToComplete
FROM dbo.user_list ul
LEFT OUTER JOIN dbo.user_tasks ut ON ut.owner_id = ul.[user_id]
WHERE ut.create_timestamp >= @startDate AND ut.create_timestamp <= @endDate
GROUP BY ul.FullName
) t

Since the task table stores the creation and completion dates without the overhead of an additional column to hold the calculated “time to complete”, the only way to get the information we’re after is to perform the calculation in the moment for the smallest unit of time that we want to see and then build from there whenever we want to add hours, days, or weeks.

Seeing DATEDIFF used to get the average number of minutes it takes each user to complete their assigned tasks, it seems logical that we’d use the same approach to get those other units as they’re needed and then do the math so that instead of “1 hour and 65 minutes” the report should show “1 hour and 5 minutes”, “25 hours and 5 minutes” should be “1 day, 1 hour, and 5 minutes”, and so on.  Instead, the original query used a slightly different method for getting those values.

SELECT t.FullName, t.TotalTasks,
CASE WHEN
CASE WHEN t.AverageMinutesToComplete >= (24* 60)
THEN FLOOR (t.AverageMinutesToComplete/(24 *60.0))
ELSE 0 END > 0 THEN CAST(FLOOR (t.AverageMinutesToComplete/(24 *60.0)) as varchar(3)) + ' day'
+
CASE WHEN FLOOR (t.AverageMinutesToComplete/(24 *60.0)) >= 2 THEN 's ' ELSE ' ' END
ELSE '' END
AS AverageDaysToComplete,
CASE WHEN
CASE WHEN (t.AverageMinutesToComplete - ((CASE WHEN t.AverageMinutesToComplete >= (24* 60)
THEN FLOOR (t.AverageMinutesToComplete/(24 *60.0))
ELSE 0 END) * ( 24*60 ))) >= 60
THEN CAST (FLOOR(( t.AverageMinutesToComplete - ( (CASE WHEN t.AverageMinutesToComplete >= (24* 60)
THEN FLOOR (t.AverageMinutesToComplete/(24 *60.0))
ELSE 0 END) * (24* 60)) )/60.0) as varchar(2))
ELSE 0 END > 0 THEN
CAST
(
CASE WHEN (t.AverageMinutesToComplete - ((CASE WHEN t.AverageMinutesToComplete >= (24* 60)
THEN FLOOR (t.AverageMinutesToComplete/(24 *60.0))
ELSE 0 END) * ( 24*60 ))) >= 60
THEN CAST (FLOOR(( t.AverageMinutesToComplete - ( (CASE WHEN t.AverageMinutesToComplete >= (24* 60)
THEN FLOOR (t.AverageMinutesToComplete/(24 *60.0))
ELSE 0 END) * (24* 60)) )/60.0) as varchar(5 ))
ELSE 0 END
AS varchar(2)) + ' hour' +
CASE WHEN
CASE WHEN (t.AverageMinutesToComplete - ((CASE WHEN t.AverageMinutesToComplete >= (24* 60)
THEN FLOOR (t.AverageMinutesToComplete/(24 *60.0))
ELSE 0 END) * ( 24*60 ))) >= 60
THEN CAST (FLOOR(( t.AverageMinutesToComplete - ( (CASE WHEN t.AverageMinutesToComplete >= (24* 60)
THEN FLOOR (t.AverageMinutesToComplete/(24 *60.0))
ELSE 0 END) * (24* 60)) )/60.0) as varchar(2))
ELSE 0 END  >= 2
THEN 's ' ELSE ' ' END
ELSE '' END
AS AverageHoursToComplete,
---remainder of query

Those two statements handle the requirement that the report display the time to completion as “2 days, 1 hour, and 5 minutes” by making sure that each full day’s worth of minutes is subtracted from the total before calculating the number of hours, and a third statement goes on to adjust for the full days and hours to find the number of minutes remaining. Even so, I find this much more difficult to read and hold in my head than some of the other options.

Using DATEDIFF to get the average number of days and hours before applying the math to display the values realistically would require less typing and make more sense than the massive amount of text above, but for the purposes of this example it can serve as a placeholder for transformations that are necessarily complex and don’t have alternatives built into T-SQL. Taking the CROSS APPLY approach shifts those calculations out of the query’s field list and into the FROM clause, alongside the tables and derived tables.

SELECT
--field list,
CASE WHEN days.AverageDaysToComplete >= 1 THEN
    CAST(days.AverageDaysToComplete as varchar(3)) + ' day' +
         CASE WHEN days.AverageDaysToComplete >= 2 THEN 's ' ELSE ' ' END
     ELSE '' END AS AverageDaysToComplete,
CASE WHEN hours.AverageHoursToComplete >= 1 THEN
    CAST(hours.AverageHoursToComplete as varchar(2)) + ' hour' +
         CASE WHEN hours.AverageHoursToComplete >= 2 THEN 's ' ELSE ' ' END
     ELSE '' END AS AverageHoursToComplete
FROM
--derived table (aliased t, as above)
CROSS APPLY (SELECT CASE WHEN t.AverageMinutesToComplete >= (24 * 60)
THEN FLOOR(t.AverageMinutesToComplete / (24 * 60.0))
ELSE 0 END AS AverageDaysToComplete) days
CROSS APPLY (SELECT CASE WHEN (t.AverageMinutesToComplete - (days.AverageDaysToComplete * (24 * 60))) >= 60
THEN FLOOR((t.AverageMinutesToComplete - (days.AverageDaysToComplete * (24 * 60))) / 60.0)
ELSE 0 END AS AverageHoursToComplete) hours

In this instance, there’s no real performance gain when using the CROSS APPLY approach vs. leaving all of the heavy lifting inline with the rest of the fields in the SELECT, but there are other instances where it may outperform a correlated subquery. Apart from that, declaring those values once and treating them as columns that can be used anywhere throughout the query, without the need to recalculate them every time they’re needed, can prevent typos, logical errors, carpal tunnel, and migraines in this and numerous other cases.
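
One of the nicer side effects is that an applied column can be reused in the WHERE and ORDER BY clauses without repeating the expression behind it, along these lines (a contrived sketch with made-up table and column names):

SELECT p.ProductName, calc.ProfitMargin
FROM dbo.Products p
CROSS APPLY (SELECT (p.UnitPrice - p.UnitCost) / NULLIF(p.UnitPrice, 0) AS ProfitMargin) calc
WHERE calc.ProfitMargin >= 0.25
ORDER BY calc.ProfitMargin DESC;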

The System I Don’t Love, Exactly

There’s a particular piece of enterprise software that we use at the office of which I am none too fond. As a member of a small technical team, there are a number of things I find myself doing that don’t fit squarely into the “developer” wheelhouse (“hey, you’re IT aren’t you? come fix this printer”), and I typically consider that more pro than con. There’s a perspective and breadth of experience with various concepts that one just can’t acquire doing heads-down coding 100% of the time, and I like to think that the variety keeps me from getting so comfortable with a small number of things that I fall into a rut without seeing it coming. On occasion, however, it seems that someone has taken the time and the energy to prefabricate a rut for me to fall into with very little effort on my part. Sometimes that rut can become a ditch. Or a bottomless pit of despair.

Before I address the particulars of the application that makes me sad whenever I touch it, I should point out that it is not the first of its kind and quite possibly isn’t even the worst I’ve seen. In every position I’ve held from the time I got out of undergrad and set up my desk at my first “real” job in the field, there has been by definition one program I have enjoyed working with less than anything else. Sometimes I’ve been part of the development team for a project with a less than stellar future, so I’ve stared into the abyss of subpar spaghetti code and felt the tendrils of anti-logic boring their way into my soul. Other times, it’s been a company-standard application that everyone has had the pleasure of using on a daily basis despite the fact that for many people it was the square peg that was pounded relentlessly into the very round hole of the requirements for which it was originally purchased. Having had both experiences more than once in my career leaves me on the fence when trying to decide which is worse; with an internal project, the ability to see and touch the inner workings of the thing means that on a long enough timeline there’s a reasonable chance of squashing most of the major bugs or begging the people in charge to authorize a rewrite. On the other hand, bugs in vendor-provided software are officially Somebody Else’s Problem, and the reflexive facepalms at finding painfully counter-intuitive behavior stop at the point of discovery because only the vendor knows how offensive the source really is.

In this particular instance, the application is produced by a vendor who deals exclusively in one specific problem domain and is required to pass an extensive series of regulatory hurdles in order to sell the product. The thorough vetting process does not, for reasons I cannot begin to fathom, include a check for foreign key relationships between tables or other relational database principles that I used to take for granted in any well-ordered database design. The total number of tables reaches into the five-digit range, and if a primary key in one table makes an appearance in three others, the chances are good that it will go by at least two different names. There are multiple blob fields storing documents of various kinds, and those same files are replicated in at least one location in the filesystem. New versions and hotfixes contain DDL scripts on a semi-regular basis, and in the releases I’ve observed those commands seem to change field names or remove them as often as they add objects, so any reports or data transfers that were built against the existing schema may or may not continue to work with newer versions. This is of course not a very serious problem as long as no one needs to pull information of any kind out of the database on a consistent basis for any reason.

The user interface is intuitive enough and navigation is straightforward in parts, but other areas are significantly less so. Month, day, and year textboxes appear on screens next door to forms that use date pickers; date pickers occasionally get caught in infinite loops that require a date value to break out of but don’t recognize anything that’s been entered, forcing users to kill the process and restart; and the same data can be presented in a completely different fashion between one part of the environment and another. It was saddening but not completely surprising when I learned that some of the database calls from the application are constructed in the code and use ordinal parameters instead of explicitly naming them, while others use stored procedures with what I consider sensible naming conventions. An end user recently remarked to me that it seemed like someone with a split personality had designed the whole thing and that one of those personalities did not like the people who use their software very much at all.

There are a few reasons to share this beyond the desire to vent and give myself the ultimate one-up at any watercooler conversation about programs that might be less than ideal and who has it worse.  It also provides the opportunity for everyone else to indulge in a serious amount of schadenfreude and reflect that the system that they built or have to use or maintain may not really be all that bad.  This also gives me a baseline to refer back to whenever I discuss one of the many tools and processes that  I find myself building to address pain points in this or any system I don’t love, exactly.  Maybe I’ll call it SIDLE for short.
