Pay off your technical debt by preferring API clarity to generation efficiency
I’ve built the technical side of my career on combining two kinds of technology: Microsoft products, which are easy to sell into enterprises that require the confidence that comes from extensive support contacts and a huge market footprint, and open source technologies, which steer the direction of technology ahead of the enterprise curve and are eventually embraced by those same enterprises.
Microsoft has always provided powerful tools for developers in their Visual Studio product line. They focus on providing more features than any other vendor, and on giving developers the flexibility to design their software with the patterns that make the most sense to them. Because of this, the community is full of discussion, and there are always new ways to combine their technologies to do similar things, with quite a bit of variance in the architecture or patterns used to get them done. It can be daunting for a new developer, or a new member of a team, to comprehend some of the architectural works of art that are created by well-intentioned astronauts.
After I learned my first handful of programming languages, I began to notice the things that were different between each of them. These differences were not logic constructs, but rather how easy or difficult it could be to express the business problem at hand. Few will argue that a well designed domain model is easier to code against from a higher level layer in your application architecture than a direct API on top of the database – where persistence bleeds into the programming interface and durability concerns color the intent of the business logic.
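To make that contrast concrete, here is a minimal sketch in Python using the standard sqlite3 module. The orders table, the Order and OrderRepository classes, and the cancellation rule are all hypothetical names invented for this illustration, not taken from any real project.

```python
import sqlite3

# Hypothetical schema and data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO orders (id, status) VALUES (1, 'open'), (2, 'open')")


def cancel_order_raw(conn, order_id):
    """Persistence-first style: the business rule is buried in SQL plumbing."""
    row = conn.execute(
        "SELECT status FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    if row is None or row[0] != "open":
        raise ValueError("only open orders can be cancelled")
    conn.execute("UPDATE orders SET status = 'cancelled' WHERE id = ?", (order_id,))
    conn.commit()


class Order:
    """Domain object: states the business rule and nothing else."""

    def __init__(self, order_id, status):
        self.order_id = order_id
        self.status = status

    def cancel(self):
        if self.status != "open":
            raise ValueError("only open orders can be cancelled")
        self.status = "cancelled"


class OrderRepository:
    """Keeps the durability concerns in one place, out of the caller's way."""

    def __init__(self, conn):
        self.conn = conn

    def find(self, order_id):
        row = self.conn.execute(
            "SELECT id, status FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"no order with id {order_id}")
        return Order(row[0], row[1])

    def save(self, order):
        self.conn.execute(
            "UPDATE orders SET status = ? WHERE id = ?",
            (order.status, order.order_id),
        )
        self.conn.commit()


# Caller written directly against the database API.
cancel_order_raw(conn, 1)

# Caller written against the domain model: it reads like the business action.
orders = OrderRepository(conn)
order = orders.find(2)
order.cancel()
orders.save(order)
```

Both callers end up with a cancelled order, but only the second one could be read aloud to someone who knows the business and not the database.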
In recent years domain specific languages have risen in popularity and are employed to great effect in open source projects, and are just starting to get embraced in Microsoft’s technology stack. A domain specific language is simply a programming interface (or API) for which the syntax used to program in it is optimized for expressing the problem it’s meant to solve. The result is not always pretty – sometimes the problem you’re trying to solve shouldn’t be a problem at all due to bad design. That aside, here are a few examples:
- CSS – the syntax of CSS is optimized to express the assignment of styling to markup languages.
- Rake/PSake – the syntax of these two DSLs is optimized for expressing dependencies between buildable items and for creating deployment scripts that invoke operating system processes, typically command-line applications.
- LINQ – the syntax of Language Integrated Query from Microsoft makes it easier to express relationship traversal and filtering operations from a .NET language such as C# or VB. Ironically, I find LINQ syntax a cumbersome way to express the joins and filters needed to return optimized sets of persisted data (where T-SQL shines). That’s not to say T-SQL is the best syntax, only that using an OO programming language to do so feels worse to me. However, I’d still consider its design intent that of a DSL.
- Ruby – the Ruby language itself has constructs that make it dead simple to build DSLs on top of it, leading to its popularity and success in building niche APIs.
- YAML – the name originally stood for “Yet Another Markup Language” and now officially stands for “YAML Ain’t Markup Language”. It is optimized for expressing nested sets of data, their attributes, and values. At first glance it doesn’t look much different from JSON, but if you’ve yet to use it on a real project, you’ll notice the efficiency once you do.
Using a DSL leads to better retention of the syntax, which tends to increase productivity and reduce the need for tools. IntelliSense, code generation, and wizards can all take orders of magnitude longer to use than simply expressing the intended action in a DSL’s syntax once you’ve got the most commonly expressed statements memorized, because the keyword and operator set is small and optimized within the context of one problem. This is especially apparent when you have to choose a code generator or wizard from a list of many other generators that are not related to the problem you’re trying to solve.
Because of this, you will reduce your cycle time if you evaluate tools, APIs, and source code creation technologies based not on how much code your chosen IDE or command-line generator spits out, but on how clear and flexible that code is once written. I am all for code generation (“rails g” is still the biggest game changer of a productivity enhancement for architectural consistency in any software tool I’ve used), but there is still the cost of maintaining that code once generated.
Here are a few things to keep in mind when considering the technical cost and efficiency of an API in helping you deliver value to customers:
- Is the number of keywords, operators, and constructs optimized for expressing the problem at hand?
- Are the words used, the way they relate to each other when typed, and even the way they sound when read aloud easy to comprehend for someone trying to solve the problem the API is focused on? Relatedly, consider how easy it will be for someone else to comprehend code they didn’t write or generate.
- Is there minimal bleed-over between the API and others that are focused on solving a different problem? Is the syntax really the best way to express the problem, or just an attempt at doing so with an existing language? You can usually tell it’s the latter when you find yourself using language constructs meant to solve a different problem just to make the code easier to read. A good example is “Fluent” APIs in C# or VB.NET. These use lambda expressions for property assignment, where the intent of a lambda is to enable a pipeline of code to modify a variable via separate functions. You can see the mismatch in the funky syntax, and in the low comprehension of someone who meets the concept for the first time without an explanation.
- Are there technologies available that make the API easy to test, but have a small to (highly preferred) nonexistent impact on the syntax itself? This is a big one for me: I hate using interfaces just to allow testability, when dependency injection or convention-based mocking can do much better.
- If generation is used to create the code, is it easy to reuse the generated code once it has been modified?
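The testability consideration in the list above can be sketched in Python, where the standard unittest.mock module can replace a collaborator without any interface being declared for it. RateService, Invoice, and FixedRates are hypothetical classes made up for this example; only mock.patch.object is a real library call.

```python
from unittest import mock


class RateService:
    """Hypothetical collaborator that would call a remote service in real code."""

    def current_rate(self, currency):
        raise RuntimeError("network access is not available in tests")


class Invoice:
    """The API under test. Nothing in its syntax exists only for testing,
    apart from an optional constructor argument."""

    def __init__(self, amount, rates=None):
        self.amount = amount
        self.rates = rates or RateService()

    def total_in(self, currency):
        return round(self.amount * self.rates.current_rate(currency), 2)


# Option 1: mocking by name. The method is swapped out for the duration of
# the with block and restored afterwards; Invoice is used exactly as in
# production code.
with mock.patch.object(RateService, "current_rate", return_value=1.25) as fake:
    patched_total = Invoice(100).total_in("EUR")
    fake.assert_called_once_with("EUR")


# Option 2: plain dependency injection with a stand-in object. No shared
# interface or base class is required, only a method with the same name.
class FixedRates:
    def current_rate(self, currency):
        return 2.0


injected_total = Invoice(100, rates=FixedRates()).total_in("EUR")

print(patched_total, injected_total)
```

Either way, the calling code keeps the same shape it has in production, which is the property the question is really asking about.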
You’ll notice one consideration I didn’t include – how well it integrates with existing libraries. This is because a DSL shouldn’t need to – it should be designed from the ground up to either leverage that integration underneath the covers, or leave that concern to another DSL.
When you begin to include these considerations in evaluating a particular coding technology, it becomes obvious that the clarity and focus of an API is many times more important than the number of lines of code a wizard or generator can create to help you use it.
For a powerful example of this, create an ADO.NET DataSet and look at the code generated by it. I’ve seen teams spend hours trying to find ways to backdoor the generated code, or trying to figure out why it’s behaving strangely, until they find that someone created a partial class to do just that and placed it somewhere non-intuitive in the project. The availability of Entity Framework Code First is also a nod towards the importance of comprehension and a focused syntax over generation.