^{Jonathan Bartlett

May 26, 2020

6

Programming}

Automated Code Generation Tools Can Solve Problems

_{We may be seeing the rebirth of an old approach to productivity that finds a middle ground between too constrained and too risky}


Over the years I’ve gotten to work with a lot of different programming languages. I grew up on BASIC and 6502 machine code. I learned Pascal in middle school and C++ in high school. I learned Perl, Scheme, COBOL, and Fortran in college. I’ve written books on x86 assembly language, JavaScript, and PHP. My day job has had me writing Ruby on Rails, Swift, Objective-C, and C#. So what have I learned?

A programming language serves two basic purposes. First, it improves the productivity of the programmer and second, it makes the program more understandable by creating a middle space between the way humans think and the way computers think.

Historically, improving programming productivity has taken one of two routes: either force programmers to be more explicit about what they are doing, so that the computer itself can check much of the code’s validity, or allow programmers considerable freedom, so that they can get systems working with the minimum number of essential keystrokes and commands.

Typically, these two methodologies have been at odds with each other. Languages such as C# enforce strict “type safety,” meaning that the computer can check your code ahead of time to be sure that you didn’t make any obvious mistakes. Because such languages are also compiled ahead of time, the compiler can use the same type information to make the code run faster.

Other languages have gone a different route. JavaScript and Ruby, for instance, have no type safety built in. This means that the programmer is freer to do things in the way that makes the most sense for the program, even if it prevents the computer from checking the code ahead of time. As an example, Ruby’s ActiveRecord system will auto-create methods and functions for you based on the structure of your database as it exists when you start your program. In fact, it can actually auto-create methods on the fly as you call them!

Ruby has a mechanism (the method_missing hook) that allows calls to undefined methods to be caught and inspected, and the missing methods to be created, while the code is running. This feature allows considerably more productivity (you can literally call into existence certain functions that you wish were there), but it also limits the computer’s ability to check for validity ahead of time. It’s hard enough to write a program checker to make sure that you call existing functions correctly; it’s nearly impossible to write one that also makes sure that you call non-existent functions correctly.
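Python has a comparable hook, `__getattr__`, so the trade-off can be sketched there. This is only an illustration: the `Finder` class and its `find_by_<column>` naming are hypothetical, loosely modeled on the dynamic finders described above, and are not ActiveRecord's actual implementation.

```python
class Finder:
    """Hypothetical record store that invents find_by_<column> methods on demand."""

    def __init__(self, rows):
        self.rows = rows  # list of dicts, one per record

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal attribute lookup fails,
        # which is roughly the role method_missing plays in Ruby.
        prefix = "find_by_"
        if not name.startswith(prefix):
            raise AttributeError(name)
        column = name[len(prefix):]

        def finder(value):
            return [row for row in self.rows if row.get(column) == value]

        # Store the new method on the instance so later calls find it directly.
        setattr(self, name, finder)
        return finder


people = Finder([
    {"name": "Ada", "city": "London"},
    {"name": "Grace", "city": "New York"},
])

london = people.find_by_city("London")  # method did not exist until this call
typo = people.find_by_ctiy("London")    # misspelled column: no error, just no matches
```

The misspelled call is the cost: nothing can flag `find_by_ctiy` ahead of time, because any name starting with `find_by_` is a legal method as far as the program is concerned.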

Some middle grounds have been found. Type-safe languages have made declaring statically typed variables easier by auto-inferring the type from the context. Dynamic languages have made inroads in finding ways to infer at least some static type information so that they can do some basic checks ahead of time.

Another middle ground is automated code generation, an older method that is much less often explored. It offers the advantage we associate with dynamic programming languages, in that there is no limit (except your skill and imagination) to the amount of automation that can be done, along with the advantage of type safety, which allows both ahead-of-time checking and extended optimization.
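A minimal sketch of the idea, assuming a hypothetical `SCHEMA` dictionary standing in for a database structure: instead of creating methods at run time, a small generator writes out ordinary source code with explicit field types, which any ahead-of-time checker can then read like hand-written code.

```python
# Hypothetical schema: class name -> {field name: type name}.
SCHEMA = {"Customer": {"id": "int", "name": "str", "balance": "float"}}


def generate_source(schema):
    """Return Python source text defining one typed dataclass per schema entry."""
    lines = ["from dataclasses import dataclass", ""]
    for class_name, fields in schema.items():
        lines += ["", "@dataclass", f"class {class_name}:"]
        for field_name, type_name in fields.items():
            lines.append(f"    {field_name}: {type_name}")
    lines.append("")
    return "\n".join(lines)


source = generate_source(SCHEMA)
print(source)

# Run the generated text only to show that it is valid code; in a real
# project it would be written to its own file and imported normally.
namespace = {}
exec(source, namespace)
customer = namespace["Customer"](id=1, name="Ada", balance=12.5)
```

The automation is as open-ended as in the dynamic case (the generator is just a program), but its output is static code with declared types.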

Automated code generation, however, has a deservedly bad reputation because, historically, it was implemented very poorly. Two bad design decisions made code generation a poor choice in the past. The first is that many code generators produced code that the programmer was expected to modify. That is, the “generator” would produce boilerplate code and programmers would then change that code as they saw fit. In its early days, for instance, Visual Studio would write pages and pages of boilerplate code for you, which you then modified yourself.

That is a mess to maintain. First, if the code-generating tool improves (or if you change the template it uses), you cannot regenerate the code without overwriting your changes. Second, programmers almost invariably mix their own code with the generated code, which makes it hard for future programmers to know which pieces “came with the system” and which were written by hand. The modern solution is simply to keep generated code separate from hand-written code. The generated code is not meant to be edited by programmers and can therefore always be regenerated as needed.
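One way to picture that separation, as a sketch under assumed names (`customer_base.py`, `customer.py`, and the `make_models.py` mentioned in the header are all hypothetical): the generator owns one file and overwrites it wholesale, while hand-written code lives in a different file that extends the generated class rather than editing it.

```python
import tempfile
from pathlib import Path

# Hypothetical header marking the file as owned by the generator.
HEADER = "# Code generated by make_models.py. DO NOT EDIT.\n"


def regenerate(directory, fields):
    """Overwrite the generated module wholesale; hand-written files are never touched."""
    body = "".join(f"    {name} = None\n" for name in fields)
    text = HEADER + 'class CustomerBase:\n    """Generated fields."""\n' + body
    (directory / "customer_base.py").write_text(text)


project = Path(tempfile.mkdtemp())

# Hand-written file: customizations go in a subclass of the generated class.
handwritten = project / "customer.py"
handwritten.write_text(
    "from customer_base import CustomerBase\n\n"
    "class Customer(CustomerBase):\n"
    "    def greeting(self):\n"
    "        return f'Hello, {self.name}'\n"
)
before = handwritten.read_text()

regenerate(project, ["name"])
regenerate(project, ["name", "email"])  # schema changed: just run the generator again

after = handwritten.read_text()
generated = (project / "customer_base.py").read_text()
```

Because nobody edits `customer_base.py` by hand, the second `regenerate` call can replace it without losing any work, and `customer.py` is byte-for-byte unchanged.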

The second problem with automated code generation tools is getting build systems to work with them. It is difficult to integrate automated code generation steps with most integrated development environments. However, this is starting to change. Visual Studio, for instance, has code generation from XAML built into the platform. The Go programming language, developed at Google, added a dedicated code generation step to its toolchain (the go generate command), as well as a convention for designating files as automatically generated.
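Go's convention is a comment line of the form `// Code generated ... DO NOT EDIT.` placed before any real code in the file. A tool that wants to skip or protect generated files could check for it roughly as follows. This is a simplified sketch: the `is_generated` helper is hypothetical and ignores `/* ... */` block comments.

```python
import re

# Pattern for the marker line used by Go's generated-file convention.
GENERATED_MARKER = re.compile(r"^// Code generated .* DO NOT EDIT\.$")


def is_generated(go_source: str) -> bool:
    """Return True if the marker appears before the first line of real code."""
    for line in go_source.splitlines():
        if GENERATED_MARKER.match(line):
            return True
        stripped = line.strip()
        if stripped and not stripped.startswith("//"):
            return False  # reached non-comment text without seeing the marker
    return False


generated_file = "// Code generated by stringer; DO NOT EDIT.\n\npackage main\n"
handwritten_file = "// Package main is written by hand.\npackage main\n"

generated_flag = is_generated(generated_file)
handwritten_flag = is_generated(handwritten_file)
```

A single agreed-upon marker like this lets editors, linters, and reviewers tell at a glance which files came from a generator.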

In short, automated code generation can combine the power of dynamic and type-safe languages to produce a system that improves productivity in both directions. It prevents the programmer from making mistakes and creates a flexible environment that conforms to the needs of the project. The problems with this approach have been largely mitigated simply by making sure that generated code and custom code are fully separated. Modern languages such as C# and Go are starting to embrace this method of productivity, and I hope that more developers start to see the advantages of building and using automated code generation tools.
