Monday, January 17, 2011

Scala Considered Harmful For Large Projects?

Original Tweet

Last week, I had an extended email conversation with Geir Magnusson Jr. (yep, that Geir, of Gilt Groupe/MongoDB/Apache Software Foundation fame) about whether it was a good, bad, or indifferent idea to let people start introducing Scala into large-scale programming projects that were otherwise based in Java. Geir and I both ended up agreeing that this was actually a bad thing, and should be avoided for the time being.

I tweeted the above missive, and all of a sudden I had more than 10 people contact me saying "please tell me more!" Let no one think I ignore my faithful readers.

What do I mean by a large-scale programming project? I mean one with the following characteristics:

  • It is large enough that not every developer is familiar with every module.
  • It has distribution such that in a production use case, it is not a single monolithic process/binary.
  • It has enough people working on it that there is some type of specialization in terms of the teams and/or programmers.
  • It has flux in the developers (new people joining, people being replaced).
  • The same code base has been in constant development for a period of years, or is anticipated to be.

I've written about this in the past, where I referred to my desire for a Journeyman Programming Language. For what it's worth, I'm 100% behind the Backwards Incompatible Java movement spearheaded by one of OpenGamma's developers, Stephen Colebourne. And in that frame, what we're talking about in a large-scale programming project is precisely the context in which a journeyman programming language becomes useful.

So why did Geir and I concur that someone who has a large-scale Java programming project shouldn't start adding Scala into the mix?

Two main reasons:

  • The inherent problems in a split-language project.
  • The sheer flexibility (by design) of the language.

Do You Really Want Two Programming Languages?

Let's say you've got a single programming language in your large-scale project. In that case, you hire people who know that language; all your tooling is specific to that language; you try to make the best use possible of that language.

Any sufficiently complex large-scale programming project will likely have multiple programming languages inherently (usually because you need to use a particular programming language for a particular function or to integrate with someone else's code). But should you try to intermingle them by design?

Once you've done that, you've started to add unnecessary complexity to the project:

  • Do you have to start hiring dual-language programmers, or spend time training them in the language they don't already know?
  • How well does the tool chain integrate? Are there key gaps or differences in operation?
  • Are you going to inadvertently split the team into people who are comfortable with New Language versus people who aren't? Is that going to reduce your staffing flexibility to assign tasks to developers?
  • Is there going to be an impedance mismatch in terms of people or technology in crossing the language boundary? Is that going to cost you?

These are questions that programming teams have faced for years, and people usually come down on the side of not allowing programming language proliferation, unless it's part of an effort to upgrade/migrate the whole code base over time from an obsolete language/model to a more modern one. Would you really get enough benefit from Scala to justify including it at this stage? I doubt it.

Modern Java has a lot of the stuff that's in Scala. It's not as well integrated, much of it is bolted on as an afterthought, and quite a bit of it is based on convention, but you can get a lot of good stuff out of Java written in a very modern style. On balance, I don't think you get enough out of Scala to warrant its introduction into an existing large programming project.

Language Flexibility Bad For Large-Scale Projects

Let's assume that you're working in a programming language like Scala or C++ that lends itself by design to what I would consider to be abuse of the programming facilities (whether it's the ability to write BASIC code inline in the language, the ability to introduce APL-like operator character insanity, or the gratuitous abuse of the compiler that is Turing-complete lambda calculus in the templating system). These are, I 100% concur, cool features. It's very, very cool to see someone doing something like the BASIC DSL.

The problem is that if you have a language designed to make it easy to do this, you encourage people to do it. And that's when problems start.

Large-scale programming projects need to avoid this type of thing at all costs. In my opinion, there are several key requirements for the codebase for a large-scale programming project:

  • The code must be intuitive and fast to comprehend for a developer familiar with the rest of the project, but not the code in any particular module. In other words, anyone on the team can edit/fix/enhance any other part of code (where they understand the business concepts and/or underlying math).
  • The code must be intuitive and fast to comprehend at any point in the future. In other words, you can't have serious issues looking at old code (that's still in the code base) or old edits (where you're trying to figure out why a change was made by looking at history).

Does an inherently flexible language with kewl DSLs and custom characters and template abuse satisfy these? No. It doesn't satisfy the temporal understanding clause (because a DSL that isn't used consistently over time stops being understood), nor does it satisfy the pan-developer clause (because not everybody on the team, even if they know Scala as well as Java, may know the DSL or be able to effectively type in high-order Unicode glyphs).

Ultimately, a programming language that's useful for large-scale programming projects needs to have clear, unambiguous grammar and syntax so that any developer familiar with the project and the language can instantly figure out what's going on. Any features (operator overloading, terse/different method invocation syntax, DSLs) that add time in trying to figure out what a block of code does slow down the project. Sadly, Scala is chock-full of them, and it appears to be considered good Scala style to use as many of them as possible.
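
To make that comprehension cost concrete, here's a minimal sketch in Scala. Everything in it (the `Money` class, the `%%` operator, and the named equivalents `plus` and `scaledBy`) is hypothetical and invented purely for illustration; it isn't taken from any real code base or library.

```scala
// Hypothetical example: none of these names come from a real project.
final case class Money(cents: Long) {
  // Symbolic operators: compact, but the reader has to look up what each means.
  def +(other: Money): Money = Money(cents + other.cents)
  def %%(rate: Double): Money = Money(math.round(cents * rate))

  // Named equivalents: longer, but readable without knowing the house DSL.
  def plus(other: Money): Money = this + other
  def scaledBy(rate: Double): Money = this %% rate
}

object OperatorStyleDemo {
  def main(args: Array[String]): Unit = {
    val price = Money(1000)
    val shipping = Money(250)

    // Terse style: infix symbolic operators, grouping decided by precedence.
    val terse = price + shipping %% 0.5

    // Explicit style: ordinary method calls, grouping spelled out.
    val explicit = price.plus(shipping.scaledBy(0.5))

    println(terse)
    println(explicit)
  }
}
```

Both lines compute the same value, but the terse one only reads correctly if you already know what `%%` means and that Scala gives operators starting with `%` higher precedence than operators starting with `+`. That lookup is exactly the kind of time I'm talking about.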

Can Scala Work?

Although you may think I'm all about hatin' on Scala, the answer is actually Yes.

Neither Geir nor I consider Scala inherently harmful to a large-scale programming project. It can be a very good addition, if your tool chain supports it and you're willing to train staff so that everybody can handle it.

But you have to have even more rigorous coding standards and review processes to make sure that the features of the language that excite the "insanely cool programming techniques" crowd don't get used. You have to do what every C++ programming group has done for years: pick the features of the language and library ecosystem you're going to use, use them in a consistent way, and never ever deviate.

Note that I actually think that if you're kicking off a new project, you owe it to yourself to take a serious look at Scala as a programming language base for that project. But if you already have a large-scale Java-based programming project, I would personally avoid it as much as possible.

By the way, if you think that a new component or module constitutes a new project in my book, you're sadly wrong. A "new project" is one where, in my opinion, the entire technology team is separate from the previous one. In other words, a new codebase altogether. Think: new startup; new Line-of-Business funded application; heads of technology reporting into different parts of an organization.
