Apr 04, 2011
The team behind guardian.co.uk, which according to its editor has the second-highest readership of any online news site after the New York Times, is gradually switching from Java to Scala. The switch starts with the Content API, which provides a mechanism for selecting and collecting Guardian content.
The guardian.co.uk website comprises about 100,000 lines of code. It uses a fairly typical open-source Java stack of Spring, Apache Velocity and Hibernate, with Oracle providing the database. Like the website, the Content API was initially developed in Java, but the team decided to switch to Scala, another JVM-based language. Web Platform Development Team Lead Graham Tackley told us:
We’ve been a primarily Java development shop for a number of years now, and this has largely served us well. However, as a news website we want to be able to respond to events very quickly. The core Java platform that delivers www.guardian.co.uk has a full release every two weeks. Compared with many enterprise Java applications, this is excellent. Compared with other websites, it’s very poor.
So we’ve been looking for a while at tools, approaches and languages that enable us to deliver functionality faster. This includes using lighter weight Java frameworks like Google Guice, radically different approaches to Java development like the Play framework, and using other platforms such as Python with Django. As part of this exercise we’d been playing with Scala for a while, but unlike the others we hadn’t yet used it for any production code.
We were very keen that the first non-beta release of the Content API (API, Open Platform) should be the first iteration of an ongoing evolving API, which could quickly evolve as we discovered all the interesting use cases that we hadn’t initially thought of. To do this safely without breaking API clients, we needed a comprehensive set of integration tests. After some experimentation with writing these in Java, we decided instead to write just the integration tests in Scala, for three main reasons:
- The flexibility of the testing DSL provided by ScalaTest.
- We wanted to be excited about writing the integration tests, rather than them being a chore.
- Using Scala just for the tests meant we got to use it in anger without impacting production code directly.
After about four weeks of writing just the tests in Scala, we got fed up of having to write the main code in Java, and decided to convert the whole lot to Scala.
InfoQ: In general terms, how did you go about the migration? Did you re-write all the Java code in Scala for instance, or did you combine the two for a while?
The beta version of the Content API was based on a proprietary search engine. The current API uses the excellent Apache Solr (a talk on guardian.co.uk’s use of Solr can be found here), and is also quite different in style to the beta one – the beta did a great job of showing us what we didn’t want the API to look like. Therefore, before Scala came into the picture, we’d decided to re-implement the API rather than reuse the beta codebase.
We’d spent around six weeks with three people implementing in Java before we introduced Scala, so there wasn’t a massive codebase to migrate. However, we weren’t prepared to stop the project for a couple of weeks while we converted to Scala, so we migrated the existing integration tests gradually. As we’d used Maven as a build tool, introducing Scala was a matter of following the instructions to use the maven-scala-plugin to build mixed Java/Scala projects. This allows Java and Scala code to co-exist in the same project, and bi-directionally depend on each other. So we could convert on a class-by-class basis from Java to Scala, which worked far better than we ever imagined: it really did just work.
We took the same approach when converting the main code: over a number of weeks, as we touched a bit of code, we converted it. We then had a couple of days of mop-up at the end.
InfoQ: What are the libraries/frameworks that you have used for development?
Since we were using a language new to us all, we decided to limit the amount of new stuff that we needed to learn. We chose to stick with plain servlets wired with Google Guice, which is how we build our Java apps now. We use SolrJ, the Java Solr library, to talk to Apache Solr, Joda-Time for date time manipulation and Mockito for unit test mocking (this worked fine with Scala code too).
Sometimes we consciously chose to stick with what we knew to ensure timely delivery: the XML formatted endpoints are generated not using Scala’s excellent XML support, but using javax.xml.stream.XMLStreamWriter just as we would in Java code. We’d already written this before moving to Scala; it worked, it was readable, so we left it. However, we did switch to use the excellent JSON library from Lift – lift-json – to generate the JSON formatted endpoints as the code was far clearer than with the Java JSON library we were using.
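To illustrate what calling javax.xml.stream.XMLStreamWriter from Scala can look like, here is a minimal sketch. It is not the Guardian's code: the Item type, the element names and the attribute names are all hypothetical, chosen only to show that the JDK streaming API is used from Scala exactly as it would be from Java.

```scala
import java.io.StringWriter
import javax.xml.stream.XMLOutputFactory

object XmlEndpointSketch {
  // Hypothetical result type; the field names are illustrative only.
  final case class Item(id: String, webTitle: String)

  // Render a list of items as an XML string using the JDK streaming writer.
  def render(items: Seq[Item]): String = {
    val out = new StringWriter
    val writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out)
    writer.writeStartDocument()
    writer.writeStartElement("results")          // hypothetical root element
    writer.writeAttribute("total", items.size.toString)
    items.foreach { item =>
      writer.writeEmptyElement("content")        // hypothetical item element
      writer.writeAttribute("id", item.id)
      writer.writeAttribute("web-title", item.webTitle)
    }
    writer.writeEndElement()                     // closes <results>
    writer.writeEndDocument()
    writer.close()
    out.toString
  }
}
```

Calling XmlEndpointSketch.render(Seq(XmlEndpointSketch.Item("a/b", "Hello"))) returns a document with one content element inside results. The only Scala-specific parts are the case class and the foreach closure; the writer calls are the plain Java API.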
InfoQ: What IDEs do you use for development? What is Scala IDE support like?
We use JetBrains IntelliJ IDEA 10; some of us use the Community Edition and some use the Ultimate Edition. The Scala plugin is pretty good but not perfect. Code completion, find usages, and similar navigation nearly always work just fine. It’s not as good as the Java support at red-highlighting code that isn’t valid, and we had some problems with it finding ScalaTest test methods, but other than that we were in our familiar environment working as we always had, just in a much more powerful language.
InfoQ: I’m assuming the majority of the developers on the project were Java programmers? How easy did the developers on the project find learning Scala?
Yes, all of us were quite experienced Java programmers. The initial team of four had huge fun learning Scala: often one of us would come in raving about this new Scala feature we’d discovered and sharing it with the rest of the team. We had a buzz that had long been missing in our Java development. Because we were all learning together, this worked really well. In the first couple of weeks, though, there were occasions when we’d be battling to implement something in a good Scala way and couldn’t figure it out. Knowing you could just churn out the Java code made this particularly frustrating. There were a few days where we went home in frustration saying, "We’re going back to Java tomorrow". Each time, a fresh look in the morning was all it needed to move on.
Since then, we’ve had around ten other Java devs move to pick up Scala. As always, people learn at different speeds and in different ways, but all have come through that and nearly all now get frustrated when they have to write Java code.
One of the things we compare learning Scala against is moving to a different platform like Python/Django or Ruby on Rails. With Scala, at least 75% of what you’re working with is the same as in Java. You can use the same libraries and IDE, the way you package jars and wars is the same, your runtime environment and runtime characteristics are the same. A good Java developer can learn to write Java-style code in Scala in a day, then they learn the power of closures and implicit conversions and very soon they’re more productive than they were in Java.
InfoQ: One of the common criticisms of Scala as a language boils down to it being too complex. A lot of the time I think this is really about readability – the idea being that it is easier to pick up someone else’s code if it is written in a more rigid language like Java. Do you think the criticism is fair? How do you counter it?
I agree, readability is by far the most important characteristic of a codebase. I don’t care whether code is imperative or functional, or is idiomatic Scala or Java-without-semicolons; I only care whether it’s readable. When we were learning new Scala features, we chose whether to use them based on whether the intent of the resulting code was more obvious. In one example, we tried using the Scala Either class to eliminate a few if statements: the team collectively concluded that the if statements were more readable, so we dropped the use of Either in that case.
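The interview does not show the code in question, so the following is only a hypothetical sketch of the kind of trade-off described: the same small validation written once with plain if statements and once with Either. The PageSizeCheck object, the Max limit and the messages are invented for illustration. Pattern matching is used on Either so the sketch does not depend on a particular Scala version's Either API.

```scala
object PageSizeCheck {
  val Max = 50 // hypothetical upper limit

  // Version 1: plain if/else chain.
  def withIf(size: Int): String =
    if (size < 1) "error: page-size must be positive"
    else if (size > Max) "error: page-size too large"
    else "ok: " + size

  // Version 2: each rule returns Either, with Left carrying the error message.
  def positive(size: Int): Either[String, Int] =
    if (size < 1) Left("page-size must be positive") else Right(size)

  def withinLimit(size: Int): Either[String, Int] =
    if (size > Max) Left("page-size too large") else Right(size)

  // Run the rules in order, stopping at the first Left.
  def validate(size: Int): Either[String, Int] =
    positive(size) match {
      case Right(ok) => withinLimit(ok)
      case Left(err) => Left(err)
    }

  def withEither(size: Int): String =
    validate(size) match {
      case Left(err) => "error: " + err
      case Right(ok) => "ok: " + ok
    }
}
```

Both versions return the same strings for every input. For a check this small, the if/else chain is arguably easier to read at a glance, which is the sort of judgment the team describes making; Either tends to pay off when the individual rules are reused or composed elsewhere.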
It’s true that, due to the rigidity of Java, individual lines of code are always easily understood. But that’s rarely the problem in understanding any non-trivial codebase: I don’t want to understand the detail, I want to understand the intent. Good class design and OO techniques help address this in Java, but I still often find when reading Java code that I cannot see the wood for the trees. In Scala I have the power to express the intent in a way I rarely can in Java.
For example, the Content API needs to decide whether to return results in XML, JSON or redirect to the HTML explorer. We support a format= query string parameter, a .xml or .json extension, and specification in an HTTP Accept header. Here’s the code that does that, which I think is a good example of how Scala’s power aids expression of intent (it’s just chaining calls to Scala’s Option class):
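The original listing is not reproduced in this text. As a stand-in, here is a minimal sketch of how such a decision could be written by chaining Option calls. It is not the Content API's code: every name below (Format, FormatNegotiation, resolve and the helper methods) is hypothetical, the precedence order of query string, then extension, then Accept header is an assumption, and the Accept handling is deliberately simplified (it ignores q-values).

```scala
// Hypothetical sketch; none of these names come from the Content API itself.
sealed trait Format
case object Xml extends Format
case object Json extends Format
case object HtmlExplorer extends Format

object FormatNegotiation {
  // Map a short name such as "xml" or "json" to a format, if recognised.
  private def byName(name: String): Option[Format] = name.toLowerCase match {
    case "xml"  => Some(Xml)
    case "json" => Some(Json)
    case _      => None
  }

  // Source 1: a format=... query-string parameter.
  def fromQuery(params: Map[String, String]): Option[Format] =
    params.get("format").flatMap(byName)

  // Source 2: a .xml or .json extension on the request path.
  def fromExtension(path: String): Option[Format] = {
    val dot = path.lastIndexOf('.')
    if (dot >= 0) byName(path.substring(dot + 1)) else None
  }

  // Source 3: the Accept header (simplified: first recognised media type wins).
  def fromAccept(accept: Option[String]): Option[Format] =
    accept.flatMap { header =>
      val types = header.split(",").toList.map(t => t.takeWhile(_ != ';').trim)
      types.collectFirst {
        case "application/json"             => Json
        case "application/xml" | "text/xml" => Xml
      }
    }

  // Try each source in turn; fall back to the HTML explorer if none matches.
  def resolve(params: Map[String, String], path: String, accept: Option[String]): Format =
    fromQuery(params)
      .orElse(fromExtension(path))
      .orElse(fromAccept(accept))
      .getOrElse(HtmlExplorer)
}
```

The body of resolve reads as a statement of intent: try the query string, or else the extension, or else the Accept header, or else fall back to the HTML explorer. Each source is an ordinary method returning Option, so none of them needs null checks or nested conditionals.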
Great news, looking forward to reading more about sites switching from Java to Scala - we still live with Grails.