Monday, July 07, 2008

Erlang's concurrency model on the JVM - Can we have (at least a subset of) OTP in Scala?

Jonas Bonér is working towards an OTP implementation in Scala. It sounds like a great exercise and may well prove to be a viable and scalable implementation on the JVM. We are seeing more and more highly scalable systems built with Erlang/OTP - Facebook, ejabberd, CouchDB, Mochi* to name a few. And not without reason. Erlang/OTP provides an awesome stack that fits the scalability-reliability-distribution space like a glove.

Scala offers a shared-nothing, asynchronous message passing paradigm in its programming model, very similar to Erlang's. As a language, however, Scala is quite different - a statically typed, functional-OO hybrid with type inference and a lot more core features than Erlang. The beauty of Erlang is in its smallness and simplicity, even though it is often looked at as being syntactically weird. But the amount of research that has gone into the concurrency and distribution model which Erlang implements is truly phenomenal. Here are some thoughts that come to my mind when thinking about Scala's contribution and possibilities in this space.

Scala actors are very similar to Erlang processes as composable abstractions. Erlang/OTP offers gen_server, gen_fsm, gen_event etc. along with robust, fault tolerant, hierarchical supervision of processes across nodes and clusters, as a combination of generic servers and pluggable callbacks. More than the language, OTP offers the platform on which Erlang processes can play with gay abandon. And the callbacks that the client needs to implement can be absolutely oblivious of concurrency, process spawning, failover and clustering issues - they are written as purely sequential functions that can be easily hot swapped in and out of live installations. Can we have the same reliability of implementation in Scala? I don't know, but even if we can have a meaningful proper subset, it will be enough to safeguard most of the corporate investments that have been sunk into the Java Virtual Machine.
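To make the split between a generic server and sequential callbacks concrete, here is a minimal sketch in Scala. It is not OTP and not the implementation mentioned above: the names `Callbacks`, `GenericServer` and `Counter` are hypothetical, the callback shape is only loosely modelled on gen_server's handle_call, and supervision, distribution and hot swapping are left out entirely. It assumes nothing beyond the Scala standard library and `java.util.concurrent`.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical callback interface, loosely modelled on gen_server's handle_call.
// Implementations are plain sequential functions: no threads, no locks.
trait Callbacks[S, Req, Rep] {
  def init: S
  def handleCall(request: Req, state: S): (Rep, S)
}

// Hypothetical generic server: it owns the state and funnels every request
// through a single thread, so the callbacks never run concurrently.
final class GenericServer[S, Req, Rep](callbacks: Callbacks[S, Req, Rep]) {
  private val executor = Executors.newSingleThreadExecutor()
  private implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(executor)

  // After construction, read and written only on the server thread.
  private var state: S = callbacks.init

  // Asynchronous request; the reply arrives as a Future.
  def call(request: Req): Future[Rep] = Future {
    val (reply, next) = callbacks.handleCall(request, state)
    state = next
    reply
  }

  def stop(): Unit = executor.shutdown()
}

// Client code: a counter written as purely sequential logic.
object Counter extends Callbacks[Int, String, Int] {
  def init: Int = 0
  def handleCall(request: String, state: Int): (Int, Int) = request match {
    case "increment" => (state + 1, state + 1)
    case _           => (state, state)
  }
}
```

A client would create `new GenericServer[Int, String, Int](Counter)`, send requests with `call("increment")`, and wait on the returned future with `Await.result(..., 1.second)` when it needs the reply. All the concurrency lives in `GenericServer`; `Counter` knows nothing about it, which is the property the callbacks above are praised for.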

Scala needs a port of mnesia, Erlang's distributed, fault tolerant database that offers fast lookup, dynamic reconfiguration capabilities, Erlang data types all the way down (implying zero impedance mismatch) and seamless distribution and partitioning semantics. qlc, the data query language based on Erlang's list comprehension syntax, is the DSL that makes Erlang almost a database programming language. For Scala, it should not be very difficult to come up with something similar to qlc, built upon its for-comprehensions. mnesia is lightweight and can be easily replicated and partitioned. And with today's trend of traditional relational databases starting to get a beating from map-reduce jobs, tuple models, and tablestores that can't even do joins, a lightweight, loosely coupled, easily replicated engine like mnesia can be a very good starting point towards a persistence-as-a-service paradigm. A memory-replicated mnesia data store can be backed up with the transparent persistence services offered by today's grid platforms.
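As a rough illustration of what a qlc-like query could look like with for-comprehensions, here is a small sketch. It runs over plain in-memory lists, not over a distributed store; `Employee`, `Dept` and `Tables` are hypothetical names invented for the example, and the data is made up.

```scala
// Hypothetical records standing in for rows of two mnesia-style tables.
final case class Employee(id: Int, name: String, deptId: Int, salary: Int)
final case class Dept(id: Int, name: String)

object Tables {
  val employees: List[Employee] = List(
    Employee(1, "ann", 10, 5200),
    Employee(2, "bob", 20, 3100),
    Employee(3, "cid", 10, 4100)
  )

  val depts: List[Dept] = List(Dept(10, "engineering"), Dept(20, "support"))

  // Two generators, a join condition and a filter, much like a
  // list-comprehension query: names of well paid employees with their department.
  val wellPaid: List[(String, String)] =
    for {
      e <- employees
      d <- depts
      if d.id == e.deptId
      if e.salary > 4000
    } yield (e.name, d.name)
}
```

Since for-comprehensions are only syntax over `map`, `flatMap` and `withFilter`, a table handle of a real store that supplied those three methods could be queried in the same way. Whether such a handle can push the join and filter down to the store efficiently is a separate question that this sketch does not address.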

Erlang variables are single-assignment and its data is immutable, and Erlang's reliability as a platform is almost a mathematical corollary of this property. In Scala you can make things immutable, but the VM does not enforce it. With the JVM, immutability is left to the discipline of your programmers. Not all pure stuff gets blessed by the masses; we have seen this with Smalltalk and C++. Here, we can only hope for the best.
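A small sketch of how that can go wrong in practice: a message type that looks immutable but still shares mutable state with its sender. The names `Order`, `FrozenOrder` and `ImmutabilityDemo` are hypothetical, and no actors are involved; the "send" is simulated by simply keeping a reference to the message.

```scala
import scala.collection.mutable.ArrayBuffer

// Looks immutable (a case class with a val field), but the field
// points at a mutable buffer that the sender can still change.
final case class Order(items: ArrayBuffer[String])

// An alternative whose contents cannot change after construction.
final case class FrozenOrder(items: Vector[String])

object ImmutabilityDemo {
  // Returns the message size before and after the sender mutates its buffer.
  def leaky(): (Int, Int) = {
    val buf = ArrayBuffer("book")
    val msg = Order(buf)            // imagine this being sent to another actor
    val before = msg.items.size
    buf += "pen"                    // the sender keeps mutating after "sending"
    (before, msg.items.size)        // the message changed under the receiver
  }

  def frozen(): (Int, Int) = {
    val buf = ArrayBuffer("book")
    val msg = FrozenOrder(buf.toVector) // copy into an immutable Vector
    val before = msg.items.size
    buf += "pen"
    (before, msg.items.size)        // the message is unaffected
  }
}
```

Both versions compile without complaint, which is the point: neither the compiler nor the JVM stops anyone from writing the first one. Only convention and code review keep messages looking like `FrozenOrder`.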

People often boast of the abundance of Java libraries as the shining part of Scala's ecosystem. With respect to implementing the actor model and the OTP paradigm, this is where Scala falls down hard. Almost all Java libraries are built on imperative mutable structures that will land you back in the blues of synchronization and locks, the very thing you have been trying to get away from with the shared-nothing process model. For an OTP implementation, almost all of these Java libraries will be useless.

It is quite easy to get distribution across nodes/clusters with Erlang/OTP applications. They rely upon Erlang's built-in distribution, which serializes terms such as tuples between nodes; mnesia builds on the same mechanism. Scala can be more efficient in this space using Terracotta's selective differential serialization techniques. Scala actors can be clustered using Terracotta, and this can be one area where Erlang's capabilities may pale beside Scala and the power of the JVM.

In my last post, I was wondering about distribution concerns that need to be considered while designing APIs in Erlang. But if you use OTP, many of the concerns get addressed by the platform. As a client you are left with implementing callbacks that can be plugged in and out. Scala's actor model holds many of the promises that Erlang offers. And as I mentioned above, if it can implement at least part of what OTP does today, Scala can pull off the same *good enough* coup over Erlang that C++ pulled off over Smalltalk.

3 comments:

Anonymous said...

Check out Kilim - http://www.malhar.net/sriram/kilim/. Might be what you're looking for.

Anonymous said...

scala reaching for OTP is a very interesting and exciting proposition for jvm developers.

admittedly, i don't know OTP very well, but isn't there a set of orthogonal foundation features that need to be built first, before scala could ever reach parity with OTP? here's my initial stab:

(1) the ability to perform static analysis on an actor to ensure the actor's message is immutable and that all actor referenced objects are not shared.

(2) a framework for: monitoring a set of jvm nodes across a network. allowing the start and stop of jvms, as well as adding and removing new nodes dynamically.

(3) a framework for managing osgi bundles across a set of networked jvms.

(4) the ability of a super set of scala actors to support osgi.

(5) a mnesia-like distributed store as you describe.

(6) a mnesia-like dynamic table containing actor instance addresses, and osgi bundle locations.

(7) an optimized protocol for serialized actor messages. (i'm starting to wonder if we haven't shot ourselves in the foot with case classes being used for messages, since they can include method definitions. which means actors processing those messages must deal with access to class files, and versioning issues should those case classes get modified and recompiled.)

(8) perhaps a build process that takes into account the deployed model, and can inform the developer what in the deployed model will be broken during compilation, and from that build a deployment plan should the developer opt to make the change to the system.

this doesn't even touch upon the fact that i believe beam vm can impose memory and cpu tick restrictions at the process level - that's something scala's actors are not likely to ever duplicate.

CARFIELD said...

Will java threads perform well compared with erlang's lightweight processes? Maybe we need to re-design java threads to make them run well under OTP?