Thursday, November 29, 2007

Fun with unfold in Erlang

Fold is one of the most commonly used catamorphisms in functional languages. Erlang has foldl and foldr, which are frequently used to encapsulate some of the very common patterns of computation as higher order operators instead of using recursion directly. Fold's dual is unfold (an anamorphism), which unfortunately does not enjoy equal popularity amongst functional programmers. While fold is a recursion operator that consumes a collection, its dual unfold encapsulates a common pattern that produces streams / collections from a single seed object.

Here is an attempt to implement unfold in Erlang .. adapting from this thread ..


unfold(Seed, Predicate, Transformer) when is_integer(Seed) ->
    unfold(Seed, Predicate, Transformer, fun(X) -> X + 1 end).

unfold(Seed, Predicate, Transformer, Incrementor) ->
    unfold(Seed, Predicate, Transformer, Incrementor, []).

unfold(Seed, Predicate, Transformer, Incrementor, Acc) ->
    case Predicate(Seed) of
        true -> unfold(Incrementor(Seed),
                       Predicate,
                       Transformer,
                       Incrementor,
                       [Transformer(Seed)|Acc]);
        false -> lists:reverse(Acc)
    end.



It is a simple implementation that takes a seed, a predicate that tells when to stop, a transformer that maps a state to an output value, and an incrementor that takes us to the next state. It generates finite collections, though a generic implementation should potentially be able to generate infinite streams as well. Still, it provides a base for implementing some of the common functions from the Erlang library in a pointfree notation ..


seq(Min, Max, Incr) when Incr > 0, Max >= Min ->
    unfold(Min,
           fun(X) -> X =< Max end,
           fun(X) -> X end,
           fun(X) -> X + Incr end);

seq(Min, Max, Incr) when Incr < 0, Max =< Min ->
    unfold(Min,
           fun(X) -> X >= Max end,
           fun(X) -> X end,
           fun(X) -> X + Incr end).
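For readers more comfortable outside Erlang, here is a rough Python transcription of the same unfold and seq. This is my sketch for illustration, not part of the original post or of any Erlang library:

```python
def unfold(seed, predicate, transformer, incrementor):
    """Build a list from a single seed: the dual of fold."""
    acc = []
    while predicate(seed):
        acc.append(transformer(seed))
        seed = incrementor(seed)
    return acc

def seq(lo, hi, incr):
    """lists:seq/3 in the same pointfree spirit."""
    if incr > 0 and hi >= lo:
        return unfold(lo, lambda x: x <= hi, lambda x: x, lambda x: x + incr)
    if incr < 0 and hi <= lo:
        return unfold(lo, lambda x: x >= hi, lambda x: x, lambda x: x + incr)
    raise ValueError("inconsistent bounds and increment")
```

The while loop plays the role of the tail-recursive accumulator in the Erlang version, so no lists:reverse is needed at the end.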



This one is more powerful. It packs iteration over multiple lists into one operation and completely abstracts the iteration away from the user.


zip(Ls) ->
    unfold(Ls,
           %% stop as soon as any of the lists is exhausted
           fun(Lists) -> lists:all(fun(L) -> L =/= [] end, Lists) end,
           fun(Lists) -> [hd(List) || List <- Lists] end,
           fun(Lists) -> [tl(List) || List <- Lists] end).
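The same zip transcribed into Python under the same unfold, with a slightly defensive predicate that stops as soon as any input list runs out. Again, this sketch is mine, not the author's:

```python
def unfold(seed, predicate, transformer, incrementor):
    acc = []
    while predicate(seed):
        acc.append(transformer(seed))
        seed = incrementor(seed)
    return acc

def zip_lists(lists):
    """Zip any number of lists; stops when any of them is exhausted."""
    return unfold(lists,
                  lambda ls: all(len(l) > 0 for l in ls),
                  lambda ls: [l[0] for l in ls],   # heads
                  lambda ls: [l[1:] for l in ls])  # tails
```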



And you can dress them up with string lambdas as well .. of course for this you will need the string lambda library ..


zip(Ls) ->
    unfold(Ls,
           lambda("length(hd(_)) > 0"),
           lambda("[hd(List) || List <- _]"),
           lambda("[tl(List) || List <- _]")).

seq(Min, Max, Incr) when Incr > 0, Max >= Min ->
    unfold(Min,
           lambda("_ =< Max"),
           lambda("_"),
           lambda("_ + Incr"));

seq(Min, Max, Incr) when Incr < 0, Max =< Min ->
    unfold(Min,
           lambda("_ >= Max"),
           lambda("_"),
           lambda("_ + Incr")).



Why unfold when you can recurse directly ? It provides one more pattern for programming in functional languages. The virtues of catamorphisms and anamorphisms have been well elucidated in the functional programming literature. Look here for a detailed discussion of the virtues of unfold, along with unfold characterizations of many well known algorithms.

Monday, November 26, 2007

Productivity, Team Size and the Blub Paradox

Reginald Braithwaite is one of my favorite bloggers. Each of his postings makes me think, and some of them leave a lasting impression. One of them is this one, which talks about small teams, productivity and the power of abstraction in programming languages.

Do we necessarily have to build large teams to solve complex programming problems ? Here is what Reginald has to say ..
All of our experience in the last sixty years has suggested that productivity drops off a cliff as team size increases. So, if you want more code from a larger team, you have to invest heavily in ways of extracting value out of unproductive people in an unproductive environment.

We cannot build our project execution infrastructure around the unproductive mass of average developers. (via Neal Ford's JRuby podcast) Glenn Vanderburg once noted that bad developers will move heaven and earth to do the wrong thing. And by having a restricted environment all you are doing is constraining the power of good developers, not making bad developers any better.

And this is where the expressive power of the programming language comes in. Paul Graham has talked a lot about succinctness, blub programmers and metrics for the continuum of abstractness of programming languages. Good developers love to program in languages higher up the power continuum, while a blub programmer looks out for the most average feature-set that aligns well with his comfort zone. Hence, as Reginald says ..
If we know that bug per line of code remains amazingly constant, why do we try to scale code out in verbosity rather than up in abstraction?

Scaling out in verbosity is encouraging those language features that add to the lines of code at the cost of the levels of abstraction. Which in turn is encouraging the blub paradox.

Java is still the most dominant language of the enterprise and the JVM is undoubtedly the ubiquitous platform. It is difficult for an enterprise to move away from a platform overnight. But today we have a number of alternative programming languages that run on the Java Virtual Machine. Many of them sit much higher than Java in the power continuum of abstractness. Isn't it time that we start embracing some of them, at least incrementally ? Neal Ford makes a great point in his JRuby podcast while talking about polyglot programming. Tests and builds are two areas that do not ship to the customer. These make great candidates for bootstrapping other JVM friendly languages into the enterprise. Next time, try using JRuby for writing your tests and Raven for writing the build scripts. Soon you may start feeling that you could have collapsed your multi-level strategy hierarchy in Java using higher order functions.

Wednesday, November 21, 2007

Erlang String Lambdas

Over the last couple of months I have been using Oliver Steele's Functional Javascript a lot. The string lambdas provide enough syntactic sugar to remove a lot of accidental complexity from your codebase. Reginald Braithwaite recently gave us String#to_proc as a port of the same in Ruby. I am a big believer in adding syntactic sugar to a language as long as it helps reduce boilerplate and makes the domain behavior more expressive. I have been learning Erlang for some time now, sneaking off and on into Joe Armstrong's book and casually feeding into some of the Erlangy blogs around. Over the last weekend I wanted to have a go at porting Oliver Steele's string lambda to Erlang. And here is the result ..

Typically in Erlang you write ..

(fun(X) -> X * 2 end)(6)

to get 12 as the result. With string lambda, you write ..

(utils:lambda("X * 2"))(6)

to get the same result. Ok, counting the characters of the module and the function, you may grin at the fact that it is really more verbose than the original one. But there is less boilerplate, with the explicit function declaration being engineered within the lambda.

A couple of more examples of using the string lambda :

(utils:lambda("X+2*Y+5*X"))(2,5) => 22
(utils:lambda("Y+2*X+5*Y"))(5,2) => 34


Less noisy ? Let's proceed ..

You can have higher order functions using string lambdas, and with less noise .. The following are based on the original Javascript examples on Oliver Steele's page ..

lists:map(fun(X) -> X * X end, [1,2,3]).

-> [1,4,9]

lists:filter(fun(X) -> X > 2 end, [1,2,3,4]).

-> [3,4]


lists:any(fun(X) -> length(X) < 3 end, string:tokens("are there any short words?", " ")).


-> false

and with string lambdas, you have the same result with ..


utils:map("X*X", [1,2,3]).
utils:filter("X>2", [1,2,3,4]).
utils:any("length(_) < 3", string:tokens("are there any short words?", " ")).



In the last example, _ is the parameter to a unary function and provides a very handy notation in the lambda. Here are some more examples with unary parameter ..

%% multiply by 5
(utils:lambda("_*5"))(8).

-> 40


%% length of the input list > 2 and every element of the list > 3
(utils:all_and(["length(_) > 2", "utils:all(\">3\", _)"]))([1,2,3,4]).


-> false

Explicit Parameters

As per the original convention, -> separates the body of the lambda from the parameters when they are stated explicitly; otherwise the parameters are matched based on their first occurrence in the string lambda ..

(utils:lambda("X Y->X+2*Y"))(1,4).

-> 9

(utils:lambda("Y X->X+2*Y"))(1,4).

-> 6
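This parameter-matching convention is easy to prototype in any language with eval. Here is a toy Python rendition (str_lambda is my name; this is a sketch of the idea, not the author's Erlang implementation, and it handles neither currying nor the implicit sections shown below):

```python
import re

def str_lambda(src):
    """Toy string lambda: explicit params before '->'; otherwise '_' is
    the single parameter, or single uppercase letters are matched in
    order of first occurrence."""
    if '->' in src:
        params, body = src.split('->', 1)
        names = params.split()
    elif '_' in src:
        names, body = ['_'], src
    else:
        body = src
        # deduplicate while preserving first-occurrence order
        names = list(dict.fromkeys(re.findall(r'\b[A-Z]\b', src)))
    return lambda *args: eval(body, dict(zip(names, args)))
```

With this sketch, str_lambda("X Y->X+2*Y")(1, 4) and str_lambda("Y X->X+2*Y")(1, 4) reproduce the two examples above.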

Implicit Parameters

If not specified explicitly, parameters can be implicit and are deduced from the expression ..

%% left section implicit match
(utils:lambda("/2"))(6)

-> 3

%% right section implicit match
(utils:lambda("10/"))(2)

-> 5

%% both sections implicit match
(utils:lambda("/"))(12,6)

-> 2

Chaining for Curry

The operator -> can be chained to implement curried functions.

((utils:lambda("X->Y->X+2*Y"))(1))(4).

-> 9

or deferred invocation ..

Fun=(utils:lambda("X->Y->X+2*Y"))(1).

-> #Fun<erl_eval.6.72228031>

Fun(4).

-> 9

Higher Order Functions

String lambdas allow creating higher order functions with much less noise and much more impact than the native boilerplate of the language.


%% compose allows composition of sequences of functions backwards
(utils:compose("+1", "*2"))(1).


-> 3

%% higher order functions
utils:map("+1", utils:map("*2", [1,2,3])).

-> [3,5,7]

lists:map(utils:compose(["+1", "*2"]), [1,2,3]).

-> [3,5,7]
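compose is a one-liner in most languages. Here is a Python equivalent of the backwards composition used above (my sketch, not the utils module):

```python
from functools import reduce

def compose(*fns):
    """compose(f, g)(x) == f(g(x)): the rightmost function runs first."""
    return reduce(lambda f, g: lambda x: f(g(x)), fns)
```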

And here are some catamorphisms ..

lists:reverse() is typically defined by Erlang in the usual recursive style :

reverse([] = L) ->
    L;
reverse([_] = L) ->
    L;
reverse([A, B]) ->
    [B, A];
reverse([A, B | L]) ->
    lists:reverse(L, [B, A]).


Here is reverse() using catamorphism and delayed invocation through string lambdas ..

Reverse = utils:lambda("utils:foldl(\"[E] ++ S\", [], _)").

-> #Fun<erl_eval.6.72228031>

and later ..

Reverse([1,2,3,4]).

-> [4,3,2,1]

Or the classic factorial ..


Factorial = utils:lambda("utils:foldr(\"E*S\", 1, lists:seq(1,_))").


-> #Fun<erl_eval.6.72228031>

and later ..

Factorial(5).

-> 120
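The same two catamorphisms can be written with Python's reduce for comparison. This transcription is mine, not from the post:

```python
from functools import reduce

def reverse(xs):
    # foldl with "[E] ++ S": prepend each element to the accumulator
    return reduce(lambda s, e: [e] + s, xs, [])

def factorial(n):
    # foldr with "E*S" over 1..n; for * the fold direction does not matter
    return reduce(lambda s, e: e * s, range(1, n + 1), 1)
```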

Motivation to learn another programming language

I am no expert in functional programming. I do not use functional programming in my day job. But that is exactly why I am trying to learn it. It is not that I will be using functional programming to write production code in the near future either. We all know how the enterprise machinery works and how a typical application developer has to struggle through the drudgery of boilerplate in his bread-earning job. Learning newer paradigms of programming will teach me how to think differently and how to move gradually towards writing side-effect free programs. The principal take-away from such learning is to be able to write a piece of code that encapsulates my intention completely within the specific block, without having to look around for unintentional impacts in other areas of the codebase.

I am a newbie in Erlang and the attempt to port string lambdas to Erlang is entirely driven by my desire to learn the programming language. The source code is still a bit rusty, may not use many of the idioms and best practices of the language, and may not cover some of the corner cases. Go ahead, download the source code, and do whatever you feel like. But drop in a few comments in case you have any suggestions for improvement.

- lib_lambda.erl (the string lambda implementation)
- lib_misc.erl (miscellaneous utilities)
- lib_lambda_utilities.erl (utility wrappers)

Saturday, November 17, 2007

Learning Functional Programming

comp.lang.functional recently carried an interesting thread on learning functional programming. Of course it has the usual recommendations in favor (and against as well) of learning Common Lisp, Scheme and Haskell as a starter. However, with Microsoft productizing F#, a variant of OCaml on .NET, the ML family has also been a strong contender these days. From the thread, talking of OCaml ..
It's generally regarded the C++ of the functional world: fast, practical, somewhat ugly. A lot less bloated, of course. It also boasts the OO paradigm (hence, the name) and direct imperative constructs where you see fit (rather than Haskell's insistence on purity via monads)...

and
OCaml is, above all a practical and pragmatic language. Companies like Jane St Capital (see the article in issue 7 of "The Monad Reader") and XenSource among others are using Ocaml for real world problems.

OCaml encourages you to write pure static single assignment functional code. However, when that is not enough, you can drop back to imperative coding and uses references (ie mutable variables). It even offers OO for situation when OO is needed ..

F# has certainly been regarded as one of the better modern functional programming languages. Jon Harrop, of OCaml fame, has high regard for it, and he has been working on "porting" his OCaml book to F# ..
As a functional programming language, F# is among the best. It provides modern features like an expressive static type system, pattern matching and OOP as well as being practical. The F# language is also being pushed into the mainstream by Microsoft, who recently announced its productization.

I think the biggest plus in favor of F# is the commitment of Microsoft to push it as a mainstream functional language on the .NET platform. Integrating F# as a fully supported language in Visual Studio will definitely add to the momentum for the language to be used in mainstream enterprise applications.

Is the functional programming paradigm looking stronger than ever in being pushed to the mainstream development horizon ?

Friday, November 09, 2007

Infinite Streams using Java Closures

Neal Gafter's prototype of the closures implementation in Java has given us enough playground to fool around with. Of late, I have been dabbling with a couple of idioms from functional programming, trying to implement them in Java. Many of them have already been tried using functors, anonymous inner classes and the like. Many of them work too, but at the cost of high accidental complexity. The following is an attempt at a clear implementation of infinite streams in Java.

Infinite streams give you the illusion that they can contain an infinite number of objects. The real trick behind infinite streams is lazy evaluation. SICP introduces the term delayed evaluation, which enables us to represent very large (even infinite) sequences as streams. Functional languages like Haskell and Miranda employ laziness as the default paradigm of evaluation, while languages like Scheme implement the same concept as library functions (delay and force).
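The delay/force idea is easy to mimic with zero-argument lambdas as thunks. Here is a rough Python sketch of the cons / car / cdr protocol (the names LazyStream, integers_from and fibonacci are mine, chosen to mirror the Java version in this post):

```python
class LazyStream:
    """A cons cell whose tail is a thunk, forced only on demand."""
    def __init__(self, head, tail_thunk):
        self.head = head              # the car
        self.tail_thunk = tail_thunk  # zero-argument callable yielding the cdr

    def car(self):
        return self.head

    def cdr(self):
        return self.tail_thunk()      # force the delayed tail

    def get(self, index):
        s = self
        for _ in range(index):
            s = s.cdr()
        return s.car()

def integers_from(n):
    return LazyStream(n, lambda: integers_from(n + 1))

def fibonacci(a, b):
    return LazyStream(a, lambda: fibonacci(b, a + b))
```

Because the recursive call sits inside a lambda, each stream is built one cell at a time and never runs away.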

Dominik Gruntz implements infinite streams in Java using the functor paradigm and inner classes. The obvious problem is verbosity resulting from the accidental complexity that they lend to the implementation. In this post, I attempt the same using Neal's closures prototype. So, without further ado ..

The Stream Interface

Here's the contract for lazy evaluation ..


class StreamTest {
  interface Stream<E> {
    E car();
    Stream<E> cdr();
    E get(int index);
    <R> Stream<R> map(Unary<? super E, R> f);
    Stream<E> filter(Unary<? super E, Boolean> f);
  }
  //..
}



and a lazy implementation using Java closures ..


class StreamTest {
  interface Stream<E> {
    //.. as above
  }

  static class LazyStream<E> implements Stream<E> {
    private E car;
    private {=>Stream<E>} cdr;

    // constructor
    public LazyStream(E car, {=>Stream<E>} cdr) {
      this.car = car;
      this.cdr = cdr;
    }

    // accessors
    public E car() { return car; }
    public Stream<E> cdr() { return cdr.invoke(); }

    // access at position
    public E get(int index) {
      Stream<E> stream = this;
      while (index-- > 0) {
        stream = stream.cdr();
      }
      return stream.car();
    }

    // map over the stream
    public <R> Stream<R> map(Unary<? super E, R> f) {
      return cons(f.invoke(car), {=>cdr().map(f)});
    }

    // filter the stream
    public Stream<E> filter(Unary<? super E, Boolean> f) {
      if (car() != null) {
        if (f.invoke(car()) == true) {
          return cons(car(), {=>cdr().filter(f)});
        } else {
          return cdr().filter(f);
        }
      }
      return null;
    }

    // factory method cons
    public static <E> LazyStream<E> cons(E val, {=>Stream<E>} c) {
      return new LazyStream<E>(val, c);
    }
  }
}



A couple of points that bugged me ..

I had to make up the Unary class since the closure did not allow me to specify ? super E on the left hand side. Ricky has clarified with Neal that this is because things on the left hand side of a closure automatically have ? super in their types. Hence a little noise ..


static class Unary<T,R> {
  private {T=>R} u;

  public Unary({T=>R} u) {
    this.u = u;
  }

  public R invoke(T arg) {
    return u.invoke(arg);
  }
}



and now some tests ..


class StreamTest {
  //.. all above stuff
  //.. and the tests

  // helper function generating sequence of natural numbers
  static LazyStream<Integer> integersFrom(final int start) {
    return LazyStream.cons(start, {=>integersFrom(start+1)});
  }

  // helper function for generating fibonacci sequence
  static LazyStream<Integer> fibonacci(final int a, final int b) {
    return LazyStream.cons(a, {=>fibonacci(b, a+b)});
  }

  public static void main(String[] args) {

    // natural numbers
    Stream<Integer> integers = integersFrom(0);
    Stream<Integer> s = integers;
    for(int i=0; i<20; i++) {
      System.out.print(s.car() + " ");
      s = s.cdr();
    }
    System.out.println("...");

    // a map example over the stream
    Stream<Integer> t = integers;
    Stream<Integer> u = t.map(new Unary<Integer, Integer>({Integer i=>i*i}));
    for(int i=0; i<20; i++) {
      System.out.print(u.car() + " ");
      u = u.cdr();
    }
    System.out.println("...");

    // a filter over stream
    Stream<Integer> x = integers;
    Stream<Integer> y = x.filter(new Unary<Integer, Boolean>({Integer i=>i%2==0}));
    for(int i=0; i<20; i++) {
      System.out.print(y.car() + " ");
      y = y.cdr();
    }
    System.out.println("...");
  }
}



Closures in Java will surely usher in a new paradigm of programming among developers. The amount of excitement that the prototype has already generated is phenomenal. It will be too bad if they do not appear in Java 7.

Update: Ricky Clarkson points out in the Comments that {? super E=>? extends R} is the same as {E=>R}. The covariance / contravariance stuff had completely slipped my mind when I compiled the post. Hence the Unary class is not required. Just remove the class and substitute the closure in map() and filter(), e.g.

public <R> Stream<R> map({E=>R} f) {
  return cons(f.invoke(car), {=>cdr().map(f)});
}

Monday, November 05, 2007

Towards more Powerful Abstractions in the Javaland

Way back in 2000, Scott Meyers suggested improving encapsulation through increased usage of non-member, non-friend functions in C++. He had this punchline as the starting point of his column :
If you're writing a function that can be implemented as either a member or as a non-friend non-member, you should prefer to implement it as a non-member function. That decision increases class encapsulation. When you think encapsulation, you should think non-member functions.

While discussing the degrees of encapsulation, he explains that the amount of encapsulation in a class is inversely proportional to the number of functions that may break if the implementation of the class has to change. And that being the case, it becomes clear that a class with n member functions is more encapsulated than a class with n+1 member functions.

I was then an avid C++ programmer and this article hit me like a bolt. This was possibly the first instance of an exposition (that I found) on OO that makes you think away from the kingdom of nouns. Keep the number of verbs hanging off the nouns to a minimum, use the C++ namespace as the module of encapsulation, so as to keep the class interfaces complete and minimal.

Shortly afterwards, Herb Sutter, in one of his Guru-of-the-Week columns, Monoliths "Unstrung", dissected std::basic_string to identify the handful of functions that really need to be members, out of its current count of 103 member functions.

Very recently, Reg Braithwaite and Buko Obele discussed the inappropriateness of applying the noun/verb metaphor to object oriented programming. It is not correct to model all verbs hanging off the nouns - Obele goes on to say ..
The question of who verbs 'belong to' is always the wrong question; instead it is always a matter of how a new concept can be meaningfully introduced into our existing system.

In fact, long back, the design of the C++ Standard library was based on the concept of making the algorithms (the verbs) the first class citizens implemented generically on the containers that they operate on.

It's not the objects carrying the functions that make the OO way of modeling things - the language has to support powerful abstractions that help programmers write extensible code.

The Expression Problem

This has been one of the classical examples used to demonstrate how mainstream OO languages lack the abstractions for modeling an extensible solution. The problem is described quite simply in this paper by Zenger and Odersky :
Suppose we have a datatype which is defined by a set of cases and we have processors which operate on this datatype. There are primarily two directions along which we can extend such a system:
• The extension of the datatype with new data variants,
• The addition of new processors.

The classical OO approach to this problem involves designing a polymorphic hierarchy of datatypes, with each concrete subclass modeling a data variant and implementing the set of processors. With this approach, extending the datatypes is easy : just add a variant as another subclass. But extending the processors is invasive; it violates the Open/Closed principle and forces you to dig into every existing data variant. A typical fallout of embracing the noun/verb paradigm of modeling.

The alternate, not-strictly-OO approach adopted by programmers to solve these problems is to implement some sort of double dispatch using the Visitor design pattern. This is an example where the solution gets overly complex and forces you to write code that simulates the run time type dispatch mechanism of the underlying programming language. The Visitor design pattern is yet another example of a workaround for the noun/verb metaphor in mainstream OO languages. The Visitor abstractions allow a physical separation of the processors from the datatype, but the individual processors very much match one-to-one with each individual datatype. In this paper (Matching Objects with Patterns) discussing the Scala language capabilities, Burak Emir et al. state this problem with Visitors more succinctly :
The visitor design pattern causes a relatively high notational overhead for framework construction, because a visitor class has to be defined and matchWith methods have to be provided in all data variants. The pattern matching itself is disciplined but very verbose, especially for deep patterns. Visitors in their standard setting do not maintain representation independence, because case methods correspond one-to-one to data alternatives.

How can we get around this accidental complexity and make our OO code more succinct ? The answer is simple - more powerful abstractions.

Scala Pattern Matching

Scala offers pattern matching as a solution to the expression problem. Pattern matching has been one of the key features in many functional languages like Erlang - Scala has brought this feature to the OO paradigm. Martin Odersky has been quite a strong advocate of pattern matching, a feature which many purists consider orthogonal to encapsulation of abstractions. But definitely pattern matching in Scala provides a solution away from the typical noun/verb syndrome in today's OO modeling paradigms. Have a look here for a complete example of pattern matching in Scala to solve a problem which would have been much more verbose and complex using visitors.

And Java ?

Java has so long been the kingpin of the noun kingdom and there are tons of Java code that model verbs hanging off the nouns. However, with more and more functional paradigms crying out on the sidelines for their entry into the noun kingdom, things look to be changing. I have been dabbling a bit with the prototype of Closures released by Neal Gafter, and things have already started looking interesting. Closures in Java is definitely a very very welcome feature, and the community has already started battling for their introduction in Java 7. Ricky Clarkson has implemented pattern matching in Java using Neal's prototype. Though at a very early stage, it is indeed very heartening to see more powerful forms of abstractions making their way into the Java programming language. Java, as a platform, has already been enriched with languages like Scala and JRuby contributing many powerful abstractions of functional programming.

Reg has demonstrated the Equivalence relationship using the visitor pattern in a manner conforming to the tradition of verbosity in Java. With pattern matching and closures, things will improve towards more succinctness. Here is a version based on the pattern matching engine of Ricky :


class Main
{
  public static void main(String[] args) {
    Collection coll1 = new ArrayList<Integer>() {{ add(12); add(15); add(10); }};
    Collection coll2 = new TreeSet<Integer>() {{ add(12); add(15); add(10); }};

    // equivalence of List-List and List-Set
    System.out.println(equivalent(coll1, coll2)
        .add(List.class, List.class, {List l1, List l2 => CollectionUtils.isEqualCollection(l1, l2)})
        .add(List.class, Set.class, {List l1, Set s1 => CollectionUtils.isEqualCollection(l1, s1)})
        .done());

    Map<Integer, Integer> coll3 = new HashMap<Integer, Integer>() {{
      put(1, 12);
      put(2, 15);
      put(3, 10);
    }};

    // equivalence of List-List, List-Set, List-Map
    System.out.println(equivalent(coll2, coll3)
        .add(List.class, List.class, {List l1, List l2 => CollectionUtils.isEqualCollection(l1, l2)})
        .add(List.class, Set.class, {List l1, Set s1 => CollectionUtils.isEqualCollection(l1, s1)})
        .add(List.class, Map.class, {List l1, Map m1 => CollectionUtils.isEqualCollection(l1, m1.values())})
        .done());
  }

  public static <T,R> Matcher<T,R> equivalent(T t1, T t2) {
    return new Matcher<T,R>(t1, t2);
  }

  public static <T,R> Matcher<T,R> match(T t1, T t2) {
    return new Matcher<T,R>(t1, t2);
  }

  public static class Matcher<T,R> {
    public final T t1;
    public final T t2;
    public R r;

    public Matcher(T t1, T t2) {
      this.t1 = t1;
      this.t2 = t2;
    }

    public <U1 extends T, U2 extends T> Matcher<T,R> add(Class<U1> aCase, Class<U2> bCase, {U1,U2=>R} f) {
      if (aCase.isInstance(t1) && bCase.isInstance(t2))
        r=f.invoke(aCase.cast(t1), bCase.cast(t2));

      return this;
    }

    public R done() {
      return r;
    }
  }
}



Definitely a more succinct way to model equivalence than what we are used to in the Javaland. And this is only a start. There are quite a few rough edges to smooth out, particularly with respect to handling generics and type erasure issues. But I am sure we will see more and more powerful abstractions coming into Java with people like Neal Gafter and Doug Lea pitching in. A tonne of thanks to these thought leaders for keeping the Java language an interesting platform to play around with.