Wednesday, October 31, 2007

A Computer Literacy interview with Donald Knuth

A couple of days back, googling some time away while waiting for a flight at the airport, I happened to glance across an interview with Donald Knuth. It was conducted around 1993 by Computer Literacy, after the publication of his books on CWEB and The Stanford GraphBase. Besides talking about the obvious topics that you would expect out of a Knuth session, he also ruminates on his many other passions like music, the piano and writing. We all know Knuth's obsession with TeX and METAFONT. But did you know that the Father of Analysis of Algorithms is an expert in piano music? He says:
I would like to have friends come to the house and play four-hands piano music. If I could do it every week, I would. I hope to live long enough so that after I've finished my life's work on The Art of Computer Programming, I might compose some music. Just a dream... it might be lousy music, of course.

Go read the whole piece .. it's a wonderful window into the thoughts of a legend.

Friday, October 19, 2007

Clojure is here

I came across this posting Lisp on the JVM on reddit and thought .. what the heck? What's so great about it when we already have ABCL, Kawa and SISC for the JVM? In fact the title on reddit is a bit misleading - Clojure is very much like Lisp. It is targeted at the JVM, but more than anything else, the design embodies a lot of thought about immutability, functional data structures, concurrency, STM etc. Here is a comment from the author himself on reddit:
Clojure has some tangible, non-superficial differences from Common Lisp and Scheme. They yield something that is different, and might or might not be more suitable depending on your programming style and application domain.

• Most of the core data structures are immutable. This is part of an overall design philosophy to make Clojure a good language for concurrent/multi-core programming.

• Most of the data structures are extensible abstractions. This is different from Common Lisp where you can't extend the sequence functions to your own data structures, for instance. Even invocability is an abstraction - allowing maps to be functions of their keys and vice-versa.

• Clojure extends code-as-data to maps and vectors in a deep way. They have literal reader syntax, the compiler understands them, backquote works with them, they support metadata etc. Because they are efficiently immutable and persistent, they support very Lisp-y recursive usage, shared structure etc, in ways Common Lisp's hash tables and vectors cannot.

• Clojure embraces its host platform in ways that the standard Lisps ported to the JVM can't. For instance, Common Lisp's strings could never be Java Strings since the former are mutable and the latter are not. Clojure strings are Java Strings. The Clojure sequence library functions work over Clojure and Java data structures transparently. Etc.

• Clojure has metadata as a core concept, not something one could retrofit onto the built-in Common Lisp types.

• Clojure is designed for concurrency. Vars (similar to CL special variables) have explicit threading semantics. Clojure has a software transactional memory system. Etc.

In short, Clojure is (non-gratuitously) different. If you don't want different, you don't want Clojure. If you like Lisp and need to write multi-threaded programs for the JVM, you might find something interesting in Clojure.

I had blogged about SISC some time back and discussed how we could use Scheme as a more consumer-friendly XML in your Java application. I think Clojure is going to be the most interesting dynamic language on the JVM very soon. There has never been a better time to learn Lisp!
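Incidentally, the "efficiently immutable and persistent" property that the author highlights is easy to sketch even in Java. Here is a minimal, hypothetical cons list (my own illustration, nothing like Clojure's actual implementation) showing how "adding" to an immutable structure shares the old one instead of copying it:

```java
// A minimal persistent (immutable) singly-linked list.
// "Adding" an element creates a new list that shares the old tail,
// so existing references are never invalidated -- the property that
// makes such structures safe for concurrent use.
final class PList<T> {
    final T head;
    final PList<T> tail;

    PList(T head, PList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    // returns a NEW list; 'this' is untouched and structurally shared
    PList<T> cons(T value) {
        return new PList<T>(value, this);
    }
}
```

A caller holding the original list sees no change when someone else "extends" it - there is simply nothing to mutate.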

Monday, October 15, 2007

Domain Modeling with JPA - The Gotchas - Part 4 - JPA is a nice way to abstract Repository implementation

When we model a domain, we love to work at a higher level of abstraction through Intention Revealing Interfaces, where the focus is always on the domain artifacts. Whenever, at the domain layer, we start coding in terms of SQL and pulling resultsets out of the database through a JDBC connection, we lose the domain focus. When we start dealing with the persistent hierarchy directly instead of the domain hierarchy, we make the domain model more vulnerable, exposing it to the lower layers of abstraction. By decoupling the domain model from the underlying persistent relational model, JPA provides us with an ideal framework for building higher levels of abstraction towards data access. The domain model can now access data in terms of collections of domain entities, and not in terms of the table structures into which those entities are deconstructed. And the artifact that provides us a unified view of the underlying collection of entities is the Repository.

Why Repository?

In the days of EJB 2.0, we had the DAO pattern. Data Access Objects also provided an abstraction over the underlying data model, by defining queries and updates for every table of the model. However, the difference between DAOs and repositories is more semantic than technical. Repositories provide a higher level of abstraction and are a more natural inhabitant of the rich domain model. Repositories offer controlled access to the domain model only through Aggregate roots, implementing a set of APIs which follow the Ubiquitous Language of the domain. Programming at the JDBC level with the DAO pattern, we used to think in terms of individual tables and operations on them. However, in reality, the domain model is much more than that - it contains complex associations between entities and strong business rules that govern the integrity of those associations. Instead of having the domain model directly deal with individual DAOs, we had always felt the need for a higher level of abstraction that would shield the domain layer from the granularities of joins and projections. ORM frameworks like Hibernate gave us this ability, and specifications like JPA standardized it. Build your repository to get this higher level of abstraction on top of DAOs.

You build a repository at the level of the Aggregate Root, and provide access to all entities underneath the root through the unified interface. Here is an example ..

@Entity
class Order {
  private String orderNo;
  private Date orderDate;
  private Collection<LineItem> lineItems;
  //..
  //.. methods
}

@Entity
class LineItem {
  private Product product;
  private long quantity;
  //..
  //.. methods
}

The above snippet shows the example of an Aggregate, with Order as the root. For an Order Management System, all that the user needs is to manipulate Orders through intention revealing interfaces. He should not be given access to manipulate individual line items of an Order. This may lead to inconsistency of the domain model if the user does an operation on a LineItem which invalidates the invariant of the Order aggregate. While the Order entity encapsulates all domain logic related to manipulation of an Order aggregate, the OrderRepository is responsible for giving the user a single point of interface with the underlying persistent store.

interface OrderRepository {
  //.. queries
  List<Order> getAll();
  List<Order> getByOrderNo(String orderNo);
  List<Order> getByOrderDate(Date orderDate);
  List<Order> getByProduct(Product product);
  //..
  void write(final Order order);
  //..
}

Now the domain services can use this repository to access orders from the underlying database. This is what Eric Evans calls reconstitution, as opposed to construction (which is typically the responsibility of the factory).
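The construction / reconstitution split can be made concrete with a small sketch. The simplified Order and the validation inside OrderFactory are my own illustration, not code from any framework - the point is only that the factory builds a fresh aggregate and enforces its invariants, while the repository hands back aggregates that were already valid when persisted:

```java
import java.util.Date;

// Pared-down Order for illustration (the real entity is much richer)
class Order {
    final String orderNo;
    final Date orderDate;

    Order(String orderNo, Date orderDate) {
        this.orderNo = orderNo;
        this.orderDate = orderDate;
    }
}

// Construction: the factory assembles a brand-new aggregate,
// enforcing its invariants from scratch.
class OrderFactory {
    Order createOrder(String orderNo, Date orderDate) {
        if (orderNo == null || orderNo.length() == 0) {
            throw new IllegalArgumentException("orderNo is mandatory");
        }
        return new Order(orderNo, orderDate);
    }
}

// Reconstitution, by contrast, is the repository's job: it rebuilds
// aggregates that already exist in the persistent store, with no
// invariant re-validation, since the stored state was valid on write.
```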

JPA to implement Repository

The nice thing about programming to a specification is the abstraction that you can enforce on your model. Repositories can be implemented using JPA and can be nicely abstracted away from the actual domain services. A Repository acts like a collection and provides the user with the illusion of working with in-memory data structures, without exposing the internals of the interactions with the persistent store. Let us see a sample implementation of a method of the above repository ..

class OrderRepositoryImpl implements OrderRepository {
  //.. em is the injected EntityManager
  //..
  public List<Order> getByProduct(Product product) {
    String query = "select o from Order o, IN (o.lineItems) li where li.product.id = ?1";
    Query qry = em.createQuery(query);
    qry.setParameter(1, product.getId());
    List<Order> res = qry.getResultList();
    return res;
  }
  //..
  //..
}

The good part is that we have used JPA to implement the Repository, but the actual domain services will not contain a single line of JPA code. All of the JPA dependencies are localized within the Repository implementations. Have a look at the following OrderManagementService API ..

class OrderManagementService {
  //..
  // to be dependency injected
  private OrderRepository orderRepository;

  // apply a discount to all orders for a product
  public List<Order> markDown(final Product product, float discount) {
    List<Order> orders = orderRepository.getByProduct(product);
    for(Order order : orders) {
      order.setPrice(order.getPrice() * discount);
    }
    return orders;
  }
  //..
  //..
}

Note that the repository is injected through a DI container like Spring or Guice, so that the domain service remains completely independent of the repository implementation.
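With constructor injection, the wiring looks something like the following. This is a hand-rolled sketch using a pared-down, hypothetical version of the repository interface - a container like Spring or Guice would do the same supplying-of-the-implementation declaratively:

```java
import java.util.Collections;
import java.util.List;

// Pared-down repository contract, for illustration only
interface OrderRepository {
    List<String> getAll();   // simplified: just order numbers
}

class OrderManagementService {
    private final OrderRepository orderRepository;

    // the container (or a test) supplies the implementation;
    // the service never names a concrete class
    OrderManagementService(OrderRepository orderRepository) {
        this.orderRepository = orderRepository;
    }

    List<String> allOrderNos() {
        return orderRepository.getAll();
    }
}
```

Because the dependency arrives through the constructor, a test can hand the service an in-memory stub, while production wiring hands it the JPA-backed implementation - the service code is identical in both cases.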

But OrderRepository is also a domain artifact!

Right .. and with proper encapsulation we can abstract away the JPA dependencies from OrderRepositoryImpl as well. I had blogged about this before - how to implement a generic repository service and make all domain repositories independent of the implementation.
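The shape of such a generic repository might look like this. All names here are my own hypothetical sketch, not from that earlier post; the idea is simply that one generic contract carries the plumbing, while concrete repositories narrow it to an aggregate root:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for the real JPA entity
class Order {
    final String orderNo;
    Order(String orderNo) { this.orderNo = orderNo; }
}

// One generic contract; concrete repositories narrow T to an
// aggregate root, sharing the plumbing but keeping a domain face.
interface Repository<T, ID> {
    T getById(ID id);
    List<T> getAll();
    void write(ID id, T entity);
}

// An in-memory implementation -- handy for tests; a JPA-backed one
// would delegate to an EntityManager instead of a Map.
class InMemoryOrderRepository implements Repository<Order, String> {
    private final Map<String, Order> store = new HashMap<String, Order>();
    public Order getById(String id) { return store.get(id); }
    public List<Order> getAll() { return new ArrayList<Order>(store.values()); }
    public void write(String id, Order order) { store.put(id, order); }
}
```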

Thursday, October 11, 2007

Defensive Programming - What is that?

Another rant on thinking in the programming language that you are using. In most languages, we strive to handle exceptions as close to the site of occurrence as possible. Be it runtime, be it checked, we take great care to ensure that the system does not crash. This is a form of defensive programming. Erlang does not espouse this idea - the philosophy of Erlang is to "let it crash". Of course it is not as trivial as that. You have a recovery plan, but the recovery semantics are totally decoupled (yes, I mean it .. physically) from the crash itself.

I found this posting on the Erlang mailing list, where Joe Armstrong, the inventor of Erlang, explains the philosophy. Some highlights of his premises ..
In C etc. you have to write *something* if you detect an error - in Erlang it's easy - don't even bother to write code that checks for errors - "just let it crash".

Of course he explains how to handle a crash in Erlang. It is very much related to the basic idioms of concurrency and fault tolerance that form the backbone of Erlang's process structure. In Erlang, you can link processes, so that one process can keep an eye on the health of another. Once linked, the processes implicitly monitor each other, and if one of them crashes, the other process is signalled. To handle the crash, Erlang suggests using linked processes to correct the error. These linked processes need not run on the same processor as the original process, and this is where the Erlang philosophy of make-everything-distributable comes in. Joe mentions in this thread ..
Why was error handling designed like this?

Easy - to make fault-tolerant systems you need TWO processors. You can never ever make a fault tolerant system using just one processor - because if that processor crashes you are scomblonked.

One physical processor does the job - another separated physical processor watches the first processor and fixes errors if the first processor crashes - this is the simplest possible way of making a fault-tolerant system.

This is also an example of separation of concerns, where the handling of a crash is dealt with separately, through a distribution mechanism. You do not write code to check for errors - let it crash, and you have a built-in mechanism of recovery within the process structure. To a layman, it may feel a bit unsettling to deploy a production system in a language that follows the "let-it-crash" philosophy, but the amazing track record of Erlang in building fault tolerant distributed systems speaks volumes for the robustness and reliability of the underlying engine.
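To make the philosophy concrete for JVM programmers, here is a deliberately naive Java analogue of a watching process: the worker does no defensive error handling at all, and a separate supervisor observes the crash and restarts the work. This is only my illustration of the idea - Erlang's linked processes are far more general (they are separate, distributable processes, not a loop):

```java
// A naive "let it crash" sketch: the task does no defensive error
// handling; the supervisor notices the crash and restarts the work.
class Supervisor {
    private int restarts = 0;

    int getRestarts() { return restarts; }

    // run the task; on a crash, restart it up to maxRestarts times
    void supervise(Runnable task, int maxRestarts) {
        while (true) {
            try {
                task.run();
                return;                      // finished normally
            } catch (RuntimeException crash) {
                if (restarts >= maxRestarts) throw crash;
                restarts++;                  // recovery is decoupled from the task
            }
        }
    }
}
```

The essential point survives the translation: the recovery logic lives entirely outside the task, so the task's own code never checks for errors.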

Wednesday, October 10, 2007

I didn't know Javascript can do this!

The best part of learning a new programming language is the bag of surprises that awaits you at every line of your program. And it's almost always a mixed bag - things that you were confident of often do not work the way all the blogs claimed they would. Your favorite idiom that you have so long used in every design starts falling apart. And you start thinking in a new way, in terms of the language which you have started learning. Sometimes you feel bad when your favorite design pattern finds no place in the laws of the new land, but soon you realize that there is a more idiomatic way of doing the same thing.

I have been a passionate Java programmer for the last eight years, and started the journey with the same sinking feeling, coming from the land of C and C++. I missed the sleek idioms of memory management using smart pointers, the black magic of template metaprogramming and a host of other techniques and patterns evangelized by the language called C++. Now when I learn new idioms from languages like Javascript and Ruby, I always try to think how they can be mapped to equivalent constructs in Java. Not that there is always a mapping between the dynamically typed world of Ruby or Javascript and the land of static typing that Java implements. Still, it's always an enlightening exercise trying to emulate the best practices of one language in another.

This post is about one such recent experience with programming in Javascript. I was trying to get some stuff from the front end, and had to write an abstraction which would do some computation based on those values. Some parts of the computation had a fixed algorithm, while some parts varied depending upon the environment and context.

Strategy! I thought .. this was my first reaction. Since I am not using a statically typed language like Java, I need not implement the delegation-backed-by-a-polymorphic-hierarchy-of-strategy-classes idiom. And I knew that Javascript supports closures as a higher level of abstraction, which I could use to model the pluggability that the variable part of the abstraction demands. Depending upon the context, a different closure would be passed: I could make each variable component a closure and pass it along into the main computation routine. But the problem comes up when you have multiple such variabilities to model in the same abstraction. For every such variation point, you need to pass a closure as an argument.

// closures as arguments
function WageCalculation(taxCalculator, allowanceCalculator, ...) {
    //..
}

Works, but may not be the best way to model this in Javascript. After some more digging, I came across the details of the execution context in Javascript. All Javascript code gets executed in an execution context, and all identifiers are resolved based on the scope chain and an activation object. And Javascript offers the keyword this, which gets assigned as part of the process of setting up the execution context. The value assigned to this refers to an object whose properties become visible to accessors prefixed with the this keyword in the main function. Hence all variabilities of the abstraction can be made part of this context and made available to the main computation routine through the this keyword. And this is exactly what Function.call() and Function.apply() do for us!

Simplifying my example for brevity, let us say that we are trying to abstract a routine for computing the wages of employees. The wage calculation contains a few components: the basic wage, the tax and the allowance. Of these, the percentage of tax varies depending upon the category of the employee, while the other components follow a fixed algorithm. For simplicity, I assume hard coded values for each of them, while applying the variation where necessary.

The basic abstraction looks like the following:

WageCalc = function() {
    // privileged method
    this.taxCalc = function() {
        // 20% of basic wage
        return 0.2;
    };

    // private (note the var -- without it, allowanceCalc leaks
    // into the global scope)
    var allowanceCalc = function() {
        // 50% of basic wage
        return 0.5;
    };

    this.wage = function(basic) {
        // and the calculation
        return basic - this.taxCalc() * basic + allowanceCalc() * basic;
    };
};

and I invoke the calculation method as :

var w = new WageCalc();
print(w.wage(1000));

for a basic wage of $1000. The tax calculation component (taxCalc()) is the variable part and has been tagged with the this keyword. This is the candidate that I need to implement as a strategy, plugging in different implementations depending on the context. If I want another implementation where the tax percentage is 30 instead of 20, I just do the following ..

MyWageCalc = function() {
    this.taxCalc = function() {
        return 0.3;
    };
};

and invoke the wage calculation function, passing in the new function object as the context. This is what the apply method of a Javascript Function does:

var w = new WageCalc();
print(w.wage.apply(new MyWageCalc(), [5000]));

Here new MyWageCalc() sets up the new context on which the this.taxCalc() of the wage() method operates. Hence the tax calculation works with 30% of the basic wage, and not the 20% defined in the WageCalc() function.

I have only started to learn Javascript, hence I am not sure if this is the most idiomatic way of solving the problem at hand. But to me it appears exciting enough to share in a blog post.

Monday, October 08, 2007

Domain Modeling with JPA - The Gotchas - Part 3 - A Tip on Abstracting Relationships

JPA is all about POJOs and all relationships are managed as associations between POJOs. All JPA implementations are based on best practices that complement an ideal POJO based domain model. In the first part of this series, I had talked about immutable entities, which prevent clients from inadvertent mutation of the domain model. Mutation mainly affects associations between entities thereby making the aggregate inconsistent. In this post, I will discuss some of the gotchas and best practices of abstracting associations between entities from client code making your domain model much more robust.

Maintaining relationships between entities is done by the underlying JPA implementation. But it is the responsibility of the respective entities to set them up, based on the cardinalities specified as part of the mapping. This setting up of relationships has to be explicit, through appropriate message passing between the respective POJOs. Let us consider the following relationship:

@Entity
public class Employee implements Serializable {
  //..
  //.. properties

  @ManyToMany
  @JoinTable(
    name="ref_employee_skill",
    joinColumns=@JoinColumn(name="employee_pk", referencedColumnName="employee_pk"),
    inverseJoinColumns=@JoinColumn(name="skill_pk", referencedColumnName="skill_pk")
  )
  private Set<Skill> skills;
  //..
  //.. other properties
  //.. methods
}

There is a many-to-many relationship between the Employee and Skill entities, which is set up using the appropriate annotations. Here are the accessors and mutators that help us manage this relationship on the Employee side:

@Entity
public class Employee implements Serializable {
  //..
  //.. properties

  public Set<Skill> getSkills() {
    return skills;
  }

  public void setSkills(Set<Skill> skills) {
    this.skills = skills;
  }
  //..
}

Similarly on the Skill entity we will have the corresponding annotations and the respective accessors and mutators for the employees collection ..

@Entity
public class Skill implements Serializable {
  //..
  //.. properties

  @ManyToMany(
    mappedBy="skills"
  )
  private Set<Employee> employees;

  public Set<Employee> getEmployees() {
    return employees;
  }

  public void setEmployees(Set<Employee> employees) {
    this.employees = employees;
  }
  //..
}

Can you see the problem in the above model?

The problem lies in the fact that the model is vulnerable to inadvertent mutation. Public setters are evil - they expose the model to inconsistent changes by the client. How? Let us look at the following snippet of client code trying to set up the domain relationship between an Employee and a collection of Skills ..

Employee emp = .. ;      // managed
Set<Skill> skills = .. ; // managed
emp.setSkills(skills);

The public setter sets the skillset of the employee to the set skills. But what about the back reference? Every skill should also point back to the Employee emp. And this needs to be done explicitly by the client code.

for(Skill skill : skills) {
  skill.getEmployees().add(emp);
}

This completes the relationship management code on the part of the client. But is this the best level of abstraction that we can offer from the domain model?

Try this !

If the setter can make your model inconsistent, do not make it public. Hibernate does not mandate public setters for its own working. Replace public setters with domain friendly APIs which make more sense to your client. How about addSkill(..)?

@Entity
public class Employee implements Serializable {
  //..
  //.. properties

  public Employee addSkill(final Skill skill) {
    skills.add(skill);
    skill.addEmployee(this);
    return this;
  }
  //..
}

addSkill() adds a skill to an employee. Internally it updates the collection of skills and, best of all, transparently manages both sides of the relationship. And it returns the current Employee instance, making it a FluentInterface. Now your client can use your API as ..

Employee emp = .. ; // managed
emp.addSkill(skill_1)
   .addSkill(skill_2)
   .addSkill(skill_3);

Nice!

For clients holding a collection of skills (managed), add another helper method ..

@Entity
public class Employee implements Serializable {
  //..
  //.. properties

  public Employee addSkill(final Skill skill) {
    //.. as above
  }

  public Employee addSkills(final Set<Skill> skills) {
    // note this.skills -- the parameter shadows the field
    this.skills.addAll(skills);
    for(Skill skill : skills) {
      skill.addEmployee(this);
    }
    return this;
  }
  //..
}

Both of the above methods abstract away the mechanics of relationship management from the client and present fluent interfaces which are much friendlier for the domain. Remove the public setter, and don't forget to make the getter return an immutable collection ..

@Entity
public class Employee implements Serializable {
  //..
  //.. properties

  public Set<Skill> getSkills() {
    return Collections.unmodifiableSet(skills);
  }
  //..
}

The above approach makes your model more robust and domain friendly by abstracting away the mechanics of relationship management from your client code. Now, as an astute reader, you must be wondering how you would use this domain entity as the command object of your controller layer in MVC frameworks - the setters are no longer public! Well, that is the subject of another post, another day.

Tuesday, October 02, 2007

Refactoring - Only for Boilerplates?

There are some bloggers who make you think, even if you subscribe to an orthogonal view of the world. Needless to say, Steve Yegge is one of them. Quite some time back he introduced us to Fowler's Refactoring bible through an extremely thought provoking essay. Even if you are a diehard Java fan, his post forces you to think about the caterpillar-butterfly conundrum. Seven months later, another great blogger, Raganwald, built upon Yegge's post and discussed the various agonies that mutable local variables bring to Extract Method refactoring. Raganwald's post set me thinking when I first read it. Last weekend, I came back to both links accidentally, through my favorite search engine (as if there are others!), and re-read both of them. This post is an involuntary rant from that weekend reading.

Yegge says ..
Automated code-refactoring tools work on caterpillar-like code. You have some big set of entities — objects, methods, names, anything patterned. All nearly identical. You have to change them all in a coordinated way, like a caterpillar’s crawl, moving all the legs or lines this way or that.

Raganwald ends his post with the scriptum ..
This is exactly why languages with more powerful abstractions are more important than adding push-button variable renaming to less powerful languages. Where's the button that refactors a method with lots of mutable variables into one with no mutable variables? No button? Okay, until you invent one, don't you want a language that makes it easy to write methods and functions without mutable local variables?

Jokes apart, sentiments on the wayside, both of them target Java as the language of the Blub programmers - a language with less powerful abstractions, a language that generates caterpillar-like code for push-button refactoring. They make you think deeply about the possible reasons why you don't come across the term refactoring as often in other *powerful* languages like Lisp and Ruby - while you, as a Java programmer, consider yourself agile when the menu item Refactor drops down in your favorite IDE and renames a public method correctly and with complete elan.

Is Refactoring only for Java?

Martin Fowler defines refactoring as the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure. We all want to do that to the software we write - correct? Sure, then why the heck do the Java guys boast about something that should, in principle, be the normal way of life irrespective of the language you use? Can it be the case that with other languages, programmers write the best optimized code the very first time and need not improve upon the design and code organization in future? Too optimistic a claim, even for Paul Graham.

OO Organization and Modularization

Java is an OO language and offers a wealth of ways to organize your modules and subsystems. Being a class based language, Java offers various relationships to be established between your abstractions, organizing them through polymorphic hierarchies. You can have inheritance of interface as well as implementation, containment and delegation, and a slew of frameworks for various instantiation models and factories. You can have flexible packaging at the development level as well as the deployment level. Java compiles into bytecode and runs in the most powerful VM on the planet - apart from pure Java, you can drop into your codebase snippets of other scripting languages that can reside and run within the JVM. The moot point is that all this flexibility provides you, the Java programmer, with a slew of options at every stage of design and development. As clients' requirements change and your codebase evolves, it is only natural that you optimize your code organization using the best possible modularization strategy. Doesn't this establish refactoring as a necessity for well-designed software, and not a mere tool? More so when you are using a programming language that offers such a rich repertoire of code and module organization. Sure, you may need to promote some state into an inner class, increasing the level of abstraction, or refactor a slice of code from an existing method into another reusable function.

And Automated Refactoring Tools?

So, we now agree that refactoring is necessary and leads us to the holy grail of well-designed software. Do we need automated refactoring tools? Maybe not, if we are working on a codebase small enough to cache in your memory all at once. The moment you start having page faults, you feel the necessity of automation. And typical Java enterprise systems are bound to cross the limits of your memory segment and lead to continuous thrashing. Obviously not something that you would want.

But what do you need to have automated refactoring capabilities in your IDE? Type information - which, unfortunately, is missing from code written in most dynamic languages. Without type information, it is impossible to have automated, foolproof refactoring. Cedric has a nice post which drums this topic to the fullest.

In short, with Java's rich platter of code organization policies, you have the flexibility of merciless refactoring, and with Java's static type system you have the option of plugging automated refactoring tools into your IDE - so nice.

What about Mutable Local Variables?

It is a known fact that mutable variables, at any level, are an antipattern for functional programming. And a purely functional program is a mathematician's dream for all sorts of analyses. However, in Java, we are programming real world problems, which have enough state to model. And Java is a language that offers the power of assignment and mutability. Assignments are not always bad; they are a natural idiom for programming in imperative languages. Try modeling every real world problem with a stack containing objects with nested lifetimes and a constant value during their entire lifetime. Dijkstra made the following observation while talking about the virtues of assignments and goto statements ..
.. the only way to store a newly formed result is by putting it on top of the stack; we have no way of expressing that an earlier value becomes now obsolete and the latter's life time will be prolonged, although void of interest. Summing up: it is elegant but inadequate. A second objection --which is probably a direct consequence of the first one-- is that such programs become after a certain, quickly attained degree of nesting, terribly hard to read.

It is only natural that mutable local variables make automated refactoring of methods difficult. But, to me, the more important point is the locality of reference for those local variables. It all depends on the way the mutation is used and the closure of code that is affected by it. No process can handle ill-designed code - disciplined usage of mutability on local variables can be handled quite well by the refactoring process. It all boils down to how well your code is organized and the criteria used in decomposing systems into modules (reference Parnas). After all, we are modeling state changes using mutable local variables - Raganwald suggests coarser-grained state management using objects. This is what the State design pattern gives us. He mentions in one of his comments that "Object oriented programming is, at its heart, all about state. If you write objects without state, you are basically using objects as namespaces." So true. But at the same time, when you need to handle state changes through mutability, always apply them at the lowest level of abstraction, so that even if you need to synchronize for concurrent access, you do so by locking at the minimum level of granularity. So, why not mutable local variables, as long as the closure is well understood and justified by the domain to which it is applied?
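A tiny illustration of the point about locality: when a running total is threaded through every line of a method, extracting the middle of the loop mechanically means passing the mutable state in and out; confining the mutation to one obvious accumulation and moving the per-item computation into a pure helper makes Extract Method trivial. This example is my own, not Raganwald's:

```java
class Invoice {
    // Before: the mutable 'total' couples every statement in the loop;
    // an automated Extract Method on the loop body would have to
    // thread 'total' in and out of the extracted method.
    static double totalBefore(double[] amounts, double taxRate) {
        double total = 0.0;
        for (double a : amounts) {
            total += a;
            total += a * taxRate;
        }
        return total;
    }

    // After: the per-item computation is a pure, freely extractable
    // function, and the mutation is confined to one accumulation loop.
    static double lineTotal(double amount, double taxRate) {
        return amount + amount * taxRate;
    }

    static double totalAfter(double[] amounts, double taxRate) {
        double total = 0.0;
        for (double a : amounts) {
            total += lineTotal(a, taxRate);
        }
        return total;
    }
}
```

Both methods compute the same result; the difference is purely one of how far the mutable state's influence extends, which is exactly what makes the second form friendlier to refactoring tools.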

Mutable local variables, when inappropriately weaved into the spaghetti of a method, make refactoring very difficult. At the same time, when the programmer discovers this difficulty, she can try to redesign her method, taking advantage of the numerous levels of abstraction that Java offers. From this point of view, refactoring is also a great teacher on the road to the ultimate objective of well-designed software.