Monday, May 30, 2011

Integration is the mess that no one wants to clean up

The title for this post came from a rather tongue-in-cheek comment by the sys admin at my workplace. However, I think it's a rather accurate description of a personal project that I've been working on for the past six months. The project was designed as an educational exercise in different technologies that I've wanted to play with, so a lot of the decisions were guided by that. Along the way I've run into different issues and picked up various tips and tricks, and I thought I'd share them.

The goal of this project is an integration piece between a website (through RSS) and social media sites like Facebook, MySpace and Twitter (with all the links fed back to the original site). The aim was to avoid having to replicate data across multiple sites, as well as to give me an excuse to play with new toys.

The runtime environment is Google App Engine, as this app runs infrequently and GAE is free. I've also wanted to write a GAE app for a while, so this seemed like a good fit. The language is Java, since that's my primary development language and I don't know Python.

The build tool is Ant, as I despise Maven. I use Maven at work, and I find it inflexible, bloated and just plain annoying. It does have some good ideas however, and given Ant's import abilities, one can write generic build tasks that implement good build practices without all the pain. I did consider other build tools like Gradle, but I found that after thinking about what I wanted to do for a while I was able to code the build declaratively, with Ant being smart enough to handle all the heavy lifting. I'm considering open sourcing the Ant build library I've accumulated once I've cleaned it up a little.

Maven also provides dependency management; however I personally find Ivy's dependency management to be more mature, cleaner, simpler and easier to configure and use than Maven's.

Discussions about build tools can often get people a bit hot under the collar, and I found The Java Posse podcast on the subject to be very educational (as a hint, they're not very Maven friendly).

My IDE of choice is Eclipse (Helios), with the relevant Google plugins. One other reason I dislike Maven is that the M2 integration sucks (I've heard it's a lot better with Indigo, but I've yet to try it). Ant wasn't immune from problems either: the runtime I'm using is 1.8.1, and the Ant build editor doesn't recognise syntax from that version, so Eclipse tells me my build.xml has errors in it. It runs fine, however.

For my SCM I'm using Mercurial as it is the best VCS around, beating Git by a gazillion miles (yes I have actually used Git in a sophisticated manner and Hg is so much easier and more intuitive).

The app itself is broken down into three parts (with Spring handling all the configuration). The first handles the datastore/RPC services, with a GWT front end, Spring MVC providing the CRUD RPC endpoints, and Objectify handling persistence. The second implements the manual OAuth2 workflow, with Spring MVC rendering the views (very simple JSPs) and accepting the callbacks. The third is the part that actually makes posts to the social media sites and keeps everything in sync.

I spent a lot of time trying to bash JPA into working on GAE, but the DataNucleus implementation is quirky at best. It does certain tasks (like assigning PK values) differently from all other JPA implementations, and the JDO junk that gets stuck in your .class files can mess with other annotations or class behaviour. I spent one Saturday ripping out JPA and replacing it with Objectify (guided by tests, for you TDD purists out there), and I've had little trouble ever since.
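For anyone who hasn't used Objectify, here's a rough sketch of the shape a simple entity and DAO take with it. The Post and PostDao names are invented for this post, and the exact annotations and calls differ between Objectify versions, so treat it as illustrative rather than gospel.

// Illustrative sketch only: the Objectify 3.x style API is assumed here.

// Post.java
import javax.persistence.Id;   // Objectify 3.x reuses the JPA @Id annotation

public class Post {
    @Id public Long id;        // left null so the datastore assigns the key
    public String title;
    public String permalink;
}

// PostDao.java
import com.googlecode.objectify.Objectify;
import com.googlecode.objectify.ObjectifyService;

public class PostDao {
    static {
        ObjectifyService.register(Post.class);    // entities are registered once up front
    }

    public void save(Post post) {
        Objectify ofy = ObjectifyService.begin();
        ofy.put(post);
    }

    public Post find(Long id) {
        Objectify ofy = ObjectifyService.begin();
        return ofy.find(Post.class, id);          // returns null rather than throwing if absent
    }
}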

A few big design principles that I've been trying to hammer into myself are good old DRY and others from the SOLID acronym like SRP, all guided by tests. The thing about working with a framework like Spring is that if you don't follow these ideas, you get yourself into trouble quickly enough to realise something's amiss. For example, even when using Objectify one still has to deal with transaction management. It's the same bit of code that runs for all DAOs, so where do you put it? One should of course favour composition over inheritance. I went with an AOP (JDK proxy) approach where a transaction manager provides advice around the DAO methods, injecting an "entity manager" which the DAOs then use for datastore operations. Very elegant, but not as easy if one doesn't code to interfaces (which I'd file under the D of SOLID, dependency inversion, as I understand it).
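The sketch below is a stripped-down illustration of that idea, not the project's actual classes: the EntityManager and EntityManagerAware types are hypothetical stand-ins for whatever wraps the datastore, but the JDK proxy mechanics are the real java.lang.reflect API.

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical abstraction over the datastore; the real project's "entity manager" will differ.
interface EntityManager {
    void beginTransaction();
    void commit();
    void rollback();
}

// DAOs receive their entity manager from the advice rather than creating their own.
interface EntityManagerAware {
    void setEntityManager(EntityManager em);
}

public final class TransactionAdvice {

    private TransactionAdvice() {}

    // Wraps a DAO interface so that every method call runs inside a transaction.
    @SuppressWarnings("unchecked")
    public static <T extends EntityManagerAware> T transactional(
            final T dao, Class<T> daoInterface, final EntityManager em) {

        InvocationHandler handler = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                dao.setEntityManager(em);   // inject the "entity manager" the DAO will use
                em.beginTransaction();
                try {
                    Object result = method.invoke(dao, args);
                    em.commit();
                    return result;
                } catch (InvocationTargetException e) {
                    em.rollback();
                    throw e.getCause();     // rethrow the DAO's own exception
                }
            }
        };
        return (T) Proxy.newProxyInstance(
                daoInterface.getClassLoader(), new Class<?>[] { daoInterface }, handler);
    }
}

Because a JDK proxy can only implement interfaces, the DAOs have to be written against interfaces for this to work at all, which is the point about coding to interfaces above.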

However, GAE itself doesn't help with the testing side of these principles, and it's an area where Google could really improve their tooling, especially if they want more than Mickey Mouse projects running on GAE. It's nigh impossible to preload the local DS file with test data for integration/acceptance tests (where the tests run in one JVM and the dev_server serving your app runs in another). I hacked a solution together, but it broke when the SDK was updated. The only way, therefore, is to use an interface which you build into your app. Since most of my entities were being served in an XML representation over HTTP to be consumed by a GWT front end, it wasn't too much of a pain to drive my tests through the browser using Selenium. For a more sophisticated project, however, it's a nail in the coffin for testability.
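By way of illustration, one shape that kind of interface can take is a test-only endpoint that creates known fixture data through the normal DAO layer before the browser tests run. The servlet below is my own mock-up (reusing the invented Post/PostDao from the earlier sketch), not code from the project.

// Hypothetical test-only fixture endpoint; it would be kept out of the production web.xml.
import java.io.IOException;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class TestFixtureServlet extends HttpServlet {

    private final PostDao postDao = new PostDao();

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // Create a predictable entity for the Selenium tests to assert against.
        Post post = new Post();
        post.title = req.getParameter("title");
        post.permalink = req.getParameter("permalink");
        postDao.save(post);
        resp.setStatus(HttpServletResponse.SC_NO_CONTENT);
    }
}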

Learning Spring MVC was a joy, with the ability to render a view of the model (through JSP) or return data in an HTTP response body (a la REST/RPC) with minimal effort. As far as MVC frameworks go, it's the best I've worked with to date. There is some room for improvement in the way Content-Types (MIME) are handled in an AJAX world, but I've detailed that problem before.
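To give a flavour of what I mean, the sketch below shows both styles in one Spring MVC 3.x controller: one handler returns a view name for a JSP, the other returns the object itself in the response body. The controller, the mappings and the view name are made up for illustration (and it borrows the hypothetical Post entity from earlier).

import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
@RequestMapping("/posts")
public class PostController {

    // Model-and-view style: the returned string is resolved to a JSP by the view resolver.
    @RequestMapping("/{id}/edit")
    public String edit(@PathVariable Long id, Model model) {
        model.addAttribute("post", loadPost(id));
        return "editPost";
    }

    // RPC/REST style: the return value is written straight into the response body,
    // marshalled as XML or JSON depending on the configured message converters.
    @RequestMapping("/{id}")
    @ResponseBody
    public Post get(@PathVariable Long id) {
        return loadPost(id);
    }

    private Post loadPost(Long id) {
        // Placeholder for the real DAO/service lookup.
        Post post = new Post();
        post.id = id;
        return post;
    }
}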

Running anything other than native servlets on GAE is of course annoying, as some readers may already know, because of the way GAE attempts to serialize everything. I had to rework my architecture a bit to try and get around that, but I still get errors in the logs due to Spring classes not being Serializable.

I've probably left a few details out here and there, and questions/comments are welcome (unless you want to troll, in which case bugger off). Constructive criticism of technology choices is always interesting and I like to chin wag about that; just please don't start a flame war.

Overall I found this project to be satisfying and educational and I've come out the other side a better engineer/developer.

Monday, May 16, 2011

Filling a table in Jasper Reports

You wouldn't think it, but figuring out how to fill a table with data in Jasper Reports (JR) was actually more difficult than it sounds. Poor documentation, bad/incorrect/plain stupid examples, and forum posts that have been left for years with no answer! Due to library constraints on this project, this example is with Jasper Reports/iReport 3.4.7, and YMMV with other versions.

Pictorial example of a table in a JR report


Say you're producing a report with a table of Customers embedded in a report with other data (see image above). JR treats the table as a subreport (but with different XML tags), which means the data you fill the parent/master report with isn't automatically available to the table. That caveat isn't intuitive until you discover that the table is a subreport. To make matters worse, iReport assumes you're getting your data from a straight JDBC connection, or else populates your table's <dataSourceExpression> with a JREmptyDataSource, which fills your table fields with null.

How helpful.

If you're in any sort of enterprise system you'll no doubt have DAOs and different models (domain, DTOs, etc.) to feed into your reporting code, so you'll need to strip out the empty data source iReport sticks in your template.

Fortunately there is a JRBeanCollectionDataSource class that maps field names in the report template to properties in the data using the JavaBeans naming convention. The last step is to actually make your data available to the table, which is a combination of fixing the template and writing a reporting DTO class.

Firstly, a field in the report will need to be a Java Collection type. I didn't have much success with non-JSE collections, and it's better to code to interfaces anyway.
<field name="customers" class="java.util.List"/>
The DTO will need to provide an instance of that collection type, with a getter that matches the name of the field in the template. Using the example in the image:
public class CustomerList {
    public List<Customer> getCustomers() { ... }
}
Then, for each field placeholder in the table, if it maps to a property getter on the Customer object, that property value will be substituted into the report.
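For example, if the table's detail band has text fields bound to $F{name} and $F{email} (field names invented here), the Customer bean just needs matching getters:

// Hypothetical Customer bean; the getters must line up with the table's field names.
public class Customer {

    private final String name;
    private final String email;

    public Customer(String name, String email) {
        this.name = name;
        this.email = email;
    }

    public String getName() {
        return name;
    }

    public String getEmail() {
        return email;
    }
}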

The final piece of configuration is to tell the table to source its data from the collection:
<dataSourceExpression>
    <![CDATA[new net.sf.jasperreports.engine.data.JRBeanCollectionDataSource($F{customers})]]>
</dataSourceExpression>
Then when the report is filled with an instance of CustomerList that you've prepared earlier, the report engine will iterate over the Collection and fill each row of the table.
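Putting it all together, the fill code ends up looking something like the sketch below. The wrapper class and the customer_report.jasper file name are mine; only the JasperFillManager and JRBeanCollectionDataSource calls are the actual JR API.

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import net.sf.jasperreports.engine.JRException;
import net.sf.jasperreports.engine.JasperFillManager;
import net.sf.jasperreports.engine.JasperPrint;
import net.sf.jasperreports.engine.data.JRBeanCollectionDataSource;

public class CustomerReportFiller {

    public JasperPrint fill(CustomerList customerList) throws JRException {
        // The master report is filled from a one-record bean data source; its
        // "customers" field is read from CustomerList.getCustomers().
        JRBeanCollectionDataSource masterData =
                new JRBeanCollectionDataSource(Collections.singletonList(customerList));

        Map<String, Object> parameters = new HashMap<String, Object>();

        // The table's <dataSourceExpression> then wraps $F{customers} in its own
        // JRBeanCollectionDataSource and renders one row per Customer.
        return JasperFillManager.fillReport("customer_report.jasper", parameters, masterData);
    }
}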

Once you've done some digging and filtering and realised how JR does its tables, it's actually pretty easy and plainly obvious. However, for the aforementioned reasons (incorrect examples sending one down the garden path), it can be time consuming and frustrating. Given that a table is a core requirement of most reports a business asks for, it would make sense to me for putting a table in a report to be dead simple.