Sunday, October 25, 2009

Lucene 2.9.0

A few weeks ago Lucene 2.9.0 was released. The list of changes is impressive, and it's certainly an interim release paving the way to release 3.0, where I suppose many of the now deprecated methods will start to disappear.

One of the most interesting changes for me is LUCENE-1382, which will greatly simplify checkpoint management in lucis.

Looking forward to 3.0.

Saturday, September 5, 2009

Functional dependencies

Regarding papers, yesterday's read was Type Classes with Functional Dependencies by Mark P. Jones.

Quite interesting.

CUFP 2009 Keynote

Bryan O’Sullivan is one of the authors of the book Real World Haskell, and he gave the keynote at CUFP 2009.
I really liked the presentation.

Papers

Over the last few years I've become an avid reader of papers. As time goes by, I end up with heaps of PDF files scattered across multiple folders and computers, unable to find anything.
A few years ago I read in Ars Technica about Papers, a software package that seemed to fulfill all my paper archiving needs :). However, it wasn't until last month that I gave it a try, and I only regret not having done so earlier. Its matching capabilities, search engine and Spotlight integration are really wonderful. It is really the iTunes of academic papers.

Highly recommended.

Tuesday, July 28, 2009

Counting and Grouping Queries in Lucene

When using a Lucene index to look up information you have access to some querying facilities not found in other kinds of repositories. However, in a classical trade-off, you lose some features, such as the aggregate queries easily performed in relational databases.

Anyway, if you need to perform these kinds of operations, they can be easily implemented using hit collectors. So, I've included two simple operations in lucis, counting and grouping results:


LucisQuery<Result> count(Query query, Filter filter)
LucisQuery<GroupResult> group(Query query, Filter filter, List<String> fields)


The LucisQuery object is used to decouple the index control policy (when to open and close it, etc.) from the queries themselves.

The counting query just needs the Lucene query to perform and the (optional) filter to apply. The result holds the number of documents found and the time the query needed.

For the grouping query you must provide the list of field names you want to group by (in order). The query result is the same as that of the counting query plus the root group (the one corresponding to the first field name), where a group is something like this (partial API shown, see the source):


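// Partial API: method bodies omitted, see the source.
public class Group {
    // Number of documents collected in this group.
    public int getHits();

    // Names of the child groups, one per collected field value.
    public Set<String> getGroupNames();

    // Child group for the given name.
    public Group getGroup(String name);
}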


So, for each collected value of the provided field you get a child group, which itself contains the groups representing the nested fields. The number of hits in a group may not be equal to the sum of the hits in its child groups if any of the fields is multivalued.
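
A minimal sketch of how such a group tree can be built, assuming plain in-memory documents (field name mapped to a list of values) instead of Lucene hit collectors. GroupingSketch, collect and the sample data are hypothetical and are not the lucis implementation:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical stand-in for the lucis grouping query; documents are
// modelled as "field name -> values" maps, not as Lucene documents.
final class GroupingSketch {

    /** A node of the group tree: a hit count plus named child groups. */
    static final class Group {
        private int hits;
        private final Map<String, Group> children = new TreeMap<>();

        int getHits() { return hits; }
        Set<String> getGroupNames() { return children.keySet(); }
        Group getGroup(String name) { return children.get(name); }

        // Counts the document in this group, then pushes it down once per
        // distinct value of the next field to group by.
        void collect(Map<String, List<String>> doc, List<String> fields) {
            hits++;
            if (fields.isEmpty()) {
                return;
            }
            Set<String> values = new HashSet<>(doc.getOrDefault(fields.get(0), List.of()));
            for (String value : values) {
                children.computeIfAbsent(value, k -> new Group())
                        .collect(doc, fields.subList(1, fields.size()));
            }
        }
    }

    /** Groups the given documents by the given fields, in order. */
    static Group group(List<Map<String, List<String>>> docs, List<String> fields) {
        Group root = new Group();
        for (Map<String, List<String>> doc : docs) {
            root.collect(doc, fields);
        }
        return root;
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> docs = List.of(
            Map.of("lang", List.of("java"), "tag", List.of("lucene", "search")),
            Map.of("lang", List.of("java"), "tag", List.of("spring")));
        Group java = group(docs, List.of("lang", "tag")).getGroup("java");
        int childHits = 0;
        for (String tag : java.getGroupNames()) {
            childHits += java.getGroup(tag).getHits();
        }
        // The multivalued "tag" field makes the two numbers differ.
        System.out.println(java.getHits() + " hits, " + childHits + " in child groups");
    }
}
```

The first sample document carries two tags, so it is counted in two child groups while contributing a single hit to its parent.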

Friday, June 26, 2009

Pointcut expressions for stereotypes

Some of the new additions to Spring in version 2.5 are annotation-driven configuration, classpath scanning for managed components and @AspectJ support for AOP. These features can greatly reduce the amount of XML needed to configure your application context.

Together with classpath scanning came stereotypes, which, as the reference documentation states, make ideal targets for pointcuts. When putting this idea (stereotype as pointcut target) into practice you may find two different candidate expressions in the documentation (the Service stereotype is used as an example): @target(org.springframework.stereotype.Service) and @within(org.springframework.stereotype.Service), with no clear differentiation between them.

However, some of the differences started to show up when Spring's auto-proxying mechanism began trying to create proxies for classes that had no stereotype and were not advised in any other way. After a rather long web-searching session, the real problem surfaced: if you use @target, every class is proxied just in case a new subclass with the annotation is introduced. Even though the issue is marked as resolved, I have run into it in 2.5.6SEC01.

So, in the meantime, just use @within.
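
To illustrate the distinction, here is a simplified model in plain Java. It is not Spring's proxying code: the Service annotation below is a hypothetical stand-in for the Spring stereotype, and the two helper methods only mimic what each designator needs to know. A @within-style check can be decided from a class alone, whereas a @target-style check needs the runtime object.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

final class StereotypeMatchSketch {

    // Hypothetical stand-in for org.springframework.stereotype.Service.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Service {}

    static class PlainRepository {}

    @Service
    static class AuditedRepository extends PlainRepository {}

    // @within-style check: only the declaring class is inspected, so the
    // answer is known before any object exists.
    static boolean withinService(Class<?> declaringType) {
        return declaringType.isAnnotationPresent(Service.class);
    }

    // @target-style check: the class of the live object is inspected, so
    // the answer depends on what is actually instantiated.
    static boolean targetIsService(Object target) {
        return target.getClass().isAnnotationPresent(Service.class);
    }

    public static void main(String[] args) {
        PlainRepository repository = new AuditedRepository();
        System.out.println(withinService(PlainRepository.class));
        System.out.println(targetIsService(repository));
    }
}
```

In this model, looking at PlainRepository alone is enough to rule out a @within match, but it cannot rule out a @target match, because an annotated subclass instance may show up later.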


Sunday, March 15, 2009

'This' type and Covariant builders

When designing your own APIs, either by using interfaces or class hierarchies, you often need to reference the type of the actual implementation of the interface or class you are writing. This is a real problem, and there's even a proposal in Project Coin (small language changes for Java 7) to deal with it.

One of these cases, and the one the rest of this post focuses on, is using the 'This' type as the return type of some methods. This is particularly useful for method chaining in fluent interfaces.

Up until Java 5 and the introduction of covariant return types, it was impossible to deal with this issue, which led to frequent casting. But even with Java 5+, the solution is based on manually overriding every method in the subclass (or subinterface), changing the return type to the desired subtype, calling super and returning this.
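
The manual approach looks roughly like this. Shape, Circle and their methods are hypothetical names chosen only for illustration:

```java
final class CovariantReturnSketch {

    static class Shape {
        protected String name = "";

        // Returns the base type, so subclass methods cannot be chained after it.
        Shape name(String name) {
            this.name = name;
            return this;
        }
    }

    static class Circle extends Shape {
        protected double radius;

        // Covariant override: same behaviour, narrower return type.
        // Every chainable method of Shape needs one of these.
        @Override
        Circle name(String name) {
            super.name(name);
            return this;
        }

        Circle radius(double radius) {
            this.radius = radius;
            return this;
        }
    }

    public static void main(String[] args) {
        // Compiles without a cast only because of the override in Circle.
        Circle circle = new Circle().name("unit").radius(1.0);
        System.out.println(circle.name + " " + circle.radius);
    }
}
```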

So, I have checked in a partial solution to this issue. The type signature of the class This<T extends This<T>> (equivalent to that of the Enum class) allows the implementation type to be referenced inside the class, and its value to be obtained by calling thisValue. However, T can be bound not only to the final implementation type but also to some of its siblings in the hierarchy (as shown in the tests), because the Java type system cannot guarantee that T is the actual type of this.

That's the reason for the validation in the constructor of This, the reason it is an abstract class instead of an interface, and one of the many reasons it is only a partial solution.

One application of the This type is for builders (see the builder pattern and item 2 in Effective Java 2nd Ed.). If your builder is part of a hierarchy you are faced with the problem described in this post, so I've also included a base class for them, the covariant builder.
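
A minimal sketch of the idea, under stated assumptions: only the This<T extends This<T>> signature and the thisValue name come from the description above. The class-token constructor, the validation through Class.cast and the builder classes are hypothetical and may differ from the checked-in code.

```java
final class ThisTypeSketch {

    /** Base class that exposes "this" typed as the implementation type T. */
    abstract static class This<T extends This<T>> {
        private final T self;

        // Assumption: validation uses a class token. The cast fails with a
        // ClassCastException when T is bound to a sibling type.
        protected This(Class<T> type) {
            this.self = type.cast(this);
        }

        protected final T thisValue() {
            return self;
        }
    }

    /** Builder base class whose chainable methods return the concrete builder. */
    abstract static class PersonBuilder<B extends PersonBuilder<B>> extends This<B> {
        protected String name;

        protected PersonBuilder(Class<B> type) {
            super(type);
        }

        B name(String name) {
            this.name = name;
            return thisValue();   // no override needed in subclasses
        }
    }

    static final class EmployeeBuilder extends PersonBuilder<EmployeeBuilder> {
        private String company;

        EmployeeBuilder() {
            super(EmployeeBuilder.class);
        }

        EmployeeBuilder company(String company) {
            this.company = company;
            return this;
        }

        String build() {
            return name + " @ " + company;
        }
    }

    // Accepted by the compiler even though T is a sibling type; only the
    // constructor validation rejects it, at run time.
    static final class WrongBuilder extends PersonBuilder<EmployeeBuilder> {
        WrongBuilder() {
            super(EmployeeBuilder.class);
        }
    }

    public static void main(String[] args) {
        // name(...) is declared in the base class but returns EmployeeBuilder.
        System.out.println(new EmployeeBuilder().name("Ada").company("Acme").build());
    }
}
```

Here name is inherited and still chains into company without a cast or an override, while instantiating WrongBuilder fails in the constructor of This.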