A couple of pretty heavy-going sessions at Devoxx today. First up was Cassandra by Example with Jonathan Ellis, one of the founders of the Cassandra support company Riptano.
I have already had some experience with Cassandra at both my previous and current jobs, but it was good to go over the principles of Cassandra as well as to see an example application deconstructed.
Cassandra’s strengths are:
- Scalability
- Reliability
- No single point of failure
- Multiple data centre support
- Integrated Hadoop support (lets you run MapReduce jobs on data in Cassandra without any ETL)
Of course Cassandra is not ACID and has limited support for OLTP ad-hoc queries. However, companies that have really scaled traditional RDBMSs like MySQL or Oracle end up dispensing with these features anyway in order to achieve that scale.
Jonathan had an interesting quote from Twitter:
“It used to take 2 weeks to perform an ALTER TABLE on the tweets table”
This is definitely something I can sympathise with. If you don’t plan ahead it can be easy to suddenly find your tables are so big they cannot be changed without serious pain, downtime or both.
When designing a relational schema we tend to think of objects and relationships. With Cassandra we need to think of objects and the queries we want to run against them. For each type of query you will need a column family (something like a table).
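To make the query-first idea concrete, here is a minimal in-memory sketch. It is not Cassandra client code: the class `QueryDrivenModel`, its method names and the sample data are all made up for illustration, and plain Java maps stand in for column families. The point is only that each query ("get a user's profile", "list a user's posts in time order") gets its own structure keyed for that query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// In-memory stand-in for two column families; hypothetical, not a Cassandra client.
final class QueryDrivenModel {
    // Answers "look up a user's profile": row key = user name, columns = profile fields.
    private final Map<String, Map<String, String>> users = new HashMap<>();

    // Answers "list a user's posts in time order": row key = user name,
    // column name = timestamp (kept sorted), column value = post text.
    private final Map<String, SortedMap<Long, String>> postsByUser = new HashMap<>();

    void addUser(String name, String email) {
        users.computeIfAbsent(name, k -> new HashMap<>()).put("email", email);
    }

    void addPost(String user, long timestampMillis, String text) {
        postsByUser.computeIfAbsent(user, k -> new TreeMap<>()).put(timestampMillis, text);
    }

    String emailOf(String user) {
        Map<String, String> row = users.get(user);
        return row == null ? null : row.get("email");
    }

    List<String> postsBy(String user) {
        SortedMap<Long, String> row = postsByUser.get(user);
        if (row == null) {
            return new ArrayList<String>();
        }
        // TreeMap iterates in ascending key order, so posts come back oldest first.
        return new ArrayList<String>(row.values());
    }

    public static void main(String[] args) {
        QueryDrivenModel model = new QueryDrivenModel();
        model.addUser("alice", "alice@example.com");
        model.addPost("alice", 2000L, "second post");
        model.addPost("alice", 1000L, "first post");
        System.out.println(model.emailOf("alice"));
        System.out.println(model.postsBy("alice"));
    }
}
```

Note that the posts are written into a structure shaped for the read, rather than being joined or sorted at query time.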
When choosing a key for rows, a natural key is best. If you need a surrogate key, use a UUID, as integers may collide due to the distributed nature of Cassandra. Version 1 UUIDs can be sorted by time, but if you don’t need time ordering, use a version 4 UUID.
Using Thrift directly should be avoided at all costs in favour of higher-level libraries like Hector. There is a JPA implementation called Kundera, but this is based around Lucene, so unless search is an important part of your application it may not be the best choice.
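As a small illustration of the surrogate-key advice, the JDK can generate version 4 (random) UUIDs out of the box. The class name `RowKeys` and its method are my own, purely for the example. As far as I know `java.util.UUID` has no built-in generator for version 1 (time-based) UUIDs, so those would need a separate library.

```java
import java.util.UUID;

// Hypothetical helper for generating surrogate row keys.
final class RowKeys {
    // Version 4 UUIDs are random, so nodes can create keys without coordinating.
    static UUID newRandomKey() {
        return UUID.randomUUID();
    }

    public static void main(String[] args) {
        UUID key = newRandomKey();
        System.out.println(key + " is a version " + key.version() + " UUID");
    }
}
```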
The afternoon was spent learning what’s new in Hibernate with Emmanuel Bernard from JBoss.
Fetch profiles can be defined and chosen at runtime, e.g.:
@Entity
@FetchProfile(name = "all",
    fetchOverrides = {
        // load a customer's orders with a join instead of lazily
        @FetchProfile.FetchOverride(
            entity = Customer.class,
            association = "orders",
            mode = FetchMode.JOIN),
        // and each order's country in the same way
        @FetchProfile.FetchOverride(
            entity = Order.class,
            association = "country",
            mode = FetchMode.JOIN)
    })
public class Customer {
    @Id @GeneratedValue private long id;
    private String name;
    private long customerNumber;
    @OneToMany private Set<Order> orders;
    // standard getter/setter
}
Session session = ...;
session.enableFetchProfile( "all" ); // name matches @FetchProfile name
Customer customer = (Customer) session.get( Customer.class, customerId );
session.disableFetchProfile( "all" ); // or just close the session
A lot of time was spent on the Criteria API, which lets you write object-oriented, type-safe queries.
Hibernate Search provides Lucene-based full-text search for Hibernate and looks quite neat. You get transparent index synchronisation, support for clustering, and loading of massive indexes.
Hibernate Envers also looks really interesting. This is a framework for dealing with historical data in Hibernate.
Entities are versioned and all changes (insert, update, delete) are audited transparently. The existing tables are unchanged but new audit tables are created for each entity. For example, a Person table will get a Person_AUD table. This has the same columns as the original, plus revision number and revision type (add, delete, modify) columns.
You can look up by revision and query by revision or entity history. You can also define a RevisionEntity to add new fields atop the standard revision number and date.
It looks really helpful for auditing and dealing with historical data, although there is of course a performance hit on inserts, updates and deletes, as the audit table must be written to as well.
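To picture how an audit table lets you look up an entity at a given revision, here is a rough plain-Java model. It deliberately does not use the Envers API: `AuditSketch`, `PersonAud`, `RevType` and the method names are all invented for illustration, and a list stands in for the Person_AUD table. Every change appends a row carrying the entity's columns plus a revision number and revision type, and a lookup picks the latest row at or before the requested revision.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an audit table; not Envers code.
final class AuditSketch {
    enum RevType { ADD, MOD, DEL }

    // One row of the imagined Person_AUD table: entity columns plus revision columns.
    static final class PersonAud {
        final long id;
        final String name;
        final int rev;
        final RevType revType;

        PersonAud(long id, String name, int rev, RevType revType) {
            this.id = id;
            this.name = name;
            this.rev = rev;
            this.revType = revType;
        }
    }

    private final List<PersonAud> personAud = new ArrayList<>();
    private int nextRev = 1;

    // Every insert, update or delete appends one audit row and returns its revision.
    int record(long id, String name, RevType type) {
        int rev = nextRev++;
        personAud.add(new PersonAud(id, name, rev, type));
        return rev;
    }

    // Name of the person as of the given revision, or null if absent or deleted.
    String nameAtRevision(long id, int revision) {
        PersonAud latest = null;
        for (PersonAud row : personAud) {
            // rows are appended in revision order, so the last match is the newest
            if (row.id == id && row.rev <= revision) {
                latest = row;
            }
        }
        return (latest == null || latest.revType == RevType.DEL) ? null : latest.name;
    }

    // All revisions at which the given person changed (entity history).
    List<Integer> revisionsOf(long id) {
        List<Integer> revs = new ArrayList<>();
        for (PersonAud row : personAud) {
            if (row.id == id) {
                revs.add(row.rev);
            }
        }
        return revs;
    }
}
```

The extra `record` call on every change is also where the write overhead comes from.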
Nice article. Just to share with you that Kundera has come a long way and no longer relies on Lucene for search.
-Vivek