With its popularity growing, Apache's Cassandra database for high-volume, real-time data management will be fitted with technical and query language improvements in the fall.
The open source NoSQL database, which reached the 1.0 release stage last October, is now in use at companies including Disney, eBay, and Netflix, according to Jonathan Ellis, project chair for the Apache Cassandra project and CTO at DataStax, which offers commercial products and services based on Cassandra. "Our best estimate is there are north of 1,000 Cassandra production deployments out there," Ellis said Wednesday at the Cassandra Summit 2012 conference in Santa Clara, Calif.
Version 1.2 of Cassandra, eyed for an October release, is slated to offer concurrent schema change and virtual node capabilities as well as JBOD (Just a Bunch of Disks) deployment support, Ellis said. With concurrent schema changes, multiple clients can issue schema changes at the same time and they will be merged across a cluster safely. Virtual nodes will enable a full cluster to parallelize certain operations, providing improved speed in operations such as adding or replacing a node.
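To illustrate why virtual nodes can speed up adding a node, here is a minimal Python sketch of a token ring. It is a toy model under stated assumptions, not Cassandra's implementation: the 32-bit token space, the random token assignment, the figure of 64 tokens per node, and the function names `build_ring` and `sources_for_new_node` are all hypothetical choices made for the example.

```python
import bisect
import random

TOKEN_SPACE = 2**32  # hypothetical ring size, chosen only for this sketch


def build_ring(nodes, tokens_per_node, rng):
    """Return a sorted list of (token, node) pairs for a toy token ring."""
    ring = [
        (rng.randrange(TOKEN_SPACE), node)
        for node in nodes
        for _ in range(tokens_per_node)
    ]
    ring.sort()
    return ring


def sources_for_new_node(ring, tokens_per_node, rng):
    """Return the existing nodes that would hand data to a joining node."""
    tokens = [token for token, _ in ring]
    sources = set()
    for _ in range(tokens_per_node):
        new_token = rng.randrange(TOKEN_SPACE)
        # A new token splits the range currently owned by the next
        # token clockwise, so that owner must stream data to the newcomer.
        idx = bisect.bisect_right(tokens, new_token) % len(ring)
        sources.add(ring[idx][1])
    return sources


rng = random.Random(42)
nodes = [f"node{i}" for i in range(8)]

# One token per node: the joining node takes data from a single neighbor.
single = sources_for_new_node(build_ring(nodes, 1, rng), 1, rng)

# Many tokens per node: the joining node draws from many existing nodes.
virtual = sources_for_new_node(build_ring(nodes, 64, rng), 64, rng)

print("sources with one token per node:", len(single))
print("sources with 64 tokens per node:", len(virtual))
```

In this model, a joining node with a single token pulls all of its data from one existing node, while a node with many small ranges pulls from many nodes at once, which is the kind of parallelism Ellis described.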
Cassandra Query Language (CQL) 3, planned for version 1.2, is due to feature collections support, which enables concatenation of data sets such as email addresses. Query tracing may be supported as well. Although CQL 3 is not fully backward-compatible with CQL 2, users will still be able to use CQL 2 and move to CQL 3 when building new parts of their applications.
At educational services provider Hobsons, Cassandra is used for log- and click-tracking for the company's website. "It scales like crazy," said Patrick McFadin, chief architect at Hobsons. The company had used an Oracle 10 database for these purposes, but found it was getting too costly to scale Oracle for the kind of functions Cassandra provides, McFadin said. Hobsons still uses Oracle for relational data functions.
Cassandra complements another prominent Apache project, the Hadoop distributed computing platform, DataStax officials said. "The big difference [between the two technologies] is Hadoop is built for deep-style analytics, [for] when you're asking really detailed questions about large amounts of data, whereas Cassandra's all about serving up data really quickly," said Matt Pfeil, DataStax vice president of customer solutions.