Those who want to try the much-hyped Hadoop but haven't got a cluster or two to spare can now test the data processing platform on their desktops, thanks to a new release from Hadoop distributor Hortonworks.
Hortonworks Sandbox is a single-node implementation of Hadoop, one based on the Hortonworks Data Platform. Packaged in a virtual machine, it includes all the typical components found in a Hadoop deployment, including the HCatalog storage management subsystem, the Hive data warehouse and the Pig set of data analysis tools.
The package also offers a number of tutorials that show users how to execute Hadoop data analysis tasks, according to Cheryle Custer, Hortonworks director of services marketing. It ships with three tutorials, and more will be made available for download in the months to come. It also includes videos and online datasets that can be used to test features.
While widely used, Hadoop can present a challenge for new users to learn, at least for data scientists and anyone who isn't a system administrator. The software requires a considerable amount of work to set up and run. In addition to installing the software and a Java Virtual Machine (JVM) if one is not already on the system, the user must also install a file system, and the software itself requires a user account, which could pose a security risk.
The Hortonworks Sandbox eliminates all that installation work, requiring only that the user download and run a virtual machine. The virtual machine package, which is built on the CentOS Linux distribution, will run in either VMware or Oracle VirtualBox environments.
In addition to building a Hadoop sandbox, Hortonworks engineers have also been busy working on the company's flagship enterprise Hadoop distribution. The Hortonworks Data Platform version 1.2, released last week, offers new management and security tools.