Hadoop on OpenStack with a CLI: Creating a cluster

OpenStack Savanna can already help you create a Hadoop cluster or run a Hadoop workload all through the Horizon dashboard. What it could not do until now is let you do that through a command-line interface.

Part of the Savanna work for Icehouse is to create a savanna CLI. The CLI extends Savanna’s functionality and gives us an opportunity to review the existing v1.0 and v1.1 REST APIs in preparation for a stable v2 API.

A first pass of the CLI is now done and functional for at least the v1.0 REST API. And here’s how you can use it.

Zeroth, get your hands on the Savanna client. Two places to get it are RDO and the OpenStack tarballs.

First, know that the Savanna architecture includes a plugin mechanism to allow for Hadoop vendors to plug in their own management tools. This is a key aspect of Savanna’s vendor appeal. So you need to pick a plugin to use.

$ savanna plugin-list
| name    | versions | title                     |
| vanilla | 1.2.1    | Vanilla Apache Hadoop     |
| hdp     | 1.3.2    | Hortonworks Data Platform |

I chose to try the Vanilla plugin, version 1.2.1. It’s the reference implementation,

export PLUGIN_NAME=vanilla
export PLUGIN_VERSION=1.2.1

Second, you need to make some decisions about the Hadoop cluster you want to start. I decided to have a master node using the m1.medium flavor and three worker nodes also using m1.medium.

export MASTER_FLAVOR=m1.medium
export WORKER_FLAVOR=m1.medium
export WORKER_COUNT=3

Third, I decided to use Neutron networking in my OpenStack deployment, since it’s what everyone is doing these days. As a result, I need a network to start the cluster on.

$ neutron net-list
| id            | name | subnets       |
| 25783...f078b | net0 | 18d12...5f903 |

The cluster will be significantly more useful if I have a way to access it, so I need to pick a keypair for access.

$ nova keypair-list
| Name      | Fingerprint                                     |
| mykeypair | ac:ad:1d:f7:97:24:bd:6e:d7:98:50:a2:3d:7d:6c:45 |
export KEYPAIR=mykeypair

And I need an image to use for each of the nodes. I chose a Fedora image that was created using the Savanna DIB elements. You can pick one from the Savanna Quickstart guide,

$ glance image-list
| ID            | Name           | Disk Format | Container Format | Size       | Status |
| 1939b...f05c2 | fedora_savanna | qcow2       | bare             | 1093453824 | active |
export IMAGE_ID=1939bad7-11fe-4cab-b1b9-02b01d9f05c2

then register it with Savanna,

savanna image-register --id $IMAGE_ID --username fedora
savanna image-add-tag --id $IMAGE_ID --tag $PLUGIN_NAME
savanna image-add-tag --id $IMAGE_ID --tag $PLUGIN_VERSION
$ savanna image-list
| name           | id            | username | tags           | description |
| fedora_savanna | 1939b...f05c2 | fedora   | vanilla, 1.2.1 | None        |

FYI, --username fedora tells Savanna what account it can access on the instance that has sudo privileges. Adding the tags tells Savanna what plugin and version the image works with.

That’s all the input you need to provide. From here on, cluster creation is just a matter of cutting and pasting a few more commands.

First, a few commands to find IDs for the named values chosen above,

export MASTER_FLAVOR_ID=$(nova flavor-show $MASTER_FLAVOR | grep ' id ' | awk '{print $4}')
export WORKER_FLAVOR_ID=$(nova flavor-show $WORKER_FLAVOR | grep ' id ' | awk '{print $4}')
export MANAGEMENT_NETWORK_ID=$(neutron net-show net0 | grep ' id ' | awk '{print $4}')
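The grep and awk pipeline above depends on the word id appearing with spaces around it and on the value being the fourth whitespace-separated token. A slightly more defensive sketch is below. It assumes the two-column table layout the clients print for their show commands, and table_field is a hypothetical helper name, not something any OpenStack client provides. The sample table is canned so the example runs without a deployment.

```shell
# table_field FIELD
# Reads a two-column ASCII table on stdin and prints the value for FIELD.
# Hypothetical helper; assumes rows look like "| field | value |".
table_field() {
    awk -F'|' -v key="$1" '
        NF >= 3 {
            name = $2; value = $3
            # Trim the padding around both cells.
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == key) { print value; exit }
        }'
}

# Canned sample output, standing in for "nova flavor-show".
sample='+----------+-----------+
| Property | Value     |
+----------+-----------+
| id       | 3         |
| name     | m1.medium |
+----------+-----------+'

SAMPLE_FLAVOR_ID=$(printf '%s\n' "$sample" | table_field id)
echo "$SAMPLE_FLAVOR_ID"
```

With a live deployment the same helper would be used as nova flavor-show $MASTER_FLAVOR | table_field id.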

Next, create some node group templates for the master and worker nodes. The CLI currently takes a JSON representation of the template. It also provides a JSON representation when showing template details to facilitate export & import.

export MASTER_TEMPLATE_ID=$(echo "{\"plugin_name\": \"$PLUGIN_NAME\", \"node_processes\": [\"namenode\", \"secondarynamenode\", \"oozie\", \"jobtracker\"], \"flavor_id\": \"$MASTER_FLAVOR_ID\", \"hadoop_version\": \"$PLUGIN_VERSION\", \"name\": \"master\"}" | savanna node-group-template-create | grep ' id ' | awk '{print $4}')

export WORKER_TEMPLATE_ID=$(echo "{\"plugin_name\": \"$PLUGIN_NAME\", \"node_processes\": [\"datanode\", \"tasktracker\"], \"flavor_id\": \"$WORKER_FLAVOR_ID\", \"hadoop_version\": \"$PLUGIN_VERSION\", \"name\": \"worker\"}" | savanna node-group-template-create | grep ' id ' | awk '{print $4}')
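The escaped quotes in those one-liners are easy to get wrong when editing. As a sketch of an alternative, the same JSON can be built with a here-document. The function name node_group_json and its argument order are my own invention for illustration, and the default plugin values only exist so the example runs on its own.

```shell
# Defaults so the sketch runs standalone; the exports from earlier win if set.
PLUGIN_NAME=${PLUGIN_NAME:-vanilla}
PLUGIN_VERSION=${PLUGIN_VERSION:-1.2.1}

# node_group_json NAME FLAVOR_ID PROCESS...
# Hypothetical helper: prints a node group template as JSON on stdout.
node_group_json() {
    ng_name=$1; ng_flavor=$2; shift 2
    ng_procs=""
    # Build a comma-separated list of quoted process names.
    for p in "$@"; do
        ng_procs="${ng_procs:+$ng_procs, }\"$p\""
    done
    cat <<EOF
{
  "plugin_name": "$PLUGIN_NAME",
  "hadoop_version": "$PLUGIN_VERSION",
  "name": "$ng_name",
  "flavor_id": "$ng_flavor",
  "node_processes": [$ng_procs]
}
EOF
}

# The flavor ID "3" is a placeholder for $WORKER_FLAVOR_ID.
WORKER_JSON=$(node_group_json worker 3 datanode tasktracker)
echo "$WORKER_JSON"
```

The output can be piped into savanna node-group-template-create exactly as the echo output is above.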

Now put those two node group templates together into a cluster template,

export CLUSTER_TEMPLATE_ID=$(echo "{\"plugin_name\": \"$PLUGIN_NAME\", \"node_groups\": [{\"count\": 1, \"name\": \"master\", \"node_group_template_id\": \"$MASTER_TEMPLATE_ID\"}, {\"count\": $WORKER_COUNT, \"name\": \"worker\", \"node_group_template_id\": \"$WORKER_TEMPLATE_ID\"}], \"hadoop_version\": \"$PLUGIN_VERSION\", \"name\": \"cluster\"}" | savanna cluster-template-create | grep ' id ' | awk '{print $4}')

Creating the node group and cluster templates only has to happen once. The final step, starting up the cluster, can be done multiple times.

echo "{\"cluster_template_id\": \"$CLUSTER_TEMPLATE_ID\", \"default_image_id\": \"$IMAGE_ID\", \"hadoop_version\": \"$PLUGIN_VERSION\", \"name\": \"cluster-instance-$(date +%s)\", \"plugin_name\": \"$PLUGIN_NAME\", \"user_keypair_id\": \"$KEYPAIR\", \"neutron_management_network\": \"$MANAGEMENT_NETWORK_ID\"}" | savanna cluster-create 
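Every JSON document in this walkthrough is assembled from exported variables, and an unset one silently produces an empty or invalid value. Before re-running the final step in a fresh shell, a quick check along these lines may help. It is a minimal sketch, and require_vars is a hypothetical helper name.

```shell
# require_vars NAME...
# Hypothetical helper: returns non-zero and reports on stderr
# if any of the named variables is unset or empty.
require_vars() {
    missing=0
    for v in "$@"; do
        # Indirect lookup of the variable named by $v (POSIX-compatible).
        eval "val=\${$v:-}"
        if [ -z "$val" ]; then
            echo "missing: $v" >&2
            missing=1
        fi
    done
    return $missing
}

if require_vars CLUSTER_TEMPLATE_ID IMAGE_ID PLUGIN_NAME PLUGIN_VERSION \
                KEYPAIR MANAGEMENT_NETWORK_ID; then
    echo "ready to run savanna cluster-create"
else
    echo "set the variables listed above first"
fi
```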

That’s it. You can run nova list and ssh into the master instance, assuming you’re on the Neutron node and use ip netns exec, or you can log in through the master node’s VNC console.
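For the ip netns exec route, the command might look roughly like the sketch below. It assumes the DHCP agent’s usual qdhcp-<network id> namespace naming, the fedora account registered with the image earlier, and a private key file for the chosen keypair. The helper name, the key file name, the address 10.0.0.2 and the NETWORK-ID placeholder are all made up for illustration. The function only prints the command rather than running it.

```shell
# master_ssh_command NETWORK_ID MASTER_IP KEY_FILE
# Hypothetical helper: prints (does not run) an ssh command wrapped in
# the network's DHCP namespace. Assumes the qdhcp-<network id> naming.
master_ssh_command() {
    echo "sudo ip netns exec qdhcp-$1 ssh -i $3 fedora@$2"
}

# Placeholder values; substitute $MANAGEMENT_NETWORK_ID and the master's
# address as reported by nova list.
SSH_CMD=$(master_ssh_command NETWORK-ID 10.0.0.2 mykeypair.pem)
echo "$SSH_CMD"
```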
