Posts Tagged ‘Fedora’

Hadoop on OpenStack with a CLI: Creating a cluster

January 29, 2014

OpenStack Savanna can already help you create a Hadoop cluster or run a Hadoop workload all through the Horizon dashboard. What it could not do until now is let you do that through a command-line interface.

Part of the Savanna work for Icehouse is to create a savanna CLI. It extends Savanna's functionality and gives us an opportunity to review the existing v1.0 and v1.1 REST APIs in preparation for a stable v2 API.

A first pass of the CLI is now done and functional for at least the v1.0 REST API. And here’s how you can use it.

Zeroth, get your hands on the Savanna client. Two places to get it are RDO and the OpenStack tarballs.

First, know that the Savanna architecture includes a plugin mechanism to allow for Hadoop vendors to plug in their own management tools. This is a key aspect of Savanna’s vendor appeal. So you need to pick a plugin to use.

$ savanna plugin-list
| name    | versions | title                     |
| vanilla | 1.2.1    | Vanilla Apache Hadoop     |
| hdp     | 1.3.2    | Hortonworks Data Platform |

I chose to try the Vanilla plugin, version 1.2.1. It’s the reference implementation,

export PLUGIN_NAME=vanilla
export PLUGIN_VERSION=1.2.1

Second, you need to make some decisions about the Hadoop cluster you want to start. I decided to have a master node using the m1.medium flavor and three worker nodes also using m1.medium.

export MASTER_FLAVOR=m1.medium
export WORKER_FLAVOR=m1.medium
export WORKER_COUNT=3

Third, I decided to use Neutron networking in my OpenStack deployment, since it’s what everyone is doing these days. As a result, I need a network to start the cluster on.

$ neutron net-list
| id            | name | subnets       |
| 25783...f078b | net0 | 18d12...5f903 |

The cluster will be significantly more useful if I have a way to access it, so I need to pick a keypair for access.

$ nova keypair-list
| Name      | Fingerprint                                     |
| mykeypair | ac:ad:1d:f7:97:24:bd:6e:d7:98:50:a2:3d:7d:6c:45 |
export KEYPAIR=mykeypair

And I need an image to use for each of the nodes. I chose a Fedora image that was created using the Savanna DIB elements. You can pick one from the Savanna Quickstart guide,

$ glance image-list
| ID            | Name           | Disk Format | Container Format | Size       | Status |
| 1939b...f05c2 | fedora_savanna | qcow2       | bare             | 1093453824 | active |
export IMAGE_ID=1939bad7-11fe-4cab-b1b9-02b01d9f05c2

then register it with Savanna,

savanna image-register --id $IMAGE_ID --username fedora
savanna image-add-tag --id $IMAGE_ID --tag $PLUGIN_NAME
savanna image-add-tag --id $IMAGE_ID --tag $PLUGIN_VERSION
$ savanna image-list
| name           | id            | username | tags           | description |
| fedora_savanna | 1939b...f05c2 | fedora   | vanilla, 1.2.1 | None        |

FYI, --username fedora tells Savanna what account it can access on the instance that has sudo privileges. Adding the tags tells Savanna what plugin and version the image works with.

That’s all the input you need to provide. From here on, cluster creation is just a matter of cutting and pasting a few more commands.

First, a few commands to find IDs for the named values chosen above,

export MASTER_FLAVOR_ID=$(nova flavor-show $MASTER_FLAVOR | grep ' id ' | awk '{print $4}')
export WORKER_FLAVOR_ID=$(nova flavor-show $WORKER_FLAVOR | grep ' id ' | awk '{print $4}')
export MANAGEMENT_NETWORK_ID=$(neutron net-show net0 | grep ' id ' | awk '{print $4}')
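The grep ' id ' | awk '{print $4}' idiom works on these tables, but it matches any row containing " id " and depends on column positions. Here is a minimal sketch of a stricter alternative, assuming the clients print the two-column "| Property | Value |" layout. The table_value helper and the sample table are hypothetical, not part of any client.

```shell
# table_value KEY: read a "| Property | Value |" table on stdin and print
# the Value cell of the first row whose Property cell equals KEY exactly.
table_value() {
    awk -F'|' -v key="$1" '
        NF >= 3 {
            k = $2; v = $3
            gsub(/^[ \t]+|[ \t]+$/, "", k)   # trim the Property cell
            gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim the Value cell
            if (k == key && !found) { print v; found = 1 }
        }'
}

# Hypothetical sample in the shape the nova and neutron clients print
sample_table() {
    printf '%s\n' \
        '+----------+-----------+' \
        '| Property | Value     |' \
        '+----------+-----------+' \
        '| name     | m1.medium |' \
        '| id       | 3         |' \
        '| vcpus    | 2         |' \
        '+----------+-----------+'
}

sample_table | table_value id
```

With that helper defined, a lookup could read nova flavor-show $MASTER_FLAVOR | table_value id.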

Next, create some node group templates for the master and worker nodes. The CLI currently takes a JSON representation of the template. It also provides a JSON representation when showing template details to facilitate export & import.

export MASTER_TEMPLATE_ID=$(echo "{\"plugin_name\": \"$PLUGIN_NAME\", \"node_processes\": [\"namenode\", \"secondarynamenode\", \"oozie\", \"jobtracker\"], \"flavor_id\": \"$MASTER_FLAVOR_ID\", \"hadoop_version\": \"$PLUGIN_VERSION\", \"name\": \"master\"}" | savanna node-group-template-create | grep ' id ' | awk '{print $4}')

export WORKER_TEMPLATE_ID=$(echo "{\"plugin_name\": \"$PLUGIN_NAME\", \"node_processes\": [\"datanode\", \"tasktracker\"], \"flavor_id\": \"$WORKER_FLAVOR_ID\", \"hadoop_version\": \"$PLUGIN_VERSION\", \"name\": \"worker\"}" | savanna node-group-template-create | grep ' id ' | awk '{print $4}')
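All the backslash-escaped quotes make those one-liners easy to get wrong. As a sketch of another way to build the same JSON, a here-document keeps the quotes literal while still expanding the variables. The worker_template_json helper is hypothetical, and the default values below are placeholders for the exports made earlier.

```shell
# Placeholder values; in the walkthrough these come from the earlier exports
PLUGIN_NAME=${PLUGIN_NAME:-vanilla}
PLUGIN_VERSION=${PLUGIN_VERSION:-1.2.1}
WORKER_FLAVOR_ID=${WORKER_FLAVOR_ID:-3}   # hypothetical flavor id

# worker_template_json (hypothetical helper): print the worker node group
# template as JSON, with shell variables expanded inside the here-document
worker_template_json() {
    cat <<EOF
{
  "plugin_name": "$PLUGIN_NAME",
  "hadoop_version": "$PLUGIN_VERSION",
  "name": "worker",
  "flavor_id": "$WORKER_FLAVOR_ID",
  "node_processes": ["datanode", "tasktracker"]
}
EOF
}

worker_template_json
# To create the template, pipe it to the CLI as in the commands above:
#   worker_template_json | savanna node-group-template-create
```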

Now put those two node group templates together into a cluster template,

export CLUSTER_TEMPLATE_ID=$(echo "{\"plugin_name\": \"$PLUGIN_NAME\", \"node_groups\": [{\"count\": 1, \"name\": \"master\", \"node_group_template_id\": \"$MASTER_TEMPLATE_ID\"}, {\"count\": $WORKER_COUNT, \"name\": \"worker\", \"node_group_template_id\": \"$WORKER_TEMPLATE_ID\"}], \"hadoop_version\": \"$PLUGIN_VERSION\", \"name\": \"cluster\"}" | savanna cluster-template-create | grep ' id ' | awk '{print $4}')

Creating the node group and cluster templates only has to happen once. The final step, starting up the cluster, can be done multiple times.

echo "{\"cluster_template_id\": \"$CLUSTER_TEMPLATE_ID\", \"default_image_id\": \"$IMAGE_ID\", \"hadoop_version\": \"$PLUGIN_VERSION\", \"name\": \"cluster-instance-$(date +%s)\", \"plugin_name\": \"$PLUGIN_NAME\", \"user_keypair_id\": \"$KEYPAIR\", \"neutron_management_network\": \"$MANAGEMENT_NETWORK_ID\"}" | savanna cluster-create 

That’s it. You can nova list and ssh into the master instance, assuming you’re on the Neutron node and use ip netns exec, or you can log in through the master node’s VNC console.

A recipe for starting cloud images with virt-install

January 8, 2014

I’m a fan of using the same OS image across multiple environments. So, I’m a fan of using cloud images, those with cloud-init installed, even outside of a cloud.

The trick to this is properly triggering the NoCloud datasource. It’s actually more of a pain than you would think, and not very well documented. Here’s my recipe (from Fedora 19),

xz -d Fedora-x86_64-20-Beta-20131106-sda.raw.xz

printf '#cloud-config\npassword: fedora\nchpasswd: {expire: False}\nssh_pwauth: True\n' > user-data

NAME=node0  # example guest name; pick your own
cp Fedora-x86_64-20-Beta-20131106-sda.raw $NAME.raw
echo "instance-id: $NAME; local-hostname: $NAME" > meta-data
genisoimage -output $NAME-cidata.iso -volid cidata -joliet -rock user-data meta-data
virt-install --import --name $NAME --ram 512 --vcpus 2 --disk $NAME.raw --disk $NAME-cidata.iso,device=cdrom --network bridge=virbr0

Log in with username fedora and password fedora.

You’ll also want to boost the amount of RAM if you plan on doing anything interesting in the guest.

You can repeat the commands from cp through virt-install to start multiple guests; just make sure to use a different $NAME for each one.

If you want to ssh into the guest, you can use virsh console, log in and use ifconfig / ip addr to find the address. Or, you can use arp -e and virsh dumpxml to match MAC addresses. Or just arp -e before and after starting the guest.
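The before-and-after approach can be sketched by saving the two arp -e outputs to files and printing only the lines that are new. The new_lines helper and the sample entries below are hypothetical; real arp output will have different addresses and may include a header line.

```shell
# new_lines BEFORE AFTER: print lines present in file AFTER but not in file BEFORE
new_lines() {
    grep -F -v -x -f "$1" "$2"
}

# Hypothetical snapshots standing in for:
#   arp -e > before.txt    (before starting the guest)
#   arp -e > after.txt     (after the guest has booted)
workdir=$(mktemp -d)
printf '%s\n' \
    '192.168.122.1   ether   52:54:00:aa:bb:01   C   virbr0' > "$workdir/before.txt"
printf '%s\n' \
    '192.168.122.1   ether   52:54:00:aa:bb:01   C   virbr0' \
    '192.168.122.57  ether   52:54:00:aa:bb:02   C   virbr0' > "$workdir/after.txt"

# Prints only the entry that appeared after the guest started
new_lines "$workdir/before.txt" "$workdir/after.txt"
```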

Note, you need to follow the meta-data and user-data lines very closely. If you don’t, you may not trigger the NoCloud datasource properly. It took me a number of tries to get it right. Also, the volid needs to be “cidata” or it won’t be found; the volume label turns out to be a configurable parameter for NoCloud. The chpasswd bit is to prevent being prompted to change your password the first time you log in.
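Since the exact file contents matter, here is a small sketch that writes both seed files with printf, so the \n escapes become real newlines no matter how the shell's echo treats backslashes. The write_seed helper and the node0 name are hypothetical; the file contents are the ones from the recipe.

```shell
# write_seed NAME DIR: write NoCloud user-data and meta-data files into DIR
write_seed() {
    name=$1
    dir=$2
    mkdir -p "$dir"
    # printf interprets \n in its format string in every POSIX shell
    printf '#cloud-config\npassword: fedora\nchpasswd: {expire: False}\nssh_pwauth: True\n' \
        > "$dir/user-data"
    # Same meta-data content as the recipe
    printf '%s\n' "instance-id: $name; local-hostname: $name" > "$dir/meta-data"
}

seed_dir=$(mktemp -d)
write_seed node0 "$seed_dir"     # node0 is an example guest name
head -n 1 "$seed_dir/user-data"  # the first line must be the #cloud-config marker
# Then, as in the recipe, from inside "$seed_dir":
#   genisoimage -output node0-cidata.iso -volid cidata -joliet -rock user-data meta-data
```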

Consider becoming a fan of consistent OS images across your environments too!

Hello Fedora with docker in 3 steps

December 10, 2013

It really is this simple,

1. sudo yum install -y docker-io

2. sudo systemctl start docker

3. sudo docker run mattdm/fedora cat /etc/system-release

Bonus, for when you want to go deeper –

If you don’t want to use sudo all the time, which you shouldn’t want to do, you add yourself to the docker group,

$ sudo usermod -a -G docker $USER

If you don’t want to log out and back in, make your new group effective immediately,

$ su - $USER
$ groups | grep -q docker && echo Good job || echo Try again

If you want to run a known image, search for it on the command line,

$ docker search fedora

Try out a shell with,

$ docker run -i -t mattdm/fedora /bin/bash

EC2, VNC and Fedora

January 24, 2012

If you have ever wondered about running a desktop session in EC2, here is one way to set it up and some pointers.

First, start an instance. My preferred way is via Condor. I used ami-60bd4609 on an m1.small, providing a basic Fedora 15 server. Make sure the instance’s security group has port 22 (ssh) open.

Second, install a desktop environment, e.g. yum groupinstall 'GNOME Desktop Environment'. This is 467 packages and will take about 18 minutes.

Third, install and set up a VNC server. yum install vnc-server ; vncpasswd ; vncserver :1. This produces a running desktop that can be contacted by a vncviewer.

Finally, connect via an SSH secured VNC session.

VNC_VIA_CMD='/usr/bin/ssh -i KEYPAIR.pem -l ec2-user -f -L "$L":"$H":"$R" "$G" sleep 20' vncviewer localhost:1 -via INSTANCE_ADDRESS

What’s going on here? vncviewer allows for a proxy host when connecting to the vncserver. That is the -via argument. The VNC_VIA_CMD is an environment variable that specifies the command used to connect to the proxy. Here it is modified to provide the keypair needed to access the instance, and the user ec2-user, which is the default user on Fedora AMIs. The INSTANCE_ADDRESS is the Hostname from condor_ec2_q.
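To make the substitution concrete, here is a sketch that prints what the command template expands to. It assumes the viewer exports G (the gateway host), H (the VNC host as seen from the gateway), R (the remote port) and L (a local port it picks) before running VNC_VIA_CMD, which is how I read the vncviewer man page. The values and the preview_via_cmd helper are made up for illustration.

```shell
# Hypothetical values standing in for what the viewer would export
G=INSTANCE_ADDRESS   # gateway host given to -via
H=localhost          # VNC host as seen from the gateway
R=5901               # remote port: 5900 + display number (:1)
L=5599               # local port; the viewer picks a free one, this is made up

# preview_via_cmd: print the ssh command the template would expand to
preview_via_cmd() {
    echo "/usr/bin/ssh -i KEYPAIR.pem -l ec2-user -f -L $L:$H:$R $G sleep 20"
}

preview_via_cmd
```

The result is an ordinary ssh port forward that stays up for 20 seconds, long enough for the viewer to connect through it.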

Alternatively, ssh-add KEYPAIR.pem followed by vncviewer localhost:1 -via ec2-user@INSTANCE_ADDRESS. However, be careful if you have many keys stored in your ssh-agent. They will all be tried and the remote sshd may reject your connection before the proper keypair is found.


  • It takes about 20 minutes from start to vncviewer. Once the instance is set up, consider creating your own AMI.
  • Set a password for ec2-user, otherwise the screensaver will lock you out. Use sudo passwd ec2-user.
  • Remember AWS charges for data transmitted out of the instance, as well as the uptime of the instance, see EC2 Pricing. You will want to figure out how much bandwidth your workflow takes on average to figure out total cost. For me, a half hour of browsing Planet Fedora, editing with emacs, and compiling some code transmitted about 60MB of data. That measurement is the difference in eth0’s “TX bytes” as reported by ifconfig. This is not a perfect estimate because there may have been data transferred within EC2, which is not charged.
  • For transmit rates, consider running bwm-ng to see what actions use the most bandwidth.
  • Generally, make the screen update as little as possible. Constantly changing graphics on web pages can run 60-120KB/s. Compare that to a text console and emacs producing a TX rate closer to 5-25KB/s.
  • Cover consoles that are running compilations, or compile in a low verbosity mode.
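The TX measurement above can be sketched as two counter readings and a subtraction. This assumes a Linux guest, where the kernel exposes the same counter ifconfig reports under /sys/class/net. The tx_bytes and delta_mb helpers and the sample readings are hypothetical.

```shell
# tx_bytes IFACE: print the transmit byte counter the Linux kernel keeps for IFACE
tx_bytes() {
    cat "/sys/class/net/$1/statistics/tx_bytes"
}

# delta_mb BEFORE AFTER: difference between two byte counts in whole megabytes (10^6 bytes)
delta_mb() {
    echo $(( ($2 - $1) / 1000000 ))
}

# Made-up counter readings taken half an hour apart
delta_mb 125000000 185000000
# On the instance:
#   before=$(tx_bytes eth0)
#   ...work for a while...
#   after=$(tx_bytes eth0)
#   delta_mb "$before" "$after"
```

This ignores counter resets from reboots or interface restarts, so take both readings within one uptime.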
