How to Build a Hadoop 2.6 Cluster Using OpenStack

    Version 9

    by Ekine Akuiyibo and Orgad Kimchi


    This article describes how to set up a multinode Apache Hadoop 2.6 (YARN) cluster using Oracle OpenStack for Oracle Solaris and other Oracle Solaris technologies, such as Oracle Solaris Zones and Oracle Solaris Unified Archives. The resulting cluster is a virtual cluster that runs on a single physical machine, which enables efficient vertical scaling and promotes optimized resource utilization.


    Table of Contents
    Introduction to Hadoop
    Introduction to OpenStack
    The Benefits of Using OpenStack for a Hadoop Cluster
    OpenStack Sahara
    Architecture We Will Use
    Hadoop Zones Information
    Tasks We Will Perform
    Performing the Tasks
    See Also
    About the Authors
    Appendix—OpenStack Configuration Script


    This article starts with a brief overview of Hadoop and OpenStack, and follows with an example of setting up a Hadoop cluster that has two NameNodes, a Resource Manager, a History Server, and three DataNodes. As a prerequisite, you should have a basic understanding of Oracle Solaris Zones and network administration.


    Introduction to Hadoop


    Apache Hadoop is an open source distributed computing framework designed to process very large unstructured data sets. It is composed of two main subsystems: data storage and data analysis. Apache Hadoop was developed to address four system principles: the ability to reliably scale processing across multiple physical (or virtual) nodes, moving code to data, dealing gracefully with node failures, and abstracting the complexity of distributed and concurrent applications.


    Introduction to OpenStack


    OpenStack is free and open source cloud software typically deployed as an infrastructure as a service (IaaS) solution. The OpenStack platform exists as a combination of several interrelated subprojects that control the provisioning of compute, storage, and network resources in a data center. OpenStack technology simplifies the deployment of data center resources while at the same time providing a unified resource management tool. Oracle Solaris 11 includes a complete OpenStack distribution called Oracle OpenStack for Oracle Solaris.


    The Benefits of Using OpenStack for a Hadoop Cluster


    In addition to being two of the most active open source community projects, Hadoop and OpenStack are complementary technologies. This is especially evident in long-term enterprise adoption of both of these greenfield technologies.


    Hadoop, the more mature of the two technologies, still faces significant operational challenges. A representative Hadoop adoption journey would, for example, comprise pilot projects, quality assurance, testing, performance validation, and production. Each activity requires its own Hadoop cluster. Moreover, each business unit evaluating separate workloads or using a different software and hardware stack will have different Hadoop cluster requirements. Supporting these multiple Hadoop cluster environments can be an operational nightmare, and spinning up virtual machines (VMs) on a public cloud does not guarantee that the solution will work in-house.


    OpenStack alleviates these operational complexities. In particular, OpenStack reduces provisioning and deployment time through template-based provisioning: templates can be specified at both the cluster level and the node level, allowing self-provisioning of Hadoop clusters while eliminating typical configuration errors. Templates also allow flexibility in defining cluster types, for example, Hadoop-specific or not. Other operational benefits include more efficient cluster timesharing, as well as the basic infrastructure for supporting varying service-level agreements (SLAs) through resource and access isolation.


    OpenStack, the "newer" technology of the two, can use Hadoop as its "killer" application. Several architectural characteristics of Hadoop make this the case, including its scale flexibility (vertical and horizontal), its independence from legacy applications and workloads, and the ability for multiple users (departments) to share the same platform. In other words, Hadoop is the ideal cloud application for OpenStack proof of concept.


    Oracle OpenStack for Oracle Solaris, Oracle Solaris Zones, Oracle Virtual Networking, and Oracle Solaris Unified Archives provide the fundamental building blocks for the Hadoop and OpenStack integration discussed in this article. Using Unified Archives, virtual Hadoop clusters can be provisioned in the time it takes to boot Oracle Solaris Zones. Oracle Solaris Zones technology provides zero-overhead virtualization making zones highly efficient. Furthermore, Oracle Solaris Kernel Zones extend the basic zones functionality to include operating system–level isolation and independence. The ability to rapidly provision zones coupled with the flexibility of Unified Archives enables template-based provisioning both at the cluster and node levels. The entire infrastructure is monitored through one pane of glass—OpenStack Horizon.


    OpenStack Sahara


    Initially named Savanna, Sahara is the data-processing component of OpenStack. Incubated in the OpenStack Icehouse release and integrated in the OpenStack Juno release, Sahara provides push-button provisioning of Hadoop clusters and elastic data processing (EDP) capabilities analogous to Amazon Elastic MapReduce. Sahara's integration with core OpenStack services, including Horizon, gives operators the ability to manage Hadoop clusters from the OpenStack dashboard. OpenStack Sahara support is a work in progress for Oracle OpenStack for Oracle Solaris.


    Note: In this article, we are not going to use OpenStack Sahara.


    Architecture We Will Use


    Figure 1 shows the architecture used in this article.




    Figure 1. Architecture


    The Hadoop cluster building blocks are as follows:


    • NameNode: The centerpiece of the Hadoop Distributed File System (HDFS), which stores file system metadata and is responsible for all client operations
    • Secondary NameNode: Performs periodic checkpoints by merging the NameNode's namespace image with its edit log; despite the name, it is a checkpointing helper rather than a hot standby for automatic failover
    • ResourceManager: The global YARN resource scheduler, which allocates cluster resources to applications and coordinates the NodeManager daemons running on the worker nodes
    • DataNodes: Nodes that store data in the HDFS and are also known as slaves; these nodes run the NodeManager process that communicates with the ResourceManager
    • History Server: Provides REST APIs that allow users to retrieve the status of and information about finished jobs


    Figure 2 shows an example Hadoop cluster.




    Figure 2. Example Hadoop cluster


    Hadoop Zones Information


    We will leverage the integration between Oracle Solaris Zones virtualization technology and the OpenStack framework that is built into Oracle Solaris.


    Table 1 shows a summary of the Hadoop zones we will create:



    Function              Zone Name     IP Address
    Secondary NameNode    name-node2    192.168.1.3/24


    Tasks We Will Perform


    In the next subsections, we will perform the following operations in order to build the architecture:





    You need to have a working OpenStack Juno environment running on Oracle Solaris 11.2 or later. Refer to Installing and Configuring OpenStack in Oracle Solaris 11.2 in order to build an OpenStack environment. You will also need to download a copy of Hadoop 2.6 from Apache.


    Important: In the examples presented in this article, the command prompt indicates which user needs to run each command in addition to indicating the environment where the command should be run. For example, the command prompt root@global:~# indicates that user root needs to run the command from the global zone.


    Performing the Tasks


    Create OpenStack Client Environment Scripts


    To increase the efficiency of client operations, OpenStack supports simple client environment scripts, which are also known as OpenRC files. The scripts include the location of the Identity service and the admin and hadoop user credentials. Later portions of this article use these scripts to load the appropriate credentials for client operations.


    Create the client environment scripts for the admin and hadoop users by running the following commands.


    root@global:~# vi

    export OS_AUTH_URL=http://localhost:5000/v2.0

    export OS_PASSWORD=neutron

    export OS_TENANT_NAME=service

    export OS_USERNAME=neutron


    root@global:~# vi

    export OS_AUTH_URL=http://localhost:5000/v2.0

    export OS_PASSWORD=secrete

    export OS_TENANT_NAME=demo

    export OS_USERNAME=hadoop


    To run clients as a specific user, load the associated client environment script before running the clients. This sets the environment variables for the location of the Identity service and that user's credentials, for example:


    root@global:~# source


    Verify that the environment variables have been applied:


    root@global:~# env | grep -i os
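    Beyond eyeballing the env output, a small guard function can confirm that an OpenRC file exported everything the clients need. The following is a sketch in POSIX sh; check_openrc is our own helper name, not an OpenStack tool:

```shell
# Verify that the four variables an OpenRC file must export are set.
# Prints "openrc ok" on success; names the missing variable otherwise.
check_openrc() {
  for v in OS_AUTH_URL OS_PASSWORD OS_TENANT_NAME OS_USERNAME; do
    eval _val="\${$v}"
    if [ -z "$_val" ]; then
      echo "check_openrc: $v is not set" >&2
      return 1
    fi
  done
  echo "openrc ok"
}

# Typical use after sourcing one of the scripts created above:
#   . ./<openrc-file> && check_openrc
```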






    Create the hadoop User


    Keystone is an OpenStack service that provides authentication and authorization services between users, administrators, and OpenStack services.


    Create the hadoop user using the following command:






    • name is the user name
    • tenant is the tenant name
    • pass is the user password
    • email is the e-mail address


    Note: OpenStack generates IDs dynamically, so you will see different values in the example command output.
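    Under the Juno-era keystone v2 CLI, the call with the options listed above might look like the following sketch. The run() helper prints each command instead of executing it, so the block is safe to run anywhere; on the live controller you would invoke the keystone line directly. Tenant demo and password secrete come from the hadoop OpenRC file above, while the e-mail address is a made-up placeholder:

```shell
# run() prints rather than executes, so this sketch needs no live Keystone.
run() { echo "+ $*"; }

# Flag names follow the Juno-era keystone v2 CLI; the e-mail value is a
# placeholder.
run keystone user-create --name hadoop --tenant demo \
    --pass secrete --email hadoop@example.com
```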


    The newly created hadoop account will be used for future management of the OpenStack environment.


    Create the Tenant Network


    Neutron provides networking capabilities in OpenStack, enabling VMs to talk to each other within the same tenants and subnets, and enabling them to talk directly to the outside world.


    The tenant network provides internal network access for instances. The architecture isolates this type of network from other tenants.


    Load the location of the Identity service and the hadoop user credentials:


    root@global:~# source


    Create the network:




    Your network also requires a subnet that is attached to it.


    Some of the Hadoop services rely on static IP addresses, so this subnet will not use DHCP in order to enable static IP address allocation for the instances.




    From the command output, we can see the following:


    • The subnet IP address range will be through (allocation_pools)
    • DHCP is disabled on this network (enable_dhcp | False)
    • The IP gateway for this subnet will be (gateway_ip).
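    The two Neutron calls that produce this layout can be sketched as follows. The network name hadoop_net appears later in the article; the subnet name and the 192.168.1.0/24 CIDR are assumptions that match the zone addressing in Table 1. run() is a print-only stand-in for execution on a host with a live Neutron endpoint:

```shell
# run() prints rather than executes; replace it with direct invocation
# on the OpenStack controller.
run() { echo "+ $*"; }

# --disable-dhcp produces the enable_dhcp=False behavior described above,
# so instances can keep static IP addresses.
run neutron net-create hadoop_net
run neutron subnet-create --name hadoop_subnet --disable-dhcp \
    hadoop_net 192.168.1.0/24
```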


    Create the Glance Image


    Glance is a service that provides image management in OpenStack. It is responsible for managing the images that are installed on the compute nodes when you create new VM instances.


    The next step will be to populate Glance with an image that we can use for our instances.


    In the Oracle Solaris implementation, we take advantage of a new archive type called Unified Archives. Therefore, we will create a Unified Archive.


    The following shows how to capture a Unified Archive of a newly created non-global zone called myzone, and then upload it to the Glance repository.


    Create the zone:


    root@global:~# zonecfg -z myzone create


    Install and boot the zone:


    root@global:~# zoneadm -z myzone install

    root@global:~# zoneadm -z myzone boot


    We need to modify the zone as a cloud image. Cloud images are preinstalled bootable disk images that have had their identifiable host-specific metadata—such as SSH host keys, MAC addresses, and static IP addresses—removed.


    When we deploy instances using OpenStack, we typically provide an SSH public keypair that's used as the primary authentication mechanism to our instance.


    Modify the /etc/ssh/sshd_config file to allow root login using key-based authentication only (PermitRootLogin without-password):


    root@global:~# zlogin myzone 'sed /^PermitRootLogin/s/no$/without-password/ \

    < /etc/ssh/sshd_config > /system/volatile/sed.$$ ; \

    cp /system/volatile/sed.$$ /etc/ssh/sshd_config'
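    Before applying it inside the zone, the sed substitution can be dry-run on a scratch file (the file name here is a throwaway):

```shell
# Reproduce the edit locally: a trailing "no" on the PermitRootLogin
# line becomes "without-password".
printf 'PermitRootLogin no\n' > /tmp/sshd_config.test
sed '/^PermitRootLogin/s/no$/without-password/' /tmp/sshd_config.test
# prints: PermitRootLogin without-password
rm /tmp/sshd_config.test
```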


    Download Hadoop


    This article uses Apache Hadoop Release 2.6.0.


    Download the Hadoop tarball hadoop-2.6.0.tar.gz.


    Copy the tarball into the zone's /var/tmp directory:


    root@global:~# cp hadoop-2.6.0.tar.gz /system/zones/myzone/root/var/tmp


    Shut down the zone:


    root@global:~# zoneadm -z myzone halt


    Create the Unified Archive (UAR):


    root@global:~# archiveadm create -z myzone /var/tmp/myzone.uar
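    Before uploading, it can be worth inspecting the archive's parsable metadata; the upload script in the next step greps this same output for the architecture field. run() is a print-only stand-in for running archiveadm on the Oracle Solaris host:

```shell
# run() prints rather than executes; call archiveadm directly on the host.
run() { echo "+ $*"; }

# The "archive" line of the parsable (-p) output carries the
# architecture (i386 or sparc) in its fourth |-delimited field.
run archiveadm info -p /var/tmp/myzone.uar
```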


    Create the following image-upload.ksh script in order to upload the UAR into Glance. The script will perform the following actions:


    • Get the system architecture (SPARC or x86).
    • Load the Glance user credentials.
    • Upload the image into the Glance image repository.


    root@global:~# vi image-upload.ksh



    # Upload Unified Archive image to glance with proper Solaris decorations


    arch=$(archiveadm info -p $1|grep ^archive|cut -d '|' -f 4)


    if [[ "$arch" == "i386" ]]; then






    name=$(basename $1 .uar)

    export OS_USERNAME=glance

    export OS_PASSWORD=glance

    export OS_TENANT_NAME=service

    export OS_AUTH_URL=http://localhost:5000/v2.0


    glance \

    image-create \

    --name $name \

    --container-format bare \

    --disk-format raw \

    --owner service \

    --file "$1" \

    --is-public True \

    --property architecture="$imgarch" \

    --property hypervisor_type=solariszones \

    --property vm_mode=solariszones



    Change the script's permissions and upload the UAR into Glance by running the following commands:
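    The two commands were likely along these lines (the file names are taken from the listings above; run() is a print-only stand-in, so drop it to execute for real):

```shell
run() { echo "+ $*"; }

# Make the script executable, then pass it the UAR created earlier.
run chmod +x image-upload.ksh
run ./image-upload.ksh /var/tmp/myzone.uar
```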




    Log In to Horizon


    Within the host environment, open up a browser and navigate to the IP address you allocated to the global zone:




    Use hadoop/secrete as the user/password combination in the login screen.



    Figure 3. The OpenStack Horizon login screen


    After you have successfully logged in, navigate to the Access & Security screen, where you can create a new SSH keypair:



    Figure 4. Access & Security screen


    There are no keypairs currently defined, so click the Import Key Pair button to open the Import Keypair screen, which is shown in Figure 5.


    In our case, let's use the SSH public key of our global zone.


    First, run the following command to generate the SSH key. Enter yes at the command prompt.


    root@global:~# ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

    Generating public/private rsa key pair.

    /root/.ssh/id_rsa already exists.

    Overwrite (yes/no)? yes


    Next, get the key using the following command, and then enter the key into the Public Key field of the Import Key Pair screen (see Figure 5).


    root@global:~# cat .ssh/

    ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA0Khp4Th5VcKQW4LttqzKAR8O60gj43cB0CbdpiizEhXEbVgjI7IlnZlo9i




    uK9JUi/gPhg2lTOhISgJoelorQ== root@global


    In the Key Pair Name field, enter hadoopkey.




    Figure 5. Import Keypair screen


    Launch the name-node1 Instance


    Nova is the compute service in OpenStack, and it is responsible for scheduling and deploying new instances.


    Navigate to the Instances screen.




    Figure 6. Instances screen


    Let's launch a new instance by clicking the Launch Instance button.


    We will call our instance name-node1 and give it an Oracle Solaris non-global zone flavor called tiny. A flavor represents the size of the resources that this instance should receive. We can see in Figure 7 that we will get a 10 GB root disk and 2,048 MB of RAM. We will boot this instance from the image stored in Glance that we uploaded in the previous section, which is called myzone (1.4GB).




    Figure 7. Launch Instance screen


    When we are happy with the Details tab, we can move on to the Access & Security tab. There, you can see that our keypair (hadoopkey) has been preselected.




    Figure 8. Access & Security tab


    Move on to the Networking tab. There, you can see that hadoop_net has been preselected as our network. Then click the Launch button.




    Figure 9. Networking tab


    After a little bit of time, we can see that our instance has successfully booted, which is indicated by its Active status (see Figure 10). You can see that the instance has the IP address




    Figure 10. Screen showing the instance's status is "active"


    Install Hadoop and Create More Scripts


    From the global zone, get the zone name using the zoneadm command:


    root@global:~# zoneadm list




    Note: Your zone name might be different.


    Log in to the VM using the zlogin command:


    root@global:~# zlogin instance-00000001

    [Connected to zone 'instance-00000001' pts/2]

    Oracle Corporation      SunOS 5.11      11.2    December 2014


    Verify that you have the Hadoop tarball in /var/tmp:


    root@name-node1:~# ls /var/tmp



    Next, set up the Hadoop environment using the script provided in the section "Appendix—OpenStack Configuration Script." (For a full description of the script, see the article "How to Set Up a Hadoop 2.2 Cluster From the Unified Archive.")


    First, create the /usr/local/Scripts directory; we will use this directory for our scripts.


    root@name-node1:~# mkdir -p /usr/local/Scripts


    Then, copy the script content from the Appendix to create the script, and then set the permissions:


    root@name-node1:~# vi /usr/local/Scripts/

    root@name-node1:~# chmod +x /usr/local/Scripts/


    Run the script. The script will prompt you for the passwords of the following users: hdfs, yarn, mapred, and bob.


    root@name-node1:~# /usr/local/Scripts/

    80 blocks

    Enter the password for the hdfs user

    New Password:

    Re-enter new Password:



    Next, create the testssh script. We will use this script to verify the SSH setup.


    root@name-node1:~# vi /usr/local/Scripts/testssh




    for zone in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
    do
      ssh -o StrictHostKeyChecking=no $zone exit
    done




    Create the startcluster script. We will use this script to start all the services on the Hadoop cluster.


    root@name-node1:~# vi /usr/local/Scripts/startcluster




    su - hdfs -c ""

    su - yarn -c ""

    su - yarn -c 'ssh yarn@resource-manager /usr/local/hadoop/sbin/ start resourcemanager'

    su - mapred -c 'ssh mapred@resource-manager /usr/local/hadoop/sbin/ start historyserver'


    Create the stopcluster script. We will use this script to stop all the services on the Hadoop cluster.


    root@name-node1:~# vi /usr/local/Scripts/stopcluster




    su - hdfs -c ""

    su - yarn -c ""

    su - yarn -c 'ssh yarn@resource-manager /usr/local/hadoop/sbin/ stop resourcemanager'

    su - mapred -c 'ssh mapred@resource-manager /usr/local/hadoop/sbin/ stop historyserver'


    Create the verify-hadoop script, which will verify that the Hadoop processes are up and running.


    root@name-node1:~# vi /usr/local/Scripts/verify-hadoop



    su - hdfs -c "ssh -q hdfs@name-node1 'hostname ; /usr/jdk/instances/jdk1.7.0/bin/jps | grep NameNode'" > /tmp/hadoop_output

    su - hdfs -c "ssh hdfs@name-node2 'hostname ; /usr/jdk/instances/jdk1.7.0/bin/jps | grep SecondaryNameNode'" >> /tmp/hadoop_output

    su - hdfs -c "ssh hdfs@data-node1 'hostname ; /usr/jdk/instances/jdk1.7.0/bin/jps | grep DataNode'" >> /tmp/hadoop_output

    su - hdfs -c "ssh hdfs@data-node2 'hostname ; /usr/jdk/instances/jdk1.7.0/bin/jps | grep DataNode'" >> /tmp/hadoop_output

    su - hdfs -c "ssh hdfs@data-node3 'hostname ; /usr/jdk/instances/jdk1.7.0/bin/jps | grep DataNode'" >> /tmp/hadoop_output

    su - yarn -c "ssh yarn@data-node1 'hostname ; /usr/jdk/instances/jdk1.7.0/bin/jps | grep NodeManager'"  >> /tmp/hadoop_output

    su - yarn -c "ssh yarn@data-node2 'hostname ; /usr/jdk/instances/jdk1.7.0/bin/jps | grep NodeManager'"  >> /tmp/hadoop_output

    su - yarn -c "ssh yarn@data-node3 'hostname ; /usr/jdk/instances/jdk1.7.0/bin/jps | grep NodeManager'"  >> /tmp/hadoop_output

    su - yarn -c "ssh yarn@resource-manager 'hostname ; /usr/jdk/instances/jdk1.7.0/bin/jps | grep ResourceManager'" >> /tmp/hadoop_output

    su - mapred -c "ssh mapred@resource-manager 'hostname ; /usr/jdk/instances/jdk1.7.0/bin/jps | grep HistoryServer'"  >> /tmp/hadoop_output

    cat /tmp/hadoop_output | grep -v Oracle | awk  '{print $1,$2}'


    rm /tmp/hadoop_output


    Change the scripts' permissions:


    root@name-node1:~# chmod -R +x /usr/local/Scripts


    Log out from the zone:


    root@name-node1:~# logout

    [Connection to zone 'instance-00000001' pts/3 closed]


    Once the name-node1 instance is ready with the Hadoop configuration, we can create a snapshot of it, which produces a Glance image. We will use that image to provision the other Hadoop nodes.


    From the Instances menu, select Create Snapshot.




    Figure 11. Instances menu screen


    The Create Snapshot window will appear.




    Figure 12. The Create Snapshot window


    In the Snapshot Name field, enter hadoop-base-image, and then click the Create Snapshot button.


    After a few seconds, the Images screen will open, as shown in Figure 13.




    Figure 13. The Images window


    Wait a few minutes until the image is created. Once the image is ready, you will see that its status has been changed to Active, as shown in Figure 14.




    Figure 14. Screen showing the image's status is active


    Launch name-node2, resource-manager, and the Three DataNode Instances


    You can launch the instances using the Horizon dashboard; alternatively, you can use Heat in order to automate the Hadoop cluster deployment.


    Heat is the main project in the OpenStack Orchestration program. It implements an orchestration engine for launching multiple composite cloud applications based on templates in the form of text files that can be treated like code. For more information about Heat, see


    In Heat terminology, a stack is the collection of objects—or resources—that will be created by Heat. This collection might include instances (VMs), networks, subnets, routers, ports, and so on. Heat uses the notion of a template to define a stack.


    In order to automate our Hadoop cluster deployment, we can create a Heat template that will include the instances' information, such as host name, image name, flavor name, key name, and network ID. But first, we need to get this information.


    Load the hadoop user credentials:


    root@global:~# source


    Get the Glance images' names:




    We can see the two Glance images that we have; we will use the hadoop-base-image image.


    Get the flavors' names:




    Table 2 shows which flavor we are going to use for each instance.



    Function              Zone Name           Flavor
    NameNode              name-node1          Oracle Solaris non-global zone - tiny
    Secondary NameNode    name-node2          Oracle Solaris non-global zone - tiny
    ResourceManager       resource-manager    Oracle Solaris non-global zone - small
    DataNode              data-node1          Oracle Solaris non-global zone - small
    DataNode              data-node2          Oracle Solaris non-global zone - small
    DataNode              data-node3          Oracle Solaris non-global zone - small


    Note: We will use the flavor Oracle Solaris non-global zone - small for the ResourceManager and DataNodes, because they need bigger storage capacity for HDFS. This flavor gives us the following resources: a 20 GB root disk, 3 GB of RAM, and four virtual CPUs.


    Get the network list:




    We will use the hadoop_net ID.
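    Collected together, the three lookups above can be sketched with the standard Juno-era clients. run() is a print-only stand-in for execution on the controller; drop it to run the commands for real:

```shell
run() { echo "+ $*"; }

run glance image-list    # image names: myzone, hadoop-base-image
run nova flavor-list     # flavor names and sizes
run neutron net-list     # network IDs, including hadoop_net
```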


    Now, let's create the Heat template based on the information we have gathered. For each instance, we need to define the following properties.


    • name: host name
    • image: image name
    • flavor: flavor name
    • key_name: key name
    • networks:

      + UUID: network_id


      + Fixed IP address: fixed_ips



    Edit the Heat template:


    Note: You need to change the network_id values in the template using the hadoop_net ID value that we got from the earlier command.


    root@global:~# vi hadoop-stack.yml
    heat_template_version: 2013-05-23
    description: Hadoop cluster Template

    resources:

      name-node2_server_port:
        type: OS::Neutron::Port
        properties:
          network_id: "e4a00424-70fe-4b1e-851b-dc53fba0f13d"
          fixed_ips: [ { 'ip_address': '' } ]

      name-node2_server:
        type: OS::Nova::Server
        properties:
          name: "name-node2"
          image: "hadoop-base-image"
          flavor: "Oracle Solaris non-global zone - tiny"
          key_name: "hadoopkey"
          networks:
            - port: { get_resource: name-node2_server_port }

      resource-manager_server_port:
        type: OS::Neutron::Port
        properties:
          network_id: "e4a00424-70fe-4b1e-851b-dc53fba0f13d"
          fixed_ips: [ { 'ip_address': '' } ]

      resource-manager_server:
        type: OS::Nova::Server
        properties:
          name: "resource-manager"
          image: "hadoop-base-image"
          flavor: "Oracle Solaris non-global zone - small"
          key_name: "hadoopkey"
          networks:
            - port: { get_resource: resource-manager_server_port }

      data-node1_server_port:
        type: OS::Neutron::Port
        properties:
          network_id: "e4a00424-70fe-4b1e-851b-dc53fba0f13d"
          fixed_ips: [ { 'ip_address': '' } ]

      data-node1_server:
        type: OS::Nova::Server
        properties:
          name: "data-node1"
          image: "hadoop-base-image"
          flavor: "Oracle Solaris non-global zone - small"
          key_name: "hadoopkey"
          networks:
            - port: { get_resource: data-node1_server_port }

      data-node2_server_port:
        type: OS::Neutron::Port
        properties:
          network_id: "e4a00424-70fe-4b1e-851b-dc53fba0f13d"
          fixed_ips: [ { 'ip_address': '' } ]

      data-node2_server:
        type: OS::Nova::Server
        properties:
          name: "data-node2"
          image: "hadoop-base-image"
          flavor: "Oracle Solaris non-global zone - small"
          key_name: "hadoopkey"
          networks:
            - port: { get_resource: data-node2_server_port }

      data-node3_server_port:
        type: OS::Neutron::Port
        properties:
          network_id: "e4a00424-70fe-4b1e-851b-dc53fba0f13d"
          fixed_ips: [ { 'ip_address': '' } ]

      data-node3_server:
        type: OS::Nova::Server
        properties:
          name: "data-node3"
          image: "hadoop-base-image"
          flavor: "Oracle Solaris non-global zone - small"
          key_name: "hadoopkey"
          networks:
            - port: { get_resource: data-node3_server_port }



    You can validate the Hadoop Heat template syntax using the following command:


    root@global:~# heat template-validate --template-file hadoop-stack.yml


      "Description": "Hadoop cluster Template",

      "Parameters": {}



    Use the following heat stack-create command to create a stack from the template. The command will launch the instances that we defined in the template.
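    A sketch of the call follows. The stack name HadoopStack matches the name shown on the Stacks screen, and --template-file is the same flag used with template-validate above; run() is a print-only stand-in for execution on the controller:

```shell
run() { echo "+ $*"; }

# Launch the six-instance stack defined in the template above.
run heat stack-create --template-file hadoop-stack.yml HadoopStack
```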




    Use the heat stack-list command to verify successful creation of the stack:




    You can see a graphical representation of the Heat stack by navigating to the Orchestration menu, selecting the Stacks menu option, and then choosing HadoopStack to get the following screen:




    Figure 15. The graphical representation of the Heat stack


    Once the instances finish the boot process, you will have six instances, as shown in Figure 16.




    Figure 16. The Instances window


    Check the Network Topology


    The OpenStack Dashboard can show the Hadoop cluster network topology.


    Navigate to the Network Topology screen and then click Normal.




    Figure 17. The Network topology


    You can see the network name, hadoop_net, and its address in addition to the IP address for each instance.


    Verify the SSH Setup


    On each zone, we need to add the Hadoop node names to /etc/hosts.


    Create a temporary file with the host names.


    root@global:~# vi /tmp/hosts

    ::1 localhost localhost loghost name-node1 name-node2 resource-manager data-node1 data-node2 data-node3


    For each zone, copy the host names into /etc/hosts using the following command:


    root@global:~# for zone in `zoneadm list | grep -v global`; do echo \

    $zone ; cat /tmp/hosts | zlogin $zone 'cat -  > /etc/hosts' ; done








    Log in to the name-node1 zone:


    root@global:~# zlogin instance-00000001


    Run the testssh script to log in to the cluster nodes using the ssh command:


    root@name-node1:~# su - hdfs -c "/usr/local/Scripts/testssh"

    Warning: Permanently added 'name-node1' (RSA) to the list of known hosts.


    root@name-node1:~# su - yarn -c "/usr/local/Scripts/testssh"

    root@name-node1:~# su - mapred -c "/usr/local/Scripts/testssh"


    Format HDFS


    Before starting the Hadoop cluster, we need to format HDFS.


    To format HDFS, switch to user hdfs and then run the hdfs namenode -format command:


    root@name-node1:~# su - hdfs


    hdfs@name-node1:~$ hdfs namenode -format


    Look for the following output, which indicates HDFS has been set up:


    ... INFO common.Storage: Storage directory /var/data/1/dfs/nn has been successfully formatted ....


    Start the HDFS Services


    Run the following script to start the HDFS services:




    Note: You might get the warning message WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable. This message indicates Hadoop is unable to use native platform libraries that accelerate the Hadoop suite. These native libraries are optional; the port of the Oracle Solaris Hadoop 2.x native libraries is a work in progress.


    Using the hadoop fs command, create a /tmp directory in HDFS and set its permissions to 1777 (drwxrwxrwt):


    hdfs@name-node1:~$ hadoop fs -mkdir /tmp

    hdfs@name-node1:~$ hadoop fs -chmod -R 1777 /tmp


    Create a history directory and set permissions and ownership:


    hdfs@name-node1:~$ hadoop fs -mkdir /user

    hdfs@name-node1:~$ hadoop fs -mkdir /user/history

    hdfs@name-node1:~$ hadoop fs -chmod -R 1777 /user/history

    hdfs@name-node1:~$ hadoop fs -chown yarn /user/history


    Create the log directories:


    hdfs@name-node1:~$ hadoop fs -mkdir /var

    hdfs@name-node1:~$ hadoop fs -mkdir /var/log

    hdfs@name-node1:~$ hadoop fs -mkdir /var/log/hadoop-yarn

    hdfs@name-node1:~$ hadoop fs -chown yarn:mapred /var/log/hadoop-yarn


    Create a directory for user bob and set ownership:


    hdfs@name-node1:~$ hadoop fs -mkdir /user/bob

    hdfs@name-node1:~$ hadoop fs -chown bob /user/bob


    Verify the HDFS directory structure:


    hdfs@name-node1:~$ hadoop fs -ls -R /

    drwxrwxrwt  - hdfs supergroup        0 2014-02-26 10:43 /tmp

    drwxr-xr-x  - hdfs supergroup        0 2014-02-26 10:58 /user

    drwxr-xr-x  - bob  supergroup        0 2014-02-26 10:58 /user/bob

    drwxrwxrwt  - yarn supergroup        0 2014-02-26 10:50 /user/history

    drwxr-xr-x  - hdfs supergroup        0 2014-02-26 10:53 /var

    drwxr-xr-x  - hdfs supergroup        0 2014-02-26 10:53 /var/log

    drwxr-xr-x  - yarn mapred            0 2014-02-26 10:53 /var/log/hadoop-yarn


    Run the following script in order to stop the HDFS services:




    Log out from the hdfs user.


    hdfs@name-node1:~$ logout


    Start the Hadoop Cluster


    Run the following script in order to start the Hadoop cluster:


    root@name-node1:~# /usr/local/Scripts/startcluster

    Starting namenodes on [name-node1]

    name-node1: starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-name-node1.out

    data-node1: starting datanode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-datanode-data-node1.out

    data-node3: starting datanode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-datanode-data-node3.out

    data-node2: starting datanode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-datanode-data-node2.out



    Verify that the Hadoop cluster started successfully by using the following script. The script prints each zone name along with the Hadoop process ID and process name.


    root@name-node1:~# /usr/local/Scripts/verify-hadoop


    11730 NameNode


    11990 SecondaryNameNode


    11858 DataNode


    11838 DataNode


    11841 DataNode


    12196 NodeManager


    9679 NodeManager


    12266 NodeManager


    12335 ResourceManager


    12387 JobHistoryServer


    Use the following commands to switch to user hdfs and show the cluster topology:


    root@name-node1:~# su - hdfs

    hdfs@name-node1:~$ hdfs dfsadmin -printTopology


    13/11/26 05:19:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

    Rack: /default-rack (data-node1) (data-node2) (data-node3)
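    All three DataNodes report under /default-rack because no topology script is configured. For rack awareness, Hadoop can invoke a user-supplied executable named by the net.topology.script.file.name property in core-site.xml; it passes DataNode addresses as arguments and expects one rack path per line in reply. The rack assignments below are made up for illustration:

    ```shell
    # Hypothetical rack-mapping function for a topology script; unknown hosts
    # fall back to /default-rack, matching Hadoop's default behavior.
    rack_of() {
      for host in "$@"; do
        case "$host" in
          data-node1|data-node2) echo /rack1 ;;
          data-node3)            echo /rack2 ;;
          *)                     echo /default-rack ;;
        esac
      done
    }

    rack_of data-node1 data-node3   # prints /rack1 then /rack2
    ```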


    Run a MapReduce Job


    Switch to user bob:


    root@name-node1:~# su - bob


    Password: <enter bob password>


    Next, run a simple MapReduce job.


    The MapReduce example program used here is included in the Hadoop distribution. It is a straightforward estimation of the value of Pi using a quasi-Monte Carlo method.


    bob@name-node1:~$ hadoop jar \

    /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar \

    pi 10 20




    • hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi specifies the examples .jar file and the pi example program to run.
    • 10 specifies the number of maps.
    • 20 specifies the number of samples.
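    What the job computes can be sketched locally: sample points in the unit square and count those that fall inside the quarter circle, so pi is approximately 4 * inside/samples. The Hadoop example distributes a Halton (quasi-random) sequence across its map tasks; plain rand() stands in for it here:

    ```shell
    # Single-process sketch of the estimator the example job distributes.
    # estimate_pi N samples N points and prints the resulting estimate.
    estimate_pi() {
      awk -v n="$1" 'BEGIN {
        srand(7); inside = 0
        for (i = 0; i < n; i++) {
          x = rand(); y = rand()
          if (x * x + y * y <= 1.0) inside++
        }
        printf "%.4f\n", 4 * inside / n
      }'
    }

    estimate_pi 100000   # prints a value close to 3.14
    ```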


    Note: If you need connectivity to the outside world to import data into the Hadoop cluster, you can define external network connectivity. Refer to Installing and Configuring OpenStack in Oracle Solaris 11.2 for more information.




    In this article, we saw how to leverage Oracle Solaris technologies such as Oracle OpenStack for Oracle Solaris, Oracle Solaris Zones, and the Unified Archive feature of Oracle Solaris 11.2 to build a multinode Hadoop 2.6 cluster. Note that the cluster in this example is a virtual cluster that uses only one physical machine. Virtual clusters allow efficient vertical scaling, which in turn promotes optimized resource utilization.




    The authors would like to thank Girish Moodalbail, Debabrata Sarkar, and Glynn Foster for their contributions to this article.


    See Also


    See the OpenStack on Oracle Solaris Technology Spotlight web page.



    About the Authors


    Ekine Akuiyibo is a software engineer in the DPA Technology Office at Oracle, where he works on big data and cloud computing technologies. His current focus is investigating algorithms and implementations for machine learning at scale and optimized resource allocation in cloud computing.


    Orgad Kimchi is a principal software engineer on the ISV Engineering team at Oracle. For seven years he has specialized in virtualization, big data, and cloud computing technologies.


    Appendix—OpenStack Configuration Script



    beadm create before_hadoop_setup

    export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop

    groupadd -g 200 hadoop

    useradd -u 100 -m -g hadoop hdfs

    echo "Enter the password for the hdfs user"

    passwd hdfs

    useradd -u 101 -m -g hadoop yarn

    echo "Enter the password for the yarn user"

    passwd yarn

    useradd -u 102 -m -g hadoop mapred

    echo "Enter the password for the mapred user"

    passwd mapred

    useradd -m -u 1000 bob

    echo "Enter the password for the bob user"

    passwd bob

    cp /var/tmp/hadoop-2.6.0.tar.gz /usr/local

    (cd /usr/local ; tar -xzf /usr/local/hadoop-2.6.0.tar.gz)

    ln -s /usr/local/hadoop-2.6.0 /usr/local/hadoop

    chown -R root:hadoop /usr/local/hadoop-2.6.0

    chmod -R 755 /usr/local/hadoop-2.6.0


    echo "export JAVA_HOME=/usr/java" >> $HADOOP_CONF_DIR/

    echo "export HADOOP_LOG_DIR=/var/log/hadoop/hdfs" >> $HADOOP_CONF_DIR/

    cat << EOF > $HADOOP_CONF_DIR/

    export JAVA_HOME=/usr/java

    export YARN_LOG_DIR=/var/log/hadoop/yarn

    export YARN_CONF_DIR=/usr/local/hadoop/etc/hadoop

    export HADOOP_HOME=/usr/local/hadoop

    export HADOOP_MAPRED_HOME=/usr/local/hadoop

    export HADOOP_COMMON_HOME=/usr/local/hadoop

    export HADOOP_HDFS_HOME=/usr/local/hadoop

    export YARN_HOME=/usr/local/hadoop

    export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop

    export YARN_CONF_DIR=/usr/local/hadoop/etc/hadoop
    EOF



    cat << EOF > $HADOOP_CONF_DIR/








    cat << EOF > $HADOOP_CONF_DIR/slaves
    data-node1
    data-node2
    data-node3
    EOF

    cat << EOF > $HADOOP_CONF_DIR/core-site.xml

    <?xml version="1.0"?>

    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <!-- Put site-specific property overrides in this file. -->








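    A minimal core-site.xml for this topology might contain only the default file system; the 8020 port and the choice of name-node1 are assumptions, and the /tmp fallback below is only so the fragment runs standalone (the appendix script sets HADOOP_CONF_DIR earlier):

    ```shell
    # Hypothetical core-site.xml body pointing clients at the first NameNode.
    : "${HADOOP_CONF_DIR:=/tmp}"
    cat << 'XML' > "$HADOOP_CONF_DIR/core-site.xml"
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!-- Put site-specific property overrides in this file. -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://name-node1:8020</value>
      </property>
    </configuration>
    XML
    ```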

    cat << EOF > $HADOOP_CONF_DIR/hdfs-site.xml

    <?xml version="1.0"?>

    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <!-- Put site-specific property overrides in this file. -->
























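    For hdfs-site.xml, the storage paths below match the /var/data/1/dfs directories created later in this script; the replication factor of 3 is an assumption based on the three DataNodes:

    ```shell
    # Hypothetical hdfs-site.xml body; /tmp fallback is only for standalone use.
    : "${HADOOP_CONF_DIR:=/tmp}"
    cat << 'XML' > "$HADOOP_CONF_DIR/hdfs-site.xml"
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!-- Put site-specific property overrides in this file. -->
    <configuration>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>/var/data/1/dfs/nn</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>/var/data/1/dfs/dn</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
    </configuration>
    XML
    ```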

    cat << EOF > $HADOOP_CONF_DIR/mapred-site.xml

    <?xml version="1.0"?>





















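    At minimum, mapred-site.xml must tell MapReduce to run on YARN for the ResourceManager and JobHistoryServer shown earlier to be used; a hypothetical minimal body (any further properties are omitted):

    ```shell
    # Hypothetical mapred-site.xml body; /tmp fallback is only for standalone use.
    : "${HADOOP_CONF_DIR:=/tmp}"
    cat << 'XML' > "$HADOOP_CONF_DIR/mapred-site.xml"
    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>
    XML
    ```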

    cat << EOF > $HADOOP_CONF_DIR/yarn-site.xml

    <?xml version="1.0"?>


    <!-- Site specific YARN configuration properties -->


























        <description>Where to aggregate logs</description>





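    For yarn-site.xml, the surviving description line suggests log aggregation is configured. A hypothetical minimal body follows; the hostname, shuffle service, and aggregation directory are assumptions (/var/log/hadoop-yarn matches the HDFS directory created earlier):

    ```shell
    # Hypothetical yarn-site.xml body; /tmp fallback is only for standalone use.
    : "${HADOOP_CONF_DIR:=/tmp}"
    cat << 'XML' > "$HADOOP_CONF_DIR/yarn-site.xml"
    <?xml version="1.0"?>
    <configuration>
      <!-- Site specific YARN configuration properties -->
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>resource-manager</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/var/log/hadoop-yarn</value>
        <description>Where to aggregate logs</description>
      </property>
    </configuration>
    XML
    ```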


    pkg install --accept jdk-7

    pkg install pkg://solaris/network/ssh

    mkdir -p /var/log/hadoop/yarn

    chown yarn:hadoop /var/log/hadoop/yarn

    mkdir -p /var/log/hadoop/hdfs

    chown hdfs:hadoop /var/log/hadoop/hdfs

    mkdir -p /var/log/hadoop/mapred

    chown mapred:hadoop /var/log/hadoop/mapred

    mkdir -p /var/data/1/dfs/nn

    chmod 700 /var/data/1/dfs/nn

    chown -R hdfs:hadoop /var/data/1/dfs/nn

    mkdir -p /var/data/1/dfs/dn

    chown -R hdfs:hadoop /var/data/1/dfs/dn

    mkdir -p /var/data/1/yarn/local

    mkdir -p /var/data/1/yarn/logs

    chown -R yarn:hadoop /var/data/1/yarn/local

    chown -R yarn:hadoop /var/data/1/yarn/logs

    mkdir -p /var/hadoop/run/yarn

    chown yarn:hadoop /var/hadoop/run/yarn

    mkdir -p /var/hadoop/run/hdfs

    chown hdfs:hadoop /var/hadoop/run/hdfs

    mkdir -p /var/hadoop/run/mapred

    chown mapred:hadoop /var/hadoop/run/mapred


    cat << EOF >> /etc/profile

    # Set JAVA_HOME

    export JAVA_HOME=/usr/java

    # Add Hadoop bin/ directory to PATH

    export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin

    export HADOOP_HOME=/usr/local/hadoop

    export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
    EOF



    echo 'export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin' >> /export/home/hdfs/.profile

    echo 'export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin' >> /export/home/yarn/.profile

    echo 'export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin' >> /export/home/mapred/.profile

    echo 'export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin' >> /export/home/bob/.profile


    cat << EOF >> /etc/hosts
    name-node1
    name-node2
    resource-manager
    data-node1
    data-node2
    data-node3
    EOF


    su - hdfs -c 'ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa'

    su - hdfs -c 'cat ~/.ssh/ >> ~/.ssh/authorized_keys'

    su - yarn -c 'ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa'

    su - yarn -c 'cat ~/.ssh/ >> ~/.ssh/authorized_keys'


    su - mapred -c 'ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa'

    su - mapred -c 'cat ~/.ssh/ >> ~/.ssh/authorized_keys'



    Revision 1.0, 07/07/2015

