
How to Build a Cassandra Multinode Database Cluster on Oracle Solaris 11.3 with LUN Mirroring and IP Network Multipathing


by Antonis Tsavdaris

This article describes how to build a Cassandra single-rack database cluster on Oracle Solaris 11.3 and extend its overall availability with LUN mirroring and IP network multipathing.

Apache Cassandra is a popular distributed database management system from the Apache Software Foundation. It is highly scalable and has a masterless architecture: there is no primary node to which other nodes are subservient; every node in the cluster is equal, and any node can service any request.

Oracle Solaris 11 is an enterprise-class operating system known for its reliability, availability, and serviceability (RAS) features. Its wealth of integrated features helps administrators build redundancy into every part of the system they deem critical, including the network, storage, and so on.

This how-to builds on those features: LUN mirroring will provide extended availability at the storage level, and IP network multipathing (IPMP) will add redundancy to the network.

In this scenario, the one-rack cluster is composed of six Oracle Solaris server instances. Three of them—dbnode1, dbnode2, and dbnode3—will be the database nodes and the other three—stgnode1, stgnode2, and stgnode3—will provide highly available storage. The highly available storage will be constructed from nine LUNs, three in each storage node.

At the end of the construction, the one-rack cluster will have a fully operational database even if two of the storage nodes are not available. Furthermore, the networks—the public network and the iSCSI network—will be immune to hardware failures through IPMP groups consisting of an active and a standby network card.

Cluster Topology

All servers have the Oracle Solaris 11.3 operating system installed. Table 1 depicts the cluster architecture.

In practice, the Cassandra binaries as well as the data will reside on the storage nodes; the database nodes will serve the running instances.

Table 1. Oracle Solaris servers and their role in the cluster.

Node Name   Role in the Cluster   Contains
dbnode1     Database node         Running instance
dbnode2     Database node         Running instance
dbnode3     Database node         Running instance
stgnode1    Storage node          Binaries and data
stgnode2    Storage node          Binaries and data
stgnode3    Storage node          Binaries and data

Network Interface Cards

As shown in Table 2, every server in the cluster has four network interface cards (NICs) installed, named net0 through net3. Redundancy is required at the network level, and this will be provided by IPMP groups. IP multipathing requires that the DefaultFixed network profile be activated and static IP addresses be assigned to every network interface.

Table 2. NICs and IPMP group configuration.

Node Name  NIC   Primary/Standby  IP/Subnet        IPMP Group  IPMP IP Address  Role
dbnode1    net0  primary          192.168.2.10/24  IPMP0       192.168.2.22/24  Public network
dbnode1    net1  standby          192.168.2.11/24  IPMP0       192.168.2.22/24  Public network
dbnode1    net2  primary          10.0.1.1/27      IPMP1       10.0.1.13/27     iSCSI initiator
dbnode1    net3  standby          10.0.1.2/27      IPMP1       10.0.1.13/27     iSCSI initiator
dbnode2    net0  primary          192.168.2.12/24  IPMP2       192.168.2.23/24  Public network
dbnode2    net1  standby          192.168.2.13/24  IPMP2       192.168.2.23/24  Public network
dbnode2    net2  primary          10.0.1.3/27      IPMP3       10.0.1.14/27     iSCSI initiator
dbnode2    net3  standby          10.0.1.4/27      IPMP3       10.0.1.14/27     iSCSI initiator
dbnode3    net0  primary          192.168.2.14/24  IPMP4       192.168.2.24/24  Public network
dbnode3    net1  standby          192.168.2.15/24  IPMP4       192.168.2.24/24  Public network
dbnode3    net2  primary          10.0.1.5/27      IPMP5       10.0.1.15/27     iSCSI initiator
dbnode3    net3  standby          10.0.1.6/27      IPMP5       10.0.1.15/27     iSCSI initiator
stgnode1   net0  primary          192.168.2.16/24  IPMP6       192.168.2.25/24  Public network
stgnode1   net1  standby          192.168.2.17/24  IPMP6       192.168.2.25/24  Public network
stgnode1   net2  primary          10.0.1.7/27      IPMP7       10.0.1.16/27     iSCSI target
stgnode1   net3  standby          10.0.1.8/27      IPMP7       10.0.1.16/27     iSCSI target
stgnode2   net0  primary          192.168.2.18/24  IPMP8       192.168.2.26/24  Public network
stgnode2   net1  standby          192.168.2.19/24  IPMP8       192.168.2.26/24  Public network
stgnode2   net2  primary          10.0.1.9/27      IPMP9       10.0.1.17/27     iSCSI target
stgnode2   net3  standby          10.0.1.10/27     IPMP9       10.0.1.17/27     iSCSI target
stgnode3   net0  primary          192.168.2.20/24  IPMP10      192.168.2.27/24  Public network
stgnode3   net1  standby          192.168.2.21/24  IPMP10      192.168.2.27/24  Public network
stgnode3   net2  primary          10.0.1.11/27     IPMP11      10.0.1.18/27     iSCSI target
stgnode3   net3  standby          10.0.1.12/27     IPMP11      10.0.1.18/27     iSCSI target

First, ensure that the network service is up and running. Then check whether the network profile is set to DefaultFixed.

root@dbnode1:~# svcs network/physical
STATE          STIME    FMRI
online         1:25:45  svc:/network/physical:upgrade
online         1:25:51  svc:/network/physical:default

root@dbnode1:~# netadm list
TYPE        PROFILE        STATE
ncp         Automatic      disabled
ncp         DefaultFixed   online
loc         DefaultFixed   online
loc         Automatic      offline
loc         NoNet          offline
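
If netadm list shows that the Automatic profile is still active instead, switch to the fixed profile before continuing. The following is a minimal sketch using the standard netadm syntax; note that activating DefaultFixed discards the automatically generated network configuration, so it is safest to run it from the system console rather than over the network.

root@dbnode1:~# netadm enable -p ncp DefaultFixed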

Because the network profile is set to DefaultFixed, review the network interfaces and the data link layer.

root@dbnode1:~# dladm show-phys
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net0              Ethernet             unknown    1000   full      e1000g0
net1              Ethernet             unknown    1000   full      e1000g1
net3              Ethernet             unknown    1000   full      e1000g3
net2              Ethernet             unknown    1000   full      e1000g2

Create the IP interface for net0 and then configure a static IPv4 address.

root@dbnode1:~# ipadm create-ip net0
root@dbnode1:~# ipadm create-addr -T static -a 192.168.2.10/24 net0/v4
root@dbnode1:~# ipadm show-addr
ADDROBJ        TYPE     STATE      ADDR
lo0/v4         static   ok         127.0.0.1/8
net0/v4        static   ok         192.168.2.10/24
lo0/v6         static   ok         ::1/128

Following this, create the IP interfaces and assign the relevant IP addresses and subnets for each of the NICs, net0–net3, for each of the servers according to Table 2.
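
For example, the remaining three NICs on dbnode1 would be configured as follows, using the addresses from Table 2 (a sketch that simply repeats the ipadm commands shown above; apply the same pattern with the appropriate addresses on the other five servers):

root@dbnode1:~# ipadm create-ip net1
root@dbnode1:~# ipadm create-addr -T static -a 192.168.2.11/24 net1/v4
root@dbnode1:~# ipadm create-ip net2
root@dbnode1:~# ipadm create-addr -T static -a 10.0.1.1/27 net2/v4
root@dbnode1:~# ipadm create-ip net3
root@dbnode1:~# ipadm create-addr -T static -a 10.0.1.2/27 net3/v4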

Note: There is an exceptional article by Andrew Walton on how to configure an Oracle Solaris network along with making it internet-facing: "How to Get Started Configuring Your Network in Oracle Solaris 11."

IPMP Groups

After the NICs have been configured and the IP addresses have been assigned, the IPMP groups can be configured as well. IPMP groups combine separate physical network interfaces and thus provide physical interface failure detection, network access failover, and network load spreading. Each IPMP group will be made of two NICs in an active/standby configuration. When an interface that is a member of an IPMP group is brought down for maintenance, or when a NIC fails due to a hardware fault, a failover process takes place: the remaining NIC and its related IP interface step in to ensure that the node is not isolated from the cluster.

According to the planned scenario, two IPMP groups are going to be created in each server, one for every two NICs configured earlier. Each IPMP group will have its own IP interface, and one of the underlying NICs will be active, while the other will remain a standby. Table 2 summarizes the IPMP group configurations that must be completed on each node.

First, create the IPMP group IPMP0. Then, bind interfaces net0 and net1 to this group and create an IP address for the group.

root@dbnode1:~# ipadm create-ipmp ipmp0
root@dbnode1:~# ipadm add-ipmp -i net0 -i net1 ipmp0
root@dbnode1:~# ipadm create-addr -T static -a 192.168.2.22/24 ipmp0
ipmp0/v4

Now that IPMP0 has been created successfully, declare net1 as the standby interface.

root@dbnode1:~# ipadm set-ifprop -p standby=on -m ip net1
root@dbnode1:~# ipmpstat -g
GROUP       GROUPNAME   STATE     FDT       INTERFACES
ipmp0       ipmp0       ok        10.00s    net0 (net1)

The ipmpstat command reports that the IPMP0 group has been built successfully and that it operates over two NICs, net0 and net1. The parentheses denote a standby interface.

Follow the above-mentioned approach to build the IPMP groups for the rest of the servers in the cluster, as shown in Table 2.
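
As a sketch, the second group on dbnode1 (IPMP1 over net2 and net3, per Table 2) is built with the same sequence of commands; the other servers follow the same pattern with their own group names and addresses:

root@dbnode1:~# ipadm create-ipmp ipmp1
root@dbnode1:~# ipadm add-ipmp -i net2 -i net3 ipmp1
root@dbnode1:~# ipadm create-addr -T static -a 10.0.1.13/27 ipmp1
root@dbnode1:~# ipadm set-ifprop -p standby=on -m ip net3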

Local Storage

As shown in Table 3, each of the storage servers has nine additional 10 GB disks on which ZFS pools are to be created in a mirrored (RAID 1) configuration with a hot spare. Following this, the ZFS volumes that back the LUNs can be constructed.

Table 3. Additional disk storage configuration.

Node Name  ZFS Pool Name  Disk Name  Size   Role in Mirror  ZFS File System
stgnode1   zpool1         c1t2d0     10 GB  member          zfslun1
stgnode1   zpool1         c1t3d0     10 GB  member          zfslun1
stgnode1   zpool1         c1t4d0     10 GB  spare           zfslun1
stgnode1   zpool2         c1t5d0     10 GB  member          zfslun2
stgnode1   zpool2         c1t6d0     10 GB  member          zfslun2
stgnode1   zpool2         c1t7d0     10 GB  spare           zfslun2
stgnode1   zpool3         c1t8d0     10 GB  member          zfslun3
stgnode1   zpool3         c1t9d0     10 GB  member          zfslun3
stgnode1   zpool3         c1t10d0    10 GB  spare           zfslun3
stgnode2   zpool4         c1t2d0     10 GB  member          zfslun4
stgnode2   zpool4         c1t3d0     10 GB  member          zfslun4
stgnode2   zpool4         c1t4d0     10 GB  spare           zfslun4
stgnode2   zpool5         c1t5d0     10 GB  member          zfslun5
stgnode2   zpool5         c1t6d0     10 GB  member          zfslun5
stgnode2   zpool5         c1t7d0     10 GB  spare           zfslun5
stgnode2   zpool6         c1t8d0     10 GB  member          zfslun6
stgnode2   zpool6         c1t9d0     10 GB  member          zfslun6
stgnode2   zpool6         c1t10d0    10 GB  spare           zfslun6
stgnode3   zpool7         c1t2d0     10 GB  member          zfslun7
stgnode3   zpool7         c1t3d0     10 GB  member          zfslun7
stgnode3   zpool7         c1t4d0     10 GB  spare           zfslun7
stgnode3   zpool8         c1t5d0     10 GB  member          zfslun8
stgnode3   zpool8         c1t6d0     10 GB  member          zfslun8
stgnode3   zpool8         c1t7d0     10 GB  spare           zfslun8
stgnode3   zpool9         c1t8d0     10 GB  member          zfslun9
stgnode3   zpool9         c1t9d0     10 GB  member          zfslun9
stgnode3   zpool9         c1t10d0    10 GB  spare           zfslun9

Starting with stgnode1, run the format command, which reports the additional, unconfigured disks.

root@stgnode1:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <ATA-VBOX HARDDISK-1.0-20.00GB>
          /[email protected],0/pci8086,[email protected]/[email protected],0
       1. c1t2d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /[email protected],0/pci8086,[email protected]/[email protected],0
       2. c1t3d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /[email protected],0/pci8086,[email protected]/[email protected],0
       3. c1t4d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /[email protected],0/pci8086,[email protected]/[email protected],0
       4. c1t5d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /[email protected],0/pci8086,[email protected]/[email protected],0
       5. c1t6d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /[email protected],0/pci8086,[email protected]/[email protected],0
       6. c1t7d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /[email protected],0/pci8086,[email protected]/[email protected],0
       7. c1t8d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /[email protected],0/pci8086,[email protected]/[email protected],0
       8. c1t9d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /[email protected],0/pci8086,[email protected]/[email protected],0
       9. c1t10d0 <ATA-VBOX HARDDISK-1.0 cyl 1303 alt 2 hd 255 sec 63>
          /[email protected],0/pci8086,[email protected]/[email protected],0
Specify disk (enter its number): ^C
root@stgnode1:~#

Create the zpools zpool1, zpool2, and zpool3 in a RAID 1 with hot-spare configuration.

root@stgnode1:~# zpool create zpool1 mirror c1t2d0 c1t3d0 spare c1t4d0
root@stgnode1:~# zpool status zpool1
  pool: zpool1
 state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    zpool1      ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        c1t2d0  ONLINE       0     0     0
        c1t3d0  ONLINE       0     0     0
    spares
      c1t4d0    AVAIL

errors: No known data errors

root@stgnode1:~# zpool create zpool2 mirror c1t5d0 c1t6d0 spare c1t7d0
root@stgnode1:~# zpool status zpool2
  pool: zpool2
 state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    zpool2      ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        c1t5d0  ONLINE       0     0     0
        c1t6d0  ONLINE       0     0     0
    spares
      c1t7d0    AVAIL

errors: No known data errors

root@stgnode1:~# zpool create zpool3 mirror c1t8d0 c1t9d0 spare c1t10d0
root@stgnode1:~# zpool status zpool3
  pool: zpool3
 state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    zpool3      ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        c1t8d0  ONLINE       0     0     0
        c1t9d0  ONLINE       0     0     0
    spares
      c1t10d0   AVAIL

errors: No known data errors

Running the format command again shows that the disks have been formatted.

root@stgnode1:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <ATA-VBOX HARDDISK-1.0-20.00GB>
          /[email protected],0/pci8086,[email protected]/[email protected],0
       1. c1t2d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /[email protected],0/pci8086,[email protected]/[email protected],0
       2. c1t3d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /[email protected],0/pci8086,[email protected]/[email protected],0
       3. c1t4d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /[email protected],0/pci8086,[email protected]/[email protected],0
       4. c1t5d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /[email protected],0/pci8086,[email protected]/[email protected],0
       5. c1t6d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /[email protected],0/pci8086,[email protected]/[email protected],0
       6. c1t7d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /[email protected],0/pci8086,[email protected]/[email protected],0
       7. c1t8d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /[email protected],0/pci8086,[email protected]/[email protected],0
       8. c1t9d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /[email protected],0/pci8086,[email protected]/[email protected],0
       9. c1t10d0 <ATA-VBOX HARDDISK-1.0-10.00GB>
          /[email protected],0/pci8086,[email protected]/[email protected],0
Specify disk (enter its number): ^C

Use the zpool list command to get a report on the newly created ZFS pools.

root@stgnode1:~# zpool list
NAME        SIZE     ALLOC    FREE     CAP    DEDUP    HEALTH  ALTROOT
rpool      19.6G     8.01G    11.6G    40%     1.00x   ONLINE  -
zpool1     9.94G       88K    9.94G     0%     1.00x   ONLINE  -
zpool2     9.94G       88K    9.94G     0%     1.00x   ONLINE  -
zpool3     9.94G       88K    9.94G     0%     1.00x   ONLINE  -

Create an 8 GB ZFS volume on each of the ZFS pools; these volumes will back the LUNs.

root@stgnode1:~# zfs create -V 8g zpool1/zfslun1
root@stgnode1:~# zfs create -V 8g zpool2/zfslun2
root@stgnode1:~# zfs create -V 8g zpool3/zfslun3

Use the zfs list command to get a report on the newly created ZFS file systems.

root@stgnode1:~# zfs list -r /zpool*
NAME             USED  AVAIL  REFER  MOUNTPOINT
zpool1          8.25G  1.53G    31K  /zpool1
zpool1/zfslun1  8.25G  9.78G    16K  -
zpool2          8.25G  1.53G    31K  /zpool2
zpool2/zfslun2  8.25G  9.78G    16K  -
zpool3          8.25G  1.53G    31K  /zpool3
zpool3/zfslun3  8.25G  9.78G    16K  -

Perform the same work on the second and third storage nodes.
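
As a sketch, the commands on stgnode2 mirror those above, using the pool, disk, and volume names from Table 3 (stgnode3 follows the same pattern with zpool7 through zpool9 and zfslun7 through zfslun9):

root@stgnode2:~# zpool create zpool4 mirror c1t2d0 c1t3d0 spare c1t4d0
root@stgnode2:~# zpool create zpool5 mirror c1t5d0 c1t6d0 spare c1t7d0
root@stgnode2:~# zpool create zpool6 mirror c1t8d0 c1t9d0 spare c1t10d0
root@stgnode2:~# zfs create -V 8g zpool4/zfslun4
root@stgnode2:~# zfs create -V 8g zpool5/zfslun5
root@stgnode2:~# zfs create -V 8g zpool6/zfslun6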

iSCSI Targets

As shown in Table 4, three more ZFS pools are to be constructed, this time mirrored across the network with a hot-spare configuration. ZFS pool datapool1 will be constructed on host dbnode1 from LUNs c0t600144F03B268F00000055F33BB10001d0, c0t600144F06A174000000055F5D8F50001d0, and c0t600144F0BBB5C300000055F5DB370001d0, each coming from a different storage node.

Similarly, ZFS pool datapool2 will be constructed on host dbnode2 from LUNs c0t600144F03B268F00000055F33BCC0002d0, c0t600144F06A174000000055F5D90D0002d0, and c0t600144F0BBB5C300000055F5DB4D0002d0, each coming from a different storage node.

Finally, pool datapool3 will be constructed on host dbnode3 from LUNs c0t600144F03B268F00000055F33BFE0003d0, c0t600144F06A174000000055F5D9350003d0, and c0t600144F0BBB5C300000055F5DB690003d0.

Table 4. Structure and constituents of the three LUN mirrors.

ZFS Pool   Storage Node  Source ZFS Volume  LUN                                     ZFS File System
datapool1  stgnode1      zfslun1            c0t600144F03B268F00000055F33BB10001d0   /datapool1/zfsnode1
datapool1  stgnode2      zfslun4            c0t600144F06A174000000055F5D8F50001d0   /datapool1/zfsnode1
datapool1  stgnode3      zfslun7            c0t600144F0BBB5C300000055F5DB370001d0   /datapool1/zfsnode1
datapool2  stgnode1      zfslun2            c0t600144F03B268F00000055F33BCC0002d0   /datapool2/zfsnode2
datapool2  stgnode2      zfslun5            c0t600144F06A174000000055F5D90D0002d0   /datapool2/zfsnode2
datapool2  stgnode3      zfslun8            c0t600144F0BBB5C300000055F5DB4D0002d0   /datapool2/zfsnode2
datapool3  stgnode1      zfslun3            c0t600144F03B268F00000055F33BFE0003d0   /datapool3/zfsnode3
datapool3  stgnode2      zfslun6            c0t600144F06A174000000055F5D9350003d0   /datapool3/zfsnode3
datapool3  stgnode3      zfslun9            c0t600144F0BBB5C300000055F5DB690003d0   /datapool3/zfsnode3

In order to be able to create iSCSI targets and LUNs, the storage server group of packages must be installed on each of the storage servers.

root@stgnode1:~# pkg install storage-server
           Packages to install:  21
            Services to change:   1
       Create boot environment:  No
Create backup boot environment: Yes

DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                              21/21     3644/3644  111.6/111.6  586k/s

PHASE                                          ITEMS
Installing new actions                     4640/4640
Updating package state database                 Done
Updating package cache                           0/0
Updating image state                            Done
Creating fast lookup database                   Done
Updating package cache                           1/1

Verify that the group of packages has been installed by reviewing the output of the pkg info command, as follows:

root@stgnode1:~# pkg info storage-server
          Name: group/feature/storage-server
       Summary: Multi protocol storage server group package
      Category: Drivers/Storage (org.opensolaris.category.2008)
                Meta Packages/Group Packages (org.opensolaris.category.2008)
         State: Installed
     Publisher: solaris
       Version: 0.5.11
 Build Release: 5.11
        Branch: 0.175.3.0.0.25.0
Packaging Date: June 21, 2015 10:57:56 PM
          Size: 5.46 kB
          FMRI: pkg://solaris/group/feature/storage-server@0.5.11,5.11-0.175.3.0.0.25.0:20150621T225756Z

Perform the same action on the second and third storage nodes.

Enable the Oracle Solaris Common Multiprotocol SCSI TARget (COMSTAR) SCSI Target Mode Framework (STMF) service and verify that it is online. Then, create logical units for all the ZFS LUNs from the storage nodes on which they were created. Start from stgnode1.

root@stgnode1:~# svcadm enable stmf
root@stgnode1:~# svcs stmf
STATE          STIME    FMRI
online         22:48:39 svc:/system/stmf:default

root@stgnode1:~# stmfadm create-lu /dev/zvol/rdsk/zpool1/zfslun1
Logical unit created: 600144F03B268F00000055F33BB10001
root@stgnode1:~# stmfadm create-lu /dev/zvol/rdsk/zpool2/zfslun2
Logical unit created: 600144F03B268F00000055F33BCC0002
root@stgnode1:~# stmfadm create-lu /dev/zvol/rdsk/zpool3/zfslun3
Logical unit created: 600144F03B268F00000055F33BFE0003

Confirm that the LUNs have been created successfully.

root@stgnode1:~# stmfadm list-lu
LU Name: 600144F03B268F00000055F33BB10001
LU Name: 600144F03B268F00000055F33BCC0002
LU Name: 600144F03B268F00000055F33BFE0003

Create the LUN view for each of the LUNs and verify the LUN configuration.

root@stgnode1:~# stmfadm add-view 600144F03B268F00000055F33BB10001
root@stgnode1:~# stmfadm add-view 600144F03B268F00000055F33BCC0002
root@stgnode1:~# stmfadm add-view 600144F03B268F00000055F33BFE0003
root@stgnode1:~# stmfadm list-view -l 600144F03B268F00000055F33BB10001
View Entry: 0
    Host group   : All
    Target Group : All
    LUN          : Auto
root@stgnode1:~# stmfadm list-view -l 600144F03B268F00000055F33BCC0002
View Entry: 0
    Host group   : All
    Target Group : All
    LUN          : Auto
root@stgnode1:~# stmfadm list-view -l 600144F03B268F00000055F33BFE0003
View Entry: 0
    Host group   : All
    Target Group : All
    LUN          : Auto

Enable the iSCSI target service on the first storage node and verify it is online.

root@stgnode1:~# svcadm enable -r svc:/network/iscsi/target:default
root@stgnode1:~# svcs iscsi/target
STATE          STIME    FMRI
online         22:53:44 svc:/network/iscsi/target:default

Create the iSCSI target.

root@stgnode1:~# itadm create-target
Target iqn.1986-03.com.sun:02:ae4d3c15-8f1c-4098-9d07-8d2c619516e4 successfully created

Verify that the target has been created.

root@stgnode1:~# itadm list-target -v
TARGET NAME                                                  STATE    SESSIONS
iqn.1986-03.com.sun:02:ae4d3c15-8f1c-4098-9d07-8d2c619516e4  online   0
        alias:              -
        auth:               none (defaults)
        targetchapuser:     -
        targetchapsecret:   unset
        tpg-tags:           default

Follow the same steps to create logical units for the rest of the ZFS LUNs and enable the iSCSI target on the second and third storage server.
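
As a sketch, the sequence on stgnode2 looks like the following, with the volume names taken from Table 3 (the logical unit GUIDs and the target IQN are generated automatically, so the output will differ; stgnode3 is configured the same way with zfslun7 through zfslun9):

root@stgnode2:~# svcadm enable stmf
root@stgnode2:~# stmfadm create-lu /dev/zvol/rdsk/zpool4/zfslun4
root@stgnode2:~# stmfadm create-lu /dev/zvol/rdsk/zpool5/zfslun5
root@stgnode2:~# stmfadm create-lu /dev/zvol/rdsk/zpool6/zfslun6
root@stgnode2:~# stmfadm add-view <LU name reported by create-lu, repeated for each of the three LUs>
root@stgnode2:~# svcadm enable -r svc:/network/iscsi/target:default
root@stgnode2:~# itadm create-target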

After the iSCSI targets have been successfully created, the iSCSI initiators must be created on the database nodes.

Enable the iSCSI initiator service.

root@dbnode1:~# svcadm enable network/iscsi/initiator

Configure the targets to be statically discovered. The initiator will discover targets from all three storage servers.

root@dbnode1:~# iscsiadm add static-config \
iqn.1986-03.com.sun:02:ae4d3c15-8f1c-4098-9d07-8d2c619516e4,10.0.1.16
root@dbnode1:~# iscsiadm add static-config \
iqn.1986-03.com.sun:02:ae65e6de-dfb1-4a77-9940-dabf68709f5d,10.0.1.17
root@dbnode1:~# iscsiadm add static-config \
iqn.1986-03.com.sun:02:f4e68b9d-26ca-484a-8d85-d2c8275da0eb,10.0.1.18

Verify the configuration with the iscsiadm list command.

root@dbnode1:~# iscsiadm list static-config
Static Configuration Target: iqn.1986-03.com.sun:02:ae4d3c15-8f1c-4098-9d07-8d2c619516e4,10.0.1.16:3260
Static Configuration Target: iqn.1986-03.com.sun:02:ae65e6de-dfb1-4a77-9940-dabf68709f5d,10.0.1.17:3260
Static Configuration Target: iqn.1986-03.com.sun:02:f4e68b9d-26ca-484a-8d85-d2c8275da0eb,10.0.1.18:3260

Enable the static target discovery method.

root@dbnode1:~# iscsiadm modify discovery --static enable

Perform the same actions to configure the iSCSI initiator on dbnode2 and dbnode3 and enable the static target discovery method.
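
A sketch of the same configuration on dbnode2 follows; the three target IQNs are the ones created on the storage nodes above, and dbnode3 is configured identically:

root@dbnode2:~# svcadm enable network/iscsi/initiator
root@dbnode2:~# iscsiadm add static-config \
iqn.1986-03.com.sun:02:ae4d3c15-8f1c-4098-9d07-8d2c619516e4,10.0.1.16
root@dbnode2:~# iscsiadm add static-config \
iqn.1986-03.com.sun:02:ae65e6de-dfb1-4a77-9940-dabf68709f5d,10.0.1.17
root@dbnode2:~# iscsiadm add static-config \
iqn.1986-03.com.sun:02:f4e68b9d-26ca-484a-8d85-d2c8275da0eb,10.0.1.18
root@dbnode2:~# iscsiadm modify discovery --static enable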

LUN Mirroring and Storage

From the first database node (dbnode1) verify the available disks. Nine LUNs should be available.

root@dbnode1:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0t600144F0BBB5C300000055F5DB4D0002d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f0bbb5c300000055f5db4d0002
       1. c0t600144F0BBB5C300000055F5DB370001d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f0bbb5c300000055f5db370001
       2. c0t600144F0BBB5C300000055F5DB690003d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f0bbb5c300000055f5db690003
       3. c0t600144F03B268F00000055F33BB10001d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f03b268f00000055f33bb10001
       4. c0t600144F03B268F00000055F33BCC0002d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f03b268f00000055f33bcc0002
       5. c0t600144F03B268F00000055F33BFE0003d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f03b268f00000055f33bfe0003
       6. c0t600144F06A174000000055F5D8F50001d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f06a174000000055f5d8f50001
       7. c0t600144F06A174000000055F5D90D0002d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f06a174000000055f5d90d0002
       8. c0t600144F06A174000000055F5D9350003d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f06a174000000055f5d9350003
       9. c1t0d0 <ATA-VBOX HARDDISK-1.0-20.00GB>
          /[email protected],0/pci8086,[email protected]/[email protected],0
Specify disk (enter its number): ^C

Build the first ZFS pool from LUNs c0t600144F03B268F00000055F33BB10001d0, c0t600144F06A174000000055F5D8F50001d0, and c0t600144F0BBB5C300000055F5DB370001d0. These all come from different storage servers to ensure the storage has high availability.

root@dbnode1:~# zpool create datapool1 mirror c0t600144F03B268F00000055F33BB10001d0 \
c0t600144F06A174000000055F5D8F50001d0 spare c0t600144F0BBB5C300000055F5DB370001d0

Create the zfsnode1 ZFS file system on the zpool.

root@dbnode1:~# zfs create datapool1/zfsnode1
root@dbnode1:~# zpool list
NAME        SIZE    ALLOC   FREE    CAP   DEDUP   HEALTH  ALTROOT
datapool1   7.94G    128K   7.94G    0%   1.00x   ONLINE  -
rpool       19.6G   7.53G   12.1G   38%   1.00x   ONLINE  -

Verify the ZFS creation recursively.

root@dbnode1:~# zfs list -r datapool1
NAME                 USED  AVAIL  REFER  MOUNTPOINT
datapool1            128K  7.81G    32K  /datapool1
datapool1/zfsnode1    31K  7.81G    31K  /datapool1/zfsnode1

From the second database node (dbnode2), execute the format utility to verify the available disks. Check that three of the LUNs have been formatted.

root@dbnode2:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0t600144F0BBB5C300000055F5DB4D0002d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f0bbb5c300000055f5db4d0002
       1. c0t600144F0BBB5C300000055F5DB370001d0 <SUN-COMSTAR-1.0-8.00GB>
          /scsi_vhci/disk@g600144f0bbb5c300000055f5db370001
       2. c0t600144F0BBB5C300000055F5DB690003d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f0bbb5c300000055f5db690003
       3. c0t600144F03B268F00000055F33BB10001d0 <SUN-COMSTAR-1.0-8.00GB>
          /scsi_vhci/disk@g600144f03b268f00000055f33bb10001
       4. c0t600144F03B268F00000055F33BCC0002d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f03b268f00000055f33bcc0002
       5. c0t600144F03B268F00000055F33BFE0003d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f03b268f00000055f33bfe0003
       6. c0t600144F06A174000000055F5D8F50001d0 <SUN-COMSTAR-1.0-8.00GB>
          /scsi_vhci/disk@g600144f06a174000000055f5d8f50001
       7. c0t600144F06A174000000055F5D90D0002d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f06a174000000055f5d90d0002
       8. c0t600144F06A174000000055F5D9350003d0 <SUN-COMSTAR-1.0 cyl 4094 alt 2 hd 128 sec 32>
          /scsi_vhci/disk@g600144f06a174000000055f5d9350003
       9. c1t0d0 <ATA-VBOX HARDDISK-1.0-20.00GB>
          /[email protected],0/pci8086,[email protected]/[email protected],0
Specify disk (enter its number): ^C

Build the rest of the ZFS pools from the remaining available LUNs, as shown in Table 4.
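
As a sketch, datapool2 and datapool3 can be created with the same layout as datapool1, two LUNs mirrored and one kept as a hot spare. Table 4 does not say which LUN serves as the spare, so the choice below (the third storage node's LUN) is an assumption that simply follows the datapool1 pattern:

root@dbnode2:~# zpool create datapool2 mirror c0t600144F03B268F00000055F33BCC0002d0 \
c0t600144F06A174000000055F5D90D0002d0 spare c0t600144F0BBB5C300000055F5DB4D0002d0
root@dbnode2:~# zfs create datapool2/zfsnode2

root@dbnode3:~# zpool create datapool3 mirror c0t600144F03B268F00000055F33BFE0003d0 \
c0t600144F06A174000000055F5D9350003d0 spare c0t600144F0BBB5C300000055F5DB690003d0
root@dbnode3:~# zfs create datapool3/zfsnode3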

Database Installation and Configuration

Before Cassandra can be built on the database nodes, Apache Ant must be installed. Apache Ant is a tool for building Java applications, and because Ant requires Java in order to run, Java Development Kit 8 (JDK 8) must also be installed.

Use the pkg utility to install Ant.

root@dbnode1:~# pkg install ant
           Packages to install:  1
       Create boot environment: No
Create backup boot environment: No

DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                                1/1     1594/1594      7.6/7.6  216k/s

PHASE                                          ITEMS
Installing new actions                     1617/1617
Updating package state database                 Done
Updating package cache                           0/0
Updating image state                            Done
Creating fast lookup database                   Done
Updating package cache                           1/1

root@dbnode1:~# pkg info ant
          Name: developer/build/ant
       Summary: Apache Ant
   Description: Apache Ant is a Java-based build tool
      Category: Development/Distribution Tools
         State: Installed
     Publisher: solaris
       Version: 1.9.3
 Build Release: 5.11
        Branch: 0.175.3.0.0.25.3
Packaging Date: June 21, 2015 11:51:03 PM
          Size: 35.66 MB
          FMRI: pkg://solaris/developer/build/ant@1.9.3,5.11-0.175.3.0.0.25.3:20150621T235103Z

Install the Java Development Kit.

root@dbnode1:~# pkg install jdk-8
           Packages to install:  2
       Create boot environment: No
Create backup boot environment: No

DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                                2/2       625/625    46.3/46.3  274k/s

PHASE                                          ITEMS
Installing new actions                       735/735
Updating package state database                 Done
Updating package cache                           0/0
Updating image state                            Done
Creating fast lookup database                   Done
Updating package cache                           1/1

Verify that JDK 8 is on the database node.

root@dbnode1:~# java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)

On all the database nodes, download the source code for Cassandra version 2.1.9 (apache-cassandra-2.1.9-src.tar.gz) from http://cassandra.apache.org/, and install the software as follows.
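
If the database nodes have direct internet access and the wget package is installed, the archive can also be fetched from the command line; the URL below assumes the standard Apache archive layout for this release and should be verified before use:

root@dbnode1:~/Downloads# wget http://archive.apache.org/dist/cassandra/2.1.9/apache-cassandra-2.1.9-src.tar.gz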

Extract the Cassandra source archive and move it into the relevant /datapoolx/zfsnodex file system. Then create the db_files directory where the data and log files are to reside.

root@dbnode1:~# cd Downloads
root@dbnode1:~/Downloads# ls
apache-cassandra-2.1.9-src.tar.gz
root@dbnode1:~/Downloads# tar -zxvf apache-cassandra-2.1.9-src.tar.gz
root@dbnode1:~/Downloads# mv apache-cassandra-2.1.9-src cassandra
root@dbnode1:~/Downloads# ls
apache-cassandra-2.1.9-src.tar.gz  cassandra
root@dbnode1:~/Downloads# mv cassandra /datapool1/zfsnode1
root@dbnode1:~/Downloads# cd /datapool1/zfsnode1
root@dbnode1:/datapool1/zfsnode1# mkdir db_files

Make the cassandra directory the current working directory and build the Cassandra application with Ant.

root@dbnode1:/datapool1/zfsnode1# cd cassandra
root@dbnode1:/datapool1/zfsnode1/cassandra# ant
...
BUILD SUCCESSFUL
Total time: 8 minutes 37 seconds

The application has been built. Open .profile with a text editor and add the following entries. Then source the file.

export  CASSANDRA_HOME=/datapool1/zfsnode1/cassandra
export  PATH=$CASSANDRA_HOME/bin:$PATH
root@dbnode1:~# source .profile

One at a time, move to the /datapool1/zfsnode1/cassandra/bin and /datapool1/zfsnode1/cassandra/tools/bin directories, and use a text editor to open the shell scripts that are shown in Table 5. In the first line of each file, change #!/bin/sh to #!/bin/bash and then save the file (a scripted alternative is sketched after Table 5).

Table 5. Shell scripts to change.

Cassandra Directory        Shell Scripts to Change
$CASSANDRA_HOME/bin        cassandra.sh, cassandra-cli.sh, cqlsh.sh, debug-cql, nodetool.sh, sstablekeys.sh, sstableloader.sh, sstablescrub.sh, sstableupgrade.sh
$CASSANDRA_HOME/tools/bin  cassandra-stress.sh, cassandra-stressd.sh, json2sstable.sh, sstable2json.sh, sstableexpiredblockers.sh, sstablelevelreset.sh, sstablemetadata.sh, sstableofflinerelevel.sh, sstablerepairedset.sh, sstablesplit.sh
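
Rather than editing each file by hand, the shebang change can be scripted. The following is a minimal sketch that assumes GNU sed is available under /usr/gnu/bin (as GNU grep is) and that each script's first line is exactly #!/bin/sh; adjust the file list to match what is actually present in your build, then repeat the loop in $CASSANDRA_HOME/tools/bin with the names from the second row of Table 5.

cd $CASSANDRA_HOME/bin
for f in cassandra.sh cassandra-cli.sh cqlsh.sh debug-cql nodetool.sh \
         sstablekeys.sh sstableloader.sh sstablescrub.sh sstableupgrade.sh; do
    # Rewrite the first line of each script in place.
    /usr/gnu/bin/sed -i '1s|^#!/bin/sh|#!/bin/bash|' "$f"
done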

In the $CASSANDRA_HOME/conf directory, the shell script cassandra-env.sh invokes grep with the -A option. The grep found first in the default PATH is /usr/bin/grep, which does not support -A, so Oracle Solaris prints an illegal-option warning when Cassandra or any of the other utilities is started. The warning can be avoided by using the GNU grep utility under /usr/gnu/bin instead; to do this, declare its absolute path in cassandra-env.sh.

root@dbnode1:~# which grep
/usr/bin/grep

Open $CASSANDRA_HOME/conf/cassandra-env.sh, change grep -A to /usr/gnu/bin/grep -A, and then save the file to commit the change.
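
The same change can be made non-interactively; a minimal sketch, again assuming GNU sed under /usr/gnu/bin:

/usr/gnu/bin/sed -i 's|grep -A|/usr/gnu/bin/grep -A|g' $CASSANDRA_HOME/conf/cassandra-env.sh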

Move to /datapool1/zfsnode1/cassandra/conf/, open cassandra.yaml with a text editor, and make the following adjustments.

cluster_name: 'MyCluster'
num_tokens: 5
data_file_directories:
    - /datapool1/zfsnode1/db_files/data
commitlog_directory: /datapool1/zfsnode1/db_files/commitlog
saved_caches_directory: /datapool1/zfsnode1/db_files/saved_caches
seed_provider:
    - seeds: "192.168.2.22"
listen_address: 192.168.2.22
rpc_address: localhost
rpc_keepalive: true
endpoint_snitch: GossipingPropertyFileSnitch

Perform the same steps to build Cassandra on dbnode2 and dbnode3: place the source code in the relevant ZFS file system and apply the same modifications described earlier. Configure the cassandra.yaml file for the second and third database nodes as shown below.

The cassandra.yaml configuration for dbnode2:

cluster_name: 'MyCluster'
num_tokens: 5
data_file_directories:
    - /datapool2/zfsnode2/db_files/data
commitlog_directory: /datapool2/zfsnode2/db_files/commitlog
saved_caches_directory: /datapool2/zfsnode2/db_files/saved_caches
seed_provider:
    - seeds: "192.168.2.22"
listen_address: 192.168.2.23
rpc_address: localhost
rpc_keepalive: true
endpoint_snitch: GossipingPropertyFileSnitch

The cassandra.yaml configuration for dbnode3:

cluster_name: 'MyCluster'
num_tokens: 5
data_file_directories:
    - /datapool3/zfsnode3/db_files/data
commitlog_directory: /datapool3/zfsnode3/db_files/commitlog
saved_caches_directory: /datapool3/zfsnode3/db_files/saved_caches
seed_provider:
    - seeds: "192.168.2.22"
listen_address: 192.168.2.24
rpc_address: localhost
rpc_keepalive: true
endpoint_snitch: GossipingPropertyFileSnitch

Some Notes About the cassandra.yaml File

In order for the database servers to belong to the same cluster, they must share the same cluster name. The cluster_name setting fulfills this purpose. Seed servers are one or more database servers that currently belong to the cluster and are to be contacted by a new server when it first joins the cluster. This new server will contact the seed servers for information about the rest of the servers in the cluster, that is, their names, their IP addresses, the racks and data centers they belong to, and so on.

When a cluster is initialized for the first time, a token ring is created; with the default Murmur3 partitioner its values range from -2^63 to 2^63-1. The num_tokens setting controls how many tokens are created per database server, and together these tokens define the token ranges used to distribute data. As data is inserted, the partition key (or the partition-key part of the primary key) is hashed; the resulting hash value falls within a token range, which determines the server that owns the data. Every server can have a different num_tokens setting based on its hardware: more-capable servers can be assigned more tokens than older or less powerful ones. The data_file_directories, commitlog_directory, and saved_caches_directory parameters set the paths where data and logs will reside.

Cassandra Operation and Data Distribution

Initiate the Cassandra databases on the database nodes.

root@dbnode1:~# ./cassandra -f
root@dbnode2:~# ./cassandra -f
root@dbnode3:~# ./cassandra -f

The database cluster has been initiated.

From any database node, execute the nodetool utility to verify the database cluster. The same members will be reported regardless of which database node the utility is run on.

root@dbnode1:~# ./nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.2.24   72.62 KB   5       69.1%             6fdc0ead-a6c7-4e70-9a48-c9d0ef99fd84  RAC1
UN  192.168.2.22   184.55 KB  5       42.9%             26cc69f8-767e-4b1a-8da4-18d556a718a9  RAC1
UN  192.168.2.23   56.11 KB   5       88.0%             af955565-4535-4dfb-b5f5-e15190a1ee28  RAC1

root@dbnode1:/datapool1/zfsnode1/cassandra/bin# ./nodetool describecluster
Cluster Information:
   Name: MyCluster
   Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
   Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
   Schema versions:
      6403a0ff-f93b-3b1f-8c35-0a8dc85a5b66: [192.168.2.24, 192.168.2.22, 192.168.2.23]

Start the cqlsh utility to create a keyspace and begin adding and querying data. A keyspace is analogous to a schema in the relational database world. The replication factor (RF) is set to 2, so each row will reside on two servers. There is no master/slave or primary/secondary distinction between the replicas; both are equally authoritative.

root@dbnode1:~# ./cqlsh
Connected to MyCluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.14-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> create keyspace myfirstkeyspace with replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 2};
cqlsh> use myfirstkeyspace;
cqlsh:myfirstkeyspace> create table greek_locations ( loc_id int PRIMARY KEY, loc_name text, description text);
cqlsh:myfirstkeyspace> describe tables;

greek_locations

cqlsh:myfirstkeyspace> insert into greek_locations (loc_id, loc_name, description) values (1,'Thessaloniki','North Greece');
cqlsh:myfirstkeyspace> insert into greek_locations (loc_id, loc_name, description) values (2,'Larissa','Central Greece');
cqlsh:myfirstkeyspace> insert into greek_locations (loc_id, loc_name, description) values (3,'Athens','Central Greece - Capital');
cqlsh:myfirstkeyspace> select * from greek_locations;

 loc_id | description              | loc_name
--------+--------------------------+--------------
      1 |             North Greece | Thessaloniki
      2 |           Central Greece |      Larissa
      3 | Central Greece - Capital |       Athens

(3 rows)

Connecting from any other database server should report the same results.

root@dbnode2:/datapool2/zfsnode2/cassandra/bin# ./cqlsh
Connected to MyCluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.1.9-SNAPSHOT | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cqlsh> use myfirstkeyspace;
cqlsh:myfirstkeyspace> select * from greek_locations;

 loc_id | description              | loc_name
--------+--------------------------+--------------
      1 |             North Greece | Thessaloniki
      2 |           Central Greece |      Larissa
      3 | Central Greece - Capital |       Athens

(3 rows)
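
To see which two replicas hold a particular partition (RF=2), the nodetool getendpoints subcommand can be used. A quick check against the keyspace and table created above; the exact pair of addresses returned depends on how the key hashes into the token ranges:

root@dbnode1:/datapool1/zfsnode1/cassandra/bin# ./nodetool getendpoints myfirstkeyspace greek_locations 1

Two of the three node IP addresses should be reported for loc_id 1.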

The ring subcommand of the nodetool utility reports the token range limits for each of the servers. The num_tokens parameter was set to 5 in the cassandra.yaml file, so there are 15 token ranges in total for the three servers.

root@dbnode1:/datapool1/zfsnode1/cassandra/bin# ./nodetool ring

Datacenter: DC1
==========
Address        Rack    Status  State   Load       Owns    Token
                                                           5554128420332708557
192.168.2.22   RAC1    Up      Normal  122.3 KB   ?       -9135243804612957495
192.168.2.23   RAC1    Up      Normal  76.37 KB   ?       -8061157299090260986
192.168.2.22   RAC1    Up      Normal  122.3 KB   ?       -7087501046371881693
192.168.2.24   RAC1    Up      Normal  78.8 KB    ?       -6454951218299078731
192.168.2.22   RAC1    Up      Normal  122.3 KB   ?       -5793299020697319351
192.168.2.22   RAC1    Up      Normal  122.3 KB   ?       -5588273793487800091
192.168.2.23   RAC1    Up      Normal  76.37 KB   ?       -3763306950618271982
192.168.2.23   RAC1    Up      Normal  76.37 KB   ?       -3568767174854581436
192.168.2.23   RAC1    Up      Normal  76.37 KB   ?       -1113375360465059283
192.168.2.24   RAC1    Up      Normal  78.8 KB    ?       -682327379305650352
192.168.2.24   RAC1    Up      Normal  78.8 KB    ?        112278302282739678
192.168.2.23   RAC1    Up      Normal  76.37 KB   ?        4952728554160670447
192.168.2.24   RAC1    Up      Normal  78.8 KB    ?        5093621811617287602
192.168.2.22   RAC1    Up      Normal  122.3 KB   ?        5342254592921898323
192.168.2.24   RAC1    Up      Normal  78.8 KB    ?        5554128420332708557

  Warning: "nodetool ring" is used to output all the tokens of a node.
  To view status related info of a node use "nodetool status" instead.

The describering subcommand of the nodetool utility reports the token ranges and their endpoints in detail.

root@dbnode1:/datapool1/zfsnode1/cassandra/bin# ./nodetool describering myfirstkeyspace
Schema Version:155131ce-b922-37aa-a635-68e6fa96597c
TokenRange:
    TokenRange(start_token:5342254592921898323, end_token:5554128420332708557,
      endpoints:[192.168.2.24, 192.168.2.22], rpc_endpoints:[127.0.0.1, 127.0.0.1],
      endpoint_details:[EndpointDetails(host:192.168.2.24, datacenter:DC1, rack:RAC1),
      EndpointDetails(host:192.168.2.22, datacenter:DC1, rack:RAC1)])
    TokenRange(start_token:112278302282739678, end_token:4952728554160670447,
      endpoints:[192.168.2.23, 192.168.2.24], rpc_endpoints:[127.0.0.1, 127.0.0.1],
      endpoint_details:[EndpointDetails(host:192.168.2.23, datacenter:DC1, rack:RAC1),
      EndpointDetails(host:192.168.2.24, datacenter:DC1, rack:RAC1)])
    TokenRange(start_token:5554128420332708557, end_token:-9135243804612957495,
      endpoints:[192.168.2.22, 192.168.2.23], rpc_
