This post covers deploying Spark on Mesos. The goal is a Spark and Mesos master backed by two Mesos standby masters (which also run ZooKeeper): cassSpark1 is the Mesos master and Spark master, while cassSpark2 and cassSpark3 are Mesos standbys. All three machines also serve as Mesos slaves and Spark slaves. (In real use these would be separate machines; since this setup uses VMs, everything is colocated.)
Preparation

This is basically the same as in the previous post, so I won't repeat it here.
Deployment

i. Download the files and move them into place
```shell
sudo mkdir /usr/local/bigdata
sudo chown -R tester /usr/local/bigdata

curl -v -j -k -L -H "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u101-b13/jdk-8u101-linux-x64.rpm -o jdk-8u101-linux-x64.rpm
sudo yum install -y jdk-8u101-linux-x64.rpm

curl -v -j -k -L http://apache.stu.edu.tw/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz -o zookeeper-3.4.8.tar.gz
tar -zxvf zookeeper-3.4.8.tar.gz
sudo mv zookeeper-3.4.8 /usr/local/bigdata/zookeeper
sudo chown -R tester /usr/local/bigdata/zookeeper

curl -v -j -k -L http://repos.mesosphere.com/el/7/x86_64/RPMS/mesos-1.0.0-2.0.89.centos701406.x86_64.rpm -o mesos-1.0.0-2.0.89.centos701406.x86_64.rpm
sudo yum install mesos-1.0.0-2.0.89.centos701406.x86_64.rpm

curl -v -j -k -L http://downloads.lightbend.com/scala/2.11.8/scala-2.11.8.tgz -o scala-2.11.8.tgz
tar -zxvf scala-2.11.8.tgz
mv scala-2.11.8 /usr/local/bigdata/scala

curl -v -j -k -L http://d3kbcqa49mib13.cloudfront.net/spark-2.0.0-bin-hadoop2.6.tgz -o spark-2.0.0-bin-hadoop2.6.tgz
tar -zxvf spark-2.0.0-bin-hadoop2.6.tgz
mv spark-2.0.0-bin-hadoop2.6 /usr/local/bigdata/spark

curl -v -j -k -L http://apache.stu.edu.tw/cassandra/2.2.7/apache-cassandra-2.2.7-bin.tar.gz -o apache-cassandra-2.2.7-bin.tar.gz
tar -zxvf apache-cassandra-2.2.7-bin.tar.gz
mv apache-cassandra-2.2.7 /usr/local/bigdata/cassandra
```
ii. Set environment variables

```shell
sudo tee -a /etc/bashrc << "EOF"
export JAVA_HOME=/usr/java/jdk1.8.0_101
export ZOOKEEPER_HOME=/usr/local/bigdata/zookeeper
export SCALA_HOME=/usr/local/bigdata/scala
export SPARK_HOME=/usr/local/bigdata/spark
export CASSANDRA_HOME=/usr/local/bigdata/cassandra
export PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$CASSANDRA_HOME/bin
EOF
```
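A quick sanity check after the variables are in place (to be run on each cluster node; this is just a verification sketch, not part of the setup):

```shell
# Reload the new variables in the current shell
source /etc/bashrc
# Each of these should resolve if the setup above worked
java -version
echo $ZOOKEEPER_HOME
echo $SPARK_HOME
```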
iii. Configure ZooKeeper

```shell
cp $ZOOKEEPER_HOME/conf/zoo_sample.cfg $ZOOKEEPER_HOME/conf/zoo.cfg
tee $ZOOKEEPER_HOME/conf/zoo.cfg << "EOF"
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/usr/local/bigdata/zookeeper/data
server.1=cassSpark1:2888:3888
server.2=cassSpark2:2888:3888
server.3=cassSpark3:2888:3888
EOF

mkdir $ZOOKEEPER_HOME/data
tee $ZOOKEEPER_HOME/data/myid << "EOF"
1
EOF
```
On cassSpark2 and cassSpark3, set the myid to 2 and 3 respectively.
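Since the hostnames end in the same digits as the ids, this step can be scripted; a minimal sketch, assuming the cassSpark1..3 naming used throughout this post:

```shell
# Derive a node's ZooKeeper id from its hostname (cassSparkN -> N)
myid_for_host() {
  echo "${1#cassSpark}"
}

# On each node this would populate the myid file, e.g.:
#   myid_for_host "$(hostname -s)" > $ZOOKEEPER_HOME/data/myid
myid_for_host cassSpark2   # prints 2
```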
Start ZooKeeper:

```shell
zkServer.sh start
ssh tester@cassSpark2 "zkServer.sh start"
ssh tester@cassSpark3 "zkServer.sh start"
```
Next, verify the deployment. Run `zkCli.sh -server cassSpark1:2181,cassSpark2:2181,cassSpark3:2181` to connect to the ZooKeeper servers. If everything is running normally, you will see this prompt:

```
[zk: cassSpark1:2181,cassSpark2:2181,cassSpark3:2181(CONNECTED) 0]
```

Now try `create /test01 abcd`, then `ls /` and check whether `[test01, zookeeper]` shows up. If it does, ZooKeeper is set up correctly; if any error appears along the way, it is not. Finally, clean up with `delete /test01` and exit with `quit`.
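The same check can be run non-interactively by piping the commands through zkCli.sh (same connection string as above; this is a sketch that needs the live ensemble to run):

```shell
# Create, list, and delete a test znode in one shot
zkCli.sh -server cassSpark1:2181,cassSpark2:2181,cassSpark3:2181 <<'EOF'
create /test01 abcd
ls /
delete /test01
quit
EOF
```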
Finally, configure the ZooKeeper server to start automatically at boot:

```shell
sudo tee /etc/init.d/zookeeper << "EOF"
#!/bin/bash
ZOOKEEPER=/usr/local/bigdata/zookeeper
source /etc/rc.d/init.d/functions
source $ZOOKEEPER/bin/zkEnv.sh

RETVAL=0
PIDFILE=/var/run/zookeeper_server.pid
desc="ZooKeeper daemon"

start() {
  echo -n $"Starting $desc (zookeeper): "
  daemon $ZOOKEEPER/bin/zkServer.sh start
  RETVAL=$?
  echo
  [ $RETVAL -eq 0 ] && touch /var/lock/subsys/zookeeper
  return $RETVAL
}

stop() {
  echo -n $"Stopping $desc (zookeeper): "
  daemon $ZOOKEEPER/bin/zkServer.sh stop
  RETVAL=$?
  sleep 5
  echo
  [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/zookeeper $PIDFILE
}

restart() {
  stop
  start
}

get_pid() {
  cat "$PIDFILE"
}

checkstatus() {
  status -p $PIDFILE ${JAVA_HOME}/bin/java
  RETVAL=$?
}

condrestart() {
  [ -e /var/lock/subsys/zookeeper ] && restart || :
}

case "$1" in
  start)
    start
    ;;
  stop)
    stop
    ;;
  status)
    checkstatus
    ;;
  restart)
    restart
    ;;
  condrestart)
    condrestart
    ;;
  *)
    echo $"Usage: $0 {start|stop|status|restart|condrestart}"
    exit 1
esac

exit $RETVAL
EOF

sudo chmod +x /etc/init.d/zookeeper
sudo chkconfig --add zookeeper
sudo service zookeeper start
```
iv. Configure Mesos

Do the following on all three machines (cassSpark1, cassSpark2, cassSpark3):

```shell
sudo tee /etc/mesos/zk << "EOF"
zk://192.168.0.121:2181,192.168.0.122:2181,192.168.0.123:2181/mesos
EOF

sudo tee /etc/mesos-master/quorum << "EOF"
2
EOF
```

A quorum of 2 is a majority of the 3 masters, so the cluster keeps working if any single master goes down.
If passwordless SSH to the other nodes is not yet set up, copy your key over first, e.g. `ssh-copy-id -i ~/.ssh/id_rsa.pub cassSpark3`.
Then start the services:

```shell
sudo systemctl disable mesos-master
sudo systemctl stop mesos-master
sudo service mesos-master restart

sudo systemctl disable mesos-slave
sudo systemctl stop mesos-slave
sudo service mesos-slave restart
```
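To confirm that leader election worked, `mesos-resolve` (the same tool used in the test section later) can look up the current leader from ZooKeeper; a sketch, assuming the /etc/mesos/zk file written above:

```shell
# Prints the leading master's address, e.g. 192.168.0.121:5050
mesos-resolve `cat /etc/mesos/zk`
```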
v. Configure Scala and Spark

```shell
cp $SPARK_HOME/conf/spark-env.sh.template $SPARK_HOME/conf/spark-env.sh
cp $SPARK_HOME/conf/log4j.properties.template $SPARK_HOME/conf/log4j.properties
cp $SPARK_HOME/conf/spark-defaults.conf.template $SPARK_HOME/conf/spark-defaults.conf

tee -a $SPARK_HOME/conf/spark-env.sh << "EOF"
SPARK_LOCAL_DIRS=/usr/local/bigdata/spark
SPARK_SCALA_VERSION=2.11
EOF

sudo yum install sbt git-core
git clone git@github.com:datastax/spark-cassandra-connector.git
cd spark-cassandra-connector
rm -r spark-cassandra-connector-perf
sbt -Dscala-2.11=true assembly
cd ..

mkdir $SPARK_HOME/extraClass
# Rename the assembly on copy so it matches the jar paths in spark-defaults.conf below
cp spark-cassandra-connector/target/scala-2.11/spark-cassandra-connector-assembly-2.0.0-M1-2-g70018a6.jar $SPARK_HOME/extraClass/spark-cassandra-connector-assembly-2.0.0-M1.jar

tee -a $SPARK_HOME/conf/spark-defaults.conf << "EOF"
spark.driver.extraClassPath /usr/local/bigdata/spark/extraClass/spark-cassandra-connector-assembly-2.0.0-M1.jar
spark.driver.extraLibraryPath /usr/local/bigdata/spark/extraClass
spark.executor.extraClassPath /usr/local/bigdata/spark/extraClass/spark-cassandra-connector-assembly-2.0.0-M1.jar
spark.executor.extraLibraryPath /usr/local/bigdata/spark/extraClass
spark.jars /usr/local/bigdata/spark/extraClass/spark-cassandra-connector-assembly-2.0.0-M1.jar
spark.cores.max 3
spark.driver.memory 4g
spark.executor.memory 4g
EOF
```
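One detail the listing above does not set: Spark needs to locate the native Mesos library at runtime. If spark-shell later fails complaining about libmesos, appending something like the following to spark-env.sh usually resolves it (/usr/lib/libmesos.so is where the Mesosphere rpm typically installs it; verify the path on your own machines):

```shell
tee -a $SPARK_HOME/conf/spark-env.sh << "EOF"
export MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/libmesos.so
EOF
```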
Spark's own master and slaves don't need to be started; Mesos handles that directly. If you previously configured auto-start following my earlier post, remove it with:

```shell
sudo systemctl stop spark-master.service
sudo rm /etc/systemd/system/multi-user.target.wants/spark-master.service

sudo systemctl stop spark-slave.service
sudo rm /etc/systemd/system/multi-user.target.wants/spark-slave.service

sudo systemctl daemon-reload
```
Cassandra is configured the same way as before, so I won't repeat it; let's go straight to testing. Run the following two lines; if they execute successfully, Mesos is installed correctly:

```shell
MASTER=$(mesos-resolve `cat /etc/mesos/zk`)
mesos-execute --master=$MASTER --name="cluster-test" --command="sleep 5"
```
You can also browse to port 5050 on the current master; if the web UI appears, Mesos is working.
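Besides the browser, the master can be probed from the command line; a sketch, assuming cassSpark1 is currently the leader:

```shell
# A JSON blob listing "version", "leader", and the registered agents
# indicates a healthy master
curl -s http://cassSpark1:5050/master/state | head -c 300
```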
Next, test Spark. Open spark-shell with `spark-shell --master mesos://zk://192.168.0.121:2181,192.168.0.122:2181,192.168.0.123:2181/mesos` and confirm everything works:
```scala
val NUM_SAMPLES = 1000000
val count = sc.parallelize(1 to NUM_SAMPLES).map { i =>
  val x = Math.random()
  val y = Math.random()
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)
sc.stop()

import com.datastax.spark.connector._, org.apache.spark.SparkContext, org.apache.spark.SparkContext._, org.apache.spark.SparkConf, com.datastax.spark.connector.cql.CassandraConnector

val conf = new SparkConf(true).set("spark.cassandra.connection.host", "192.168.0.121")
val sc = new SparkContext(conf)

CassandraConnector(conf).withSessionDo { session =>
  session.execute("CREATE KEYSPACE IF NOT EXISTS test WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 2}")
  session.execute("DROP TABLE IF EXISTS test.kv")
  session.execute("CREATE TABLE test.kv (key text PRIMARY KEY, value DOUBLE)")
  session.execute("INSERT INTO test.kv(key, value) VALUES ('key1', 1.0)")
  session.execute("INSERT INTO test.kv(key, value) VALUES ('key2', 2.5)")
}

val rdd = sc.cassandraTable("test", "kv")
println(rdd.first)

val collection = sc.parallelize(Seq(("key3", 1.7), ("key4", 3.5)))
collection.saveToCassandra("test", "kv", SomeColumns("key", "value"))

rdd.collect().foreach(row => println(s"Existing Data: $row"))
sc.stop()
:quit
```
Notes:

If the cluster has no outbound connectivity, fetch everything that curl can download on a machine that does, then copy it over. For Mesos's dependencies, run this on an internet-connected CentOS machine:

```shell
sudo yum install --downloadonly --downloaddir=pkgs mesos-1.0.0-2.0.89.centos701406.x86_64.rpm
```

This downloads all the required rpm files into the pkgs folder. Pack them up, transfer them to the cluster, install them with `sudo yum install *.rpm`, then install Mesos itself.
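Packing the downloaded rpms and moving them onto the cluster can be sketched as follows (tester and cassSpark1 are this post's names; adjust to your own nodes):

```shell
# Bundle the pkgs/ folder produced by yum --downloadonly
tar -czf pkgs.tar.gz pkgs/
# Ship it to a cluster node and install everything, dependencies included
scp pkgs.tar.gz tester@cassSpark1:~
ssh tester@cassSpark1 'tar -xzf pkgs.tar.gz && cd pkgs && sudo yum install -y *.rpm'
```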