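This post records how I deployed Cassandra on CentOS.
Preparation
The preparation is basically the same as in the Hadoop post, so I won't repeat it here.
Deploying Cassandra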
    curl -v -j -k -L http://apache.stu.edu.tw/cassandra/2.2.7/apache-cassandra-2.2.7-bin.tar.gz -o apache-cassandra-2.2.7-bin.tar.gz
    tar -zxvf apache-cassandra-2.2.7-bin.tar.gz
    sudo mv apache-cassandra-2.2.7 /usr/local/cassandra
    sudo chown -R tester /usr/local/cassandra
    sudo tee -a /etc/bashrc << "EOF"
    export CASSANDRA_HOME=/usr/local/cassandra
    export PATH=$PATH:$CASSANDRA_HOME/bin
    EOF
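Because tee -a appends every time it runs, repeating the step above would add duplicate export lines to /etc/bashrc. Below is a minimal sketch of an idempotent append. The helper name `append_once` is hypothetical and not part of the original setup, and the demo writes to a temporary file rather than the real /etc/bashrc:

```shell
#!/bin/bash
# append_once FILE LINE: append LINE to FILE only if FILE does not
# already contain an identical line. Hypothetical helper for illustration.
append_once() {
    local file=$1 line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demo on a temporary file instead of /etc/bashrc.
demo_rc=$(mktemp)
for _ in 1 2; do
    append_once "$demo_rc" 'export CASSANDRA_HOME=/usr/local/cassandra'
    append_once "$demo_rc" 'export PATH=$PATH:$CASSANDRA_HOME/bin'
done

# Although the loop ran twice, each export line appears only once.
cat "$demo_rc"
```

To use it on /etc/bashrc itself, the function would have to run with root privileges.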
Editing the configuration
Edit the config file with vi $CASSANDRA_HOME/conf/cassandra.yaml. The parts I changed are:
    cluster_name: 'sparkSever'

    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "192.168.0.161,192.168.0.162,192.168.0.163,192.168.0.164"

    listen_address: 192.168.0.161

    rpc_address: 192.168.0.161

    endpoint_snitch: GossipingPropertyFileSnitch
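After editing, it can help to print only the edited keys and confirm the values took effect. This is a rough sketch: the function name `show_settings` is made up for this example, and the demo runs on a sample file that mirrors the fragment above rather than on the real cassandra.yaml:

```shell
#!/bin/bash
# show_settings FILE: print the keys edited above plus the seeds line.
# The patterns are anchored so commented-out lines (starting with '#') are skipped.
show_settings() {
    grep -E '^(cluster_name|listen_address|rpc_address|endpoint_snitch):|^[[:space:]]*- seeds:' "$1"
}

# Sample file mirroring the edited fragment, plus one commented-out line.
sample_yaml=$(mktemp)
cat > "$sample_yaml" << 'EOF'
cluster_name: 'sparkSever'
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "192.168.0.161,192.168.0.162,192.168.0.163,192.168.0.164"
# listen_address: localhost
listen_address: 192.168.0.161
rpc_address: 192.168.0.161
endpoint_snitch: GossipingPropertyFileSnitch
EOF

show_settings "$sample_yaml"
```

On a node, the call would be `show_settings "$CASSANDRA_HOME/conf/cassandra.yaml"`.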
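Once one machine is set up, you can copy the installation to the others with the commands below, then change the settings that differ per node (listen_address and rpc_address):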
    scp -rp /usr/local/cassandra tester@sparkServer1:/usr/local
    scp -rp /usr/local/cassandra tester@sparkServer2:/usr/local
    scp -rp /usr/local/cassandra tester@sparkServer3:/usr/local

    ssh tester@sparkServer1 "sed -i -e 's/: 192.168.0.161/: 192.168.0.162/g' /usr/local/cassandra/conf/cassandra.yaml"
    ssh tester@sparkServer2 "sed -i -e 's/: 192.168.0.161/: 192.168.0.163/g' /usr/local/cassandra/conf/cassandra.yaml"
    ssh tester@sparkServer3 "sed -i -e 's/: 192.168.0.161/: 192.168.0.164/g' /usr/local/cassandra/conf/cassandra.yaml"
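The sed pattern starts with a colon and a space, which is why only listen_address and rpc_address change: in the seeds line the first IP follows a double quote, so the pattern does not match there. A small demonstration on a temporary sample file (GNU sed assumed, as on CentOS):

```shell
#!/bin/bash
# Sample lines in the same shape as the edited cassandra.yaml fragment.
sample_conf=$(mktemp)
cat > "$sample_conf" << 'EOF'
          - seeds: "192.168.0.161,192.168.0.162,192.168.0.163,192.168.0.164"
listen_address: 192.168.0.161
rpc_address: 192.168.0.161
EOF

# Same substitution as the one run on sparkServer1 above.
sed -i -e 's/: 192.168.0.161/: 192.168.0.162/g' "$sample_conf"

# listen_address and rpc_address now read 192.168.0.162;
# the seeds list still begins with 192.168.0.161.
cat "$sample_conf"
```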
Starting Cassandra
Run the commands below on sparkServer0 to start the Cassandra nodes on all four machines:
    ssh tester@sparkServer1 "cassandra"
    ssh tester@sparkServer2 "cassandra"
    ssh tester@sparkServer3 "cassandra"
    cassandra
Use nodetool status to check whether every node is up:
    nodetool status
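In the nodetool status output, each node row starts with a two-letter code, and UN means the node is Up and in the Normal state. A minimal sketch for counting those rows is shown here. The function name `count_up_nodes` and the expected count of 4 are assumptions for this four-node cluster:

```shell
#!/bin/bash
# count_up_nodes: read `nodetool status` output on stdin and print
# the number of rows whose status code is UN (Up/Normal).
count_up_nodes() {
    grep -c '^UN[[:space:]]'
}

expected=4   # assumed: the four nodes set up in this post

# Query the cluster only when nodetool is available on this machine.
if command -v nodetool >/dev/null 2>&1; then
    up=$(nodetool status | count_up_nodes)
    echo "$up of $expected nodes are Up/Normal"
fi
```

Nodes that are still joining show up with a different code (for example UJ), so the count may stay below 4 for a short while after startup.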
Starting Cassandra automatically
The script that starts Cassandra at boot (create it with sudo vi /etc/init.d/cassandra):
    #!/bin/bash
    # chkconfig --add needs the header below; the runlevels and
    # priorities (2345 80 20) are assumed values, adjust as needed.
    # chkconfig: 2345 80 20
    # description: Apache Cassandra

    . /etc/rc.d/init.d/functions

    CASSANDRA_HOME=/usr/local/cassandra
    CASSANDRA_BIN=$CASSANDRA_HOME/bin/cassandra
    CASSANDRA_NODETOOL=$CASSANDRA_HOME/bin/nodetool
    CASSANDRA_LOG=$CASSANDRA_HOME/logs/cassandra.log
    CASSANDRA_PID=/var/run/cassandra.pid
    CASSANDRA_LOCK=/var/lock/subsys/cassandra
    PROGRAM="cassandra"

    if [ ! -f $CASSANDRA_BIN ]; then
        echo "File not found: $CASSANDRA_BIN"
        exit 1
    fi

    RETVAL=0

    start() {
        if [ -f $CASSANDRA_PID ] && checkpid `cat $CASSANDRA_PID`; then
            echo "Cassandra is already running."
            exit 0
        fi
        echo -n $"Starting $PROGRAM: "
        daemon $CASSANDRA_BIN -p $CASSANDRA_PID >> $CASSANDRA_LOG 2>&1
        # Capture the exit status of daemon before usleep overwrites $?.
        RETVAL=$?
        usleep 500000
        if [ $RETVAL -eq 0 ]; then
            touch $CASSANDRA_LOCK
            echo_success
        else
            echo_failure
        fi
        echo
        return $RETVAL
    }

    stop() {
        if [ ! -f $CASSANDRA_PID ]; then
            echo "Cassandra is already stopped."
            exit 0
        fi
        echo -n $"Stopping $PROGRAM: "
        # Note: decommission removes this node from the ring;
        # nodetool drain is the usual choice for a plain stop.
        $CASSANDRA_NODETOOL -h 127.0.0.1 decommission
        if kill `cat $CASSANDRA_PID`; then
            RETVAL=0
            rm -f $CASSANDRA_LOCK
            echo_success
        else
            RETVAL=1
            echo_failure
        fi
        echo
        [ $RETVAL = 0 ]
    }

    status_fn() {
        if [ -f $CASSANDRA_PID ] && checkpid `cat $CASSANDRA_PID`; then
            echo "Cassandra is running."
            exit 0
        else
            echo "Cassandra is stopped."
            exit 1
        fi
    }

    case "$1" in
        start)
            start
            ;;
        stop)
            stop
            ;;
        status)
            status_fn
            ;;
        restart)
            stop
            start
            ;;
        *)
            echo $"Usage: $PROGRAM {start|stop|restart|status}"
            RETVAL=3
    esac

    exit $RETVAL
Then use the commands below so that the script runs automatically:
    sudo chmod +x /etc/init.d/cassandra
    sudo chkconfig --add cassandra
    sudo service cassandra start
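chkconfig --add reads the runlevels and start/stop priorities from a `# chkconfig:` comment line in the init script and rejects scripts that lack one, so it is worth checking for that header first. Here is a minimal sketch of such a check. The function name `has_chkconfig_header` is hypothetical, and the values 2345 80 20 in the demo are example values only:

```shell
#!/bin/bash
# has_chkconfig_header FILE: succeed if FILE contains a "# chkconfig:" line.
has_chkconfig_header() {
    grep -qE '^#[[:space:]]*chkconfig:' "$1"
}

# Two throwaway demo scripts: one with the header, one without.
with_header=$(mktemp)
printf '%s\n' '#!/bin/bash' '# chkconfig: 2345 80 20' '# description: demo' > "$with_header"
without_header=$(mktemp)
printf '%s\n' '#!/bin/bash' 'echo demo' > "$without_header"

for f in "$with_header" "$without_header"; do
    if has_chkconfig_header "$f"; then
        echo "$f: chkconfig header found"
    else
        echo "$f: chkconfig header missing"
    fi
done
```

On the server, the call would be `has_chkconfig_header /etc/init.d/cassandra`, and `chkconfig --list cassandra` shows the registered runlevels afterwards.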
Testing
Open a terminal and run cqlsh 192.168.0.161 (the IP of any machine running Cassandra) to start using Cassandra's CQL. A simple test:
    CREATE KEYSPACE mariadbtest2 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'};

    USE mariadbtest2;
    CREATE TABLE t1 (rowid text, data1 text, data2 int, PRIMARY KEY (rowid));

    INSERT INTO t1 (rowid, data1, data2) VALUES ('rowid001', 'g1', 123456);
    INSERT INTO t1 (rowid, data1, data2) VALUES ('rowid002', 'g2', 34543);
    INSERT INTO t1 (rowid, data1, data2) VALUES ('rowid003', 'g1', 97548);
    INSERT INTO t1 (rowid, data1, data2) VALUES ('rowid004', 'g1', 62145);
    INSERT INTO t1 (rowid, data1, data2) VALUES ('rowid005', 'g2', 140578);

    SELECT * FROM t1;
    SELECT sum(data2) FROM t1;
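The same statements can also be run without an interactive session, since cqlsh accepts a file of statements through its -f option. This is a rough sketch that saves the two queries to a temporary file and runs them only if cqlsh is installed. The host IP is the one used above, and the keyspace and table are assumed to exist already:

```shell
#!/bin/bash
# Write the queries from the test above into a temporary file.
cql_file=$(mktemp)
cat > "$cql_file" << 'EOF'
USE mariadbtest2;
SELECT * FROM t1;
SELECT sum(data2) FROM t1;
EOF

# Run the file against one node when cqlsh is available on this machine.
if command -v cqlsh >/dev/null 2>&1; then
    cqlsh -f "$cql_file" 192.168.0.161
else
    echo "cqlsh not found; statements saved in $cql_file"
fi
```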
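The current Cassandra CQL does not support GROUP BY yet.
From the discussions I read online, the main reason is that Cassandra was originally designed as a place for fast reads and writes, not for computation.
So this part of CQL is still evolving. I saw an issue saying that GROUP BY is planned for a 3.X release, so stay tuned.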
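Until GROUP BY is available, the grouping can be done on the client side. The following is a minimal sketch with awk, using the data1 and data2 values of the five test rows as hard-coded input. In practice the rows would come from a query result, which this example does not cover:

```shell
#!/bin/bash
# The five test rows as "data1 data2" pairs.
rows='g1 123456
g2 34543
g1 97548
g1 62145
g2 140578'

# Sum data2 per data1 value, the client-side equivalent of
# SELECT data1, sum(data2) ... GROUP BY data1.
# The output is sorted because awk's for-in order is unspecified.
group_sums=$(printf '%s\n' "$rows" \
    | awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' \
    | sort)

echo "$group_sums"
# g1 283149
# g2 175121
```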
Reference
http://blog.fens.me/nosql-r-cassandra/
https://twgame.wordpress.com/2015/02/16/real-machine-cassandra-cluster/
http://www.planetcassandra.org/blog/installing-the-cassandra-spark-oss-stack/
http://datastax.github.io/python-driver/getting_started.html
https://docs.datastax.com/en/developer/python-driver/1.0/python-driver/quick_start/qsSimpleClientAddSession_t.html
https://mariadb.com/kb/en/mariadb/cassandra-storage-engine-use-example/