8.9 建置 Spark standalone cluster執行環境


8.1 安裝scala
Step1~4 下載安裝 Scala
wget http://www.scala-lang.org/files/archive/scala-2.11.6.tgz
tar xvf scala-2.11.6.tgz
sudo mv scala-2.11.6 /usr/local/scala
Step5 Scala使用者環境變數設定
修改~/.bashrc
sudo gedit ~/.bashrc
輸入下列內容
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin
Step6 使讓~/.bashrc修改生效
source ~/.bashrc
8.2 安裝Spark
Step1~3 下載安裝 Spark
wget http://apache.stu.edu.tw/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz
tar zxf spark-1.4.0-bin-hadoop2.6.tgz
sudo mv spark-1.4.0-bin-hadoop2.6 /usr/local/spark/
Step4 Spark使用者環境變數設定
修改~/.bashrc
sudo gedit ~/.bashrc
輸入下列內容
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
Step5 使讓~/.bashrc修改生效
source ~/.bashrc

8.4 啟動spark-shell互動介面

spark-shell
8.5 設定spark-shell 顯示訊息

cd /usr/local/spark/conf
cp log4j.properties.template log4j.properties 
修改log4j.properties
sudo gedit log4j.properties
開啟gedit編輯log4j.properties,原本是INFO改為WARN
8.6 啟動Hadoop
start-all.sh
8.7 本機執行Spark-shell 程式
Step1 進入spark-shell
spark-shell  --master local[4]
Step2 讀取本機檔案
val textFile=sc.textFile("/usr/local/spark/README.md")
textFile.count
Step3 讀取HDFS檔案
val textFile = sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/pg5000.txt")
textFile.count
8.8 在Hadoop YARN執行spark-shell
Step1 在Hadoop YARN執行spark-shell
SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/bin/spark-shell
Step2 讀取本機檔案
錯誤語法
val textFile=sc.textFile("/usr/local/spark/README.md")
textFile.count
正確語法
val textFile=sc.textFile("file:/usr/local/spark/README.md")
textFile.count
Step3 讀取HDFS檔案
val textFile = sc.textFile("/user/hduser/wordcount/input/pg5000.txt")
val textFile = sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/pg5000.txt")
textFile.count

此圖出自Spark官網 https://spark.apache.org/


以上內容節錄自這本書,很適合Python程式設計師學習Spark機器學習與大數據架構,點選下列連結查看本書詳細介紹:
  Python+Spark 2.0+Hadoop機器學習與大數據分析實戰
  http://pythonsparkhadoop.blogspot.tw/2016/10/pythonspark-20hadoop.html

《購買本書 限時特價專區》
博客來網路書店: http://www.books.com.tw/products/0010730134?loc=P_007_090

天瓏網路書店: https://www.tenlong.com.tw/items/9864341537?item_id=1023658
  

露天拍賣:http://goods.ruten.com.tw/item/show?21640846068139
蝦皮拍賣:https://goo.gl/IEx13P 



Share on Google Plus

About kevin

This is a short description in the author block about the author. You edit it by entering text in the "Biographical Info" field in the user admin panel.
    Blogger Comment
    Facebook Comment

5 意見:


  1. I have seen a lot of blogs and Info. on other Blogs and Web sites But in this Hadoop Blog Information is useful very thanks for sharing it........

    回覆刪除
  2. Linux Online training in India – Webtrackker Technology is providing the linux online training with 100% placement support. If you are looking for the BEST LINUX & UNIX Training Institute In india or linux online training from india, live project based LINUX & UNIX online training then you can contact to us.

    Python online training in India, RPA Online training in India, Salesforce online training in india, AWS online training in india, Cloud Computing Online Training in India, SAS Online Training in india, Hadoop online training in INDIA, Oracle DBA online training in India, SAP online Training In india, Linux Online training in India








    回覆刪除