Hadoop Cluster Setup (Part 8): Spark

Spark: the legendary second-generation star


Spark Installation and Configuration

Install Spark

  • cd /mnt/hgfs/Hadoop
  • cp spark-2.3.3-bin-hadoop2.7.tgz /usr/local/src/
  • cd /usr/local/src/
  • tar zxvf spark-2.3.3-bin-hadoop2.7.tgz
  • rm -rf spark-2.3.3-bin-hadoop2.7.tgz
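
To confirm the archive unpacked correctly, you can ask the bundled launcher for its version before wiring up any environment variables (a quick sanity check):

  • /usr/local/src/spark-2.3.3-bin-hadoop2.7/bin/spark-submit --version
# The banner should report version 2.3.3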

Configure the Spark environment variables:

  • vim ~/.bashrc
# Add the following
# SET SPARK PATH
export SPARK_HOME=/usr/local/src/spark-2.3.3-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin
  • source ~/.bashrc
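
To verify that the variables took effect, check that the shell now resolves the Spark launcher from the updated PATH:

  • echo $SPARK_HOME
# /usr/local/src/spark-2.3.3-bin-hadoop2.7
  • which spark-submit
# /usr/local/src/spark-2.3.3-bin-hadoop2.7/bin/spark-submit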

Modify the Spark configuration files

  • cd spark-2.3.3-bin-hadoop2.7/conf
  • cp spark-env.sh.template spark-env.sh
  • vim spark-env.sh
# Add the following
export SCALA_HOME=/usr/local/src/scala-2.11.12
export JAVA_HOME=/usr/local/src/jdk1.8.0_212
export HADOOP_HOME=/usr/local/src/hadoop-2.8.5
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master:2181,slave1:2181,slave2:2181"
SPARK_MASTER_HOST=master
SPARK_LOCAL_DIRS=/usr/local/src/spark-2.3.3-bin-hadoop2.7
SPARK_DRIVER_MEMORY=1G
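
The SPARK_DAEMON_JAVA_OPTS line above turns on master high availability: the masters persist their state in the ZooKeeper ensemble listed in spark.deploy.zookeeper.url. If you ever want a standby master, you would start one by hand on another node (a sketch, assuming that node has the same Spark directory and configuration):

  • ./sbin/start-master.sh
# The standby stays idle until ZooKeeper elects it leader after the active master fails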
  • cp slaves.template slaves
  • vim slaves
# Add the following
slave1
slave2
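
start-all.sh launches the workers over SSH and expects the same Spark path on every slave, so the configured directory has to be copied to slave1 and slave2 as well (a sketch, assuming passwordless SSH is already set up from the earlier parts of this series):

  • scp -r /usr/local/src/spark-2.3.3-bin-hadoop2.7 slave1:/usr/local/src/
  • scp -r /usr/local/src/spark-2.3.3-bin-hadoop2.7 slave2:/usr/local/src/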

Start Standalone

On the master node only

Start the cluster:

  • cd spark-2.3.3-bin-hadoop2.7/sbin
  • ./start-all.sh
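
If everything came up, jps should show a Master process on master and a Worker on each slave:

  • jps
# master: Master (alongside the Hadoop and ZooKeeper daemons from earlier parts)
# slave1 / slave2: Worker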

Web monitoring page:

  • Spark: http://master:8080
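
The page should list both workers as ALIVE. From a shell you can at least confirm the UI is answering (assuming curl is available):

  • curl -s -o /dev/null -w "%{http_code}\n" http://master:8080
# 200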

Verification

  • cd spark-2.3.3-bin-hadoop2.7

Local mode:

  • ./bin/run-example --master local[2] SparkPi 10
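
SparkPi buries its result in the INFO logging, which goes to stderr by default, so you can filter stdout for the answer:

  • ./bin/run-example --master local[2] SparkPi 10 2>/dev/null | grep "Pi is roughly"
# Pi is roughly 3.14...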

Standalone cluster:

  • ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://master:7077 examples/jars/spark-examples_2.11-2.3.3.jar 10
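
spark-submit defaults to client deploy mode here, so the "Pi is roughly" line prints straight to your console. A sketch of the same submission with explicit resource limits (the values are arbitrary examples):

  • ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://master:7077 --executor-memory 512m --total-executor-cores 2 examples/jars/spark-examples_2.11-2.3.3.jar 10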

Spark on YARN:

  • ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster examples/jars/spark-examples_2.11-2.3.3.jar 10
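
In cluster deploy mode the driver runs inside a YARN container, so the result does not appear on your console. Once the job finishes you can pull it out of the aggregated logs (assuming YARN log aggregation is enabled; <application_id> is the ID spark-submit prints, also visible in the ResourceManager UI):

  • yarn logs -applicationId <application_id> | grep "Pi is roughly"
# Pi is roughly 3.14...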