A Complete Guide to Deploying Spark on Kubernetes
Now let's submit a job and see whether it runs correctly. Before that, you need a working AWS S3 account and a bucket that already holds the sample data. I downloaded the sample data from Kaggle (https://www.kaggle.com/datasna ... s.csv) and then uploaded it to the S3 bucket. Assuming the bucket is named s3-data-bucket, the sample data file ends up at s3-data-bucket/data.csv.

Once the data is ready, load it from inside a Spark master pod. Using the pod spark-master-controller-v2hjb as an example (your pod's name suffix will differ), the command is:

kubectl exec -it spark-master-controller-v2hjb -- /bin/bash

Once you are inside the Spark container, start the Spark shell:

export SPARK_DIST_CLASSPATH=$(hadoop classpath)
spark-shell

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at :4040
Spark context available as 'sc' (master = spark://spark-master:7077, app id = app-20170405152342-0000).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_221)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
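Before wiring up S3, it can be worth confirming that the shell is really driving the cluster. This quick sanity check is not part of the original walkthrough, just a minimal distributed computation you can paste at the scala> prompt:

// Not in the original article: sum a small distributed range.
// If the master and workers are healthy this returns 5050.0.
val sanity = sc.parallelize(1 to 100).sum()
println(s"sum of 1..100 = $sanity")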
Now let's give the Spark master the details of the S3 storage. Enter the following configuration at the Scala prompt shown above:

sc.hadoopConfiguration.set("fs.s3a.endpoint", "https://s3.amazonaws.com")
sc.hadoopConfiguration.set("fs.s3a.access.key", "s3-access-key")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "s3-secret-key")

Now simply paste the following into the Scala prompt to submit the Spark job (remember to change the S3-related fields to your own values):

import org.apache.spark._
import org.apache.spark.rdd.RDD
import org.apache.spark.util.IntParam
import org.apache.spark.sql.SQLContext
import org.apache.spark.graphx._
import org.apache.spark.graphx.util.GraphGenerators
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.tree.model.DecisionTreeModel
import org.apache.spark.mllib.util.MLUtils
val conf = new SparkConf().setAppName("YouTube")
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
import sqlContext._
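The job code is truncated after these imports in this copy of the article. As a rough sketch of the next step, this is how the sample CSV uploaded earlier could be read from S3 through the s3a connector; the header and inferSchema options are assumptions about the Kaggle file, not something stated above:

// Sketch only: the remainder of the original job is not reproduced here.
// Reading via s3a:// picks up the endpoint/access-key/secret-key values
// set on sc.hadoopConfiguration earlier.
val df = sqlContext.read
  .option("header", "true")        // assumption: the CSV has a header row
  .option("inferSchema", "true")   // assumption: let Spark guess column types
  .csv("s3a://s3-data-bucket/data.csv")

df.printSchema()                   // confirm the columns were parsed
println(s"rows read from S3: ${df.count()}")

If this prints a schema and a row count, the S3 credentials and the path are correct, and the rest of the job (the imports above suggest SQL and MLlib processing) can operate on df.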