# you can install this Scala anywhere – but I install it with sudo / root
wget https://www.scala-lang.org/files/archive/scala-2.13.0-M5.tgz
sudo tar xzvf scala-2.13.0-M5.tgz
sudo mv scala-2.13.0-M5 /opt/hadoop/scala213
sudo alternatives --install /usr/bin/scala scala /opt/hadoop/scala213 1
# just verify where it is and that there are no other versions – at least that alternatives can see
alternatives --config scala

There is 1 program that provides 'scala'.

  Selection    Command
-----------------------------------------------
*+ 1           /opt/hadoop/scala213

Enter to keep the current selection[+], or type selection number:
failed to create /var/lib/alternatives/scala.new: Permission denied

[hadoop@cent7 ~]$ scala -version
Scala code runner version 2.13.0-M5 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.
[hadoop@cent7 ~]$
# the Permission denied is harmless here – I ran alternatives --config as the hadoop user, not root, so it can't rewrite its state file – the listing and scala -version still show 2.13.0-M5 is active
# just an FYI – I installed it into a directory structure with hadoop in the name – but there is no Hadoop installed here – Spark is running standalone
[hadoop@cent7 ~]$ spark-shell
2018-09-05 01:35:28 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://cent7:4040
Spark context available as 'sc' (master = local[*], app id = local-1536136533449).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
Type in expressions to have them evaluated.
Type :help for more information.
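# if the banner scrolls by too fast, you can also ask the running REPL which Scala it was built with – just the standard library, nothing Spark-specific:

scala> scala.util.Properties.versionString   // should echo "version 2.11.8" in this shell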
# yes you can use any file you want that exists with lines and words in it ;-)… worse yet – Spark is still using the version of Scala that comes with it – see “Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)” above – so the install I did of Spark 2.3.1 has its own Scala compiler – or at least run-time…
scala> val file = sc.textFile("/opt/hadoop/apache-spark-examples/README.md");
file: org.apache.spark.rdd.RDD[String] = /opt/hadoop/apache-spark-examples/README.md MapPartitionsRDD[1] at textFile at <console>:24

scala> file.count();
res0: Long = 46

scala> file.first();
res1: String = # apache-spark-examples
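# and since any file with lines and words will do, here is the classic word count you could run next in the same session – a minimal sketch against the same README (take(5) is just there to peek at a few results):

val counts = file.flatMap(_.split("\\s+"))   // split each line into words
  .map(word => (word, 1))                    // pair each word with a 1
  .reduceByKey(_ + _)                        // add up the 1s per word
counts.take(5).foreach(println)              // print a handful of (word, count) pairs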
So what happens if I kick off the newer Scala – and try to run Spark code from it – let's see…
Does not work so hot…
[hadoop@cent7 ~]$ scala
Welcome to Scala 2.13.0-M5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181).
Type in expressions for evaluation. Or try :help.

scala> val file = sc.textFile("/opt/hadoop/apache-spark-examples/README.md");
                  ^
       error: not found: value sc
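# plain scala has no idea about 'sc' – spark-shell must be creating it before you ever see a prompt. roughly what gets pre-bound looks like this sketch – and note that even pasting this into the 2.13 REPL wouldn't help: the Spark 2.3.1 jars are compiled against Scala 2.11, and the 2.11/2.12/2.13 lines are not binary compatible:

import org.apache.spark.sql.SparkSession

// approximately the setup spark-shell does on your behalf
val spark = SparkSession.builder()
  .appName("Spark shell")
  .master("local[*]")
  .getOrCreate()
val sc = spark.sparkContext   // the 'sc' the plain scala REPL couldn't find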
Maybe we should look at “spark-shell” and see what that is doing that a straight invoke of scala isn’t doing…
yep… spark-shell is adding a bunch of stuff and running spark-submit, so we would need to follow that around. here is a “key” line from spark-shell:
"${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "$@"
and spark-submit calls spark-class… in the end it is Java being called with some Spark classes on the classpath… which is what I should have already known…
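# which also means spark-submit is the real entry point for your own code, not just the shell – here is a hedged sketch of a standalone app (LineCount and linecount.jar are made-up names) you could hand to spark-submit the same way spark-shell hands it org.apache.spark.repl.Main – just build it against Scala 2.11 to match Spark 2.3.1:

import org.apache.spark.sql.SparkSession

object LineCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("LineCount").getOrCreate()
    val lines = spark.sparkContext.textFile("/opt/hadoop/apache-spark-examples/README.md")
    println(s"line count: ${lines.count()}")   // same count as in the shell above
    spark.stop()
  }
}
// then: spark-submit --class LineCount linecount.jar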
Anyway, here is a nice Spark use case from Barclays:
http://blog.cloudera.com/blog/2015/08/how-apache-spark-scala-and-functional-programming-made-hard-problems-easy-at-barclays/