Installing Scala on RHEL 7 (or clones) – and do we really need this with Spark? No!

# you can install this Scala anywhere – but I install it with sudo / root

wget https://www.scala-lang.org/files/archive/scala-2.13.0-M5.tgz
sudo tar xzvf scala-2.13.0-M5.tgz
sudo mv scala-2.13.0-M5 /opt/hadoop/scala213
sudo alternatives --install /usr/bin/scala scala /opt/hadoop/scala213/bin/scala 1

# just verify where it is and that there are no other versions – at least that alternatives can see (run --config as root/sudo, or you get the Permission denied shown below when it tries to save the selection)

alternatives --config scala 
There is 1 program that provides 'scala'.
Selection Command
-----------------------------------------------
*+ 1 /opt/hadoop/scala213
Enter to keep the current selection[+], or type selection number:
failed to create /var/lib/alternatives/scala.new: Permission denied
[hadoop@cent7 ~]$ scala -version
Scala code runner version 2.13.0-M5 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.
[hadoop@cent7 ~]$

# just an FYI – I installed it into a directory structure with hadoop in the name, but there is no Hadoop installed here; Spark is installed standalone

[hadoop@cent7 ~]$ spark-shell
2018-09-05 01:35:28 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://cent7:4040
Spark context available as 'sc' (master = local[*], app id = local-1536136533449).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
Type in expressions to have them evaluated.
Type :help for more information.

# yes you can use any file you want that exists with lines and words in it ;-)… worse yet – Spark is still using the version of Scala that it ships with – see “Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)” above – so you know the install I did of Spark 2.3.1 has its own Scala compiler – or at least run-time…

scala> val file = sc.textFile("/opt/hadoop/apache-spark-examples/README.md");
file: org.apache.spark.rdd.RDD[String] = /opt/hadoop/apache-spark-examples/README.md MapPartitionsRDD[1] at textFile at <console>:24
scala> file.count();
res0: Long = 46
scala> file.first();
res1: String = # apache-spark-examples
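
# and since the file is already sitting there as an RDD, the classic word count is only a couple more lines in the same spark-shell session – just a sketch using the standard flatMap / map / reduceByKey calls against the same README.md, nothing here beyond what is already loaded above

scala> val counts = file.flatMap(line => line.split("\\s+")).filter(_.nonEmpty).map(word => (word, 1)).reduceByKey(_ + _)
scala> counts.take(5).foreach(println)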

So what happens if I kick off the newer Scala and try to run Spark code from it? Let's see.

It does not work so hot…

[hadoop@cent7 ~]$ scala
Welcome to Scala 2.13.0-M5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181).
Type in expressions for evaluation. Or try :help.
scala> val file = sc.textFile("/opt/hadoop/apache-spark-examples/README.md");
                  ^
       error: not found: value sc

Maybe we should look at “spark-shell” and see what it is doing that a straight invocation of scala isn’t doing…
yep… spark-shell sets up a bunch of stuff and then runs spark-submit, so we would need to follow that around. Here is a “key” line from spark-shell:

"${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "$@"

and spark-submit calls spark-class… in the end, it is Java being called with some Spark classes on the classpath… which is what I should have already known…
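
# for completeness – the usual way to run Spark code outside of spark-shell is not to point a random Scala REPL at it, but to write your own driver that builds the SparkSession / SparkContext itself and hand it to spark-submit. Here is a minimal sketch of what that looks like (the object name LineCount and the jar name below are just placeholders I made up) – it would be compiled against the Scala 2.11 that Spark 2.3.1 ships with, not the 2.13 milestone installed above

import org.apache.spark.sql.SparkSession

object LineCount {
  def main(args: Array[String]): Unit = {
    // spark-shell creates 'spark' and 'sc' for you; a standalone driver has to build them itself
    val spark = SparkSession.builder().appName("Line count").getOrCreate()
    val sc = spark.sparkContext

    // same thing I typed into the REPL above
    val file = sc.textFile("/opt/hadoop/apache-spark-examples/README.md")
    println(s"lines: ${file.count()}, first line: ${file.first()}")

    spark.stop()
  }
}

# package that into a jar and run it with something like "spark-submit --class LineCount --master local[*] linecount.jar" – it goes down the exact same spark-class / java path described above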

Anyway, here is a nice Spark use case from Barclays

http://blog.cloudera.com/blog/2015/08/how-apache-spark-scala-and-functional-programming-made-hard-problems-easy-at-barclays/

For more Big Data stuff from lonzodb
