Follow these steps to install Spark 2.3.1, compiled with Hadoop 2.7:
If you have a Spark 2.3.1 tar distribution (for example, spark-2.3.1-bin-hadoop2.7.tgz), copy it into your Linux VM at any location (for example, /opt) using any Windows-to-Linux file transfer software (such as FileZilla or WinSCP). Alternatively, you can download the latest binary .tar.gz file from the Apache Spark downloads page: http://spark.apache.org/downloads.html.
The /opt directory is an empty folder under the root directory in most Linux-based operating systems. Here, we use this folder to copy and install software. By default, this folder is owned by root, so run the following command if you get permission errors while accessing it: sudo chmod -R 777 /opt
Go to the location where you copied the Spark software package and uncompress it:
cd /opt
tar -xzvf spark-2.3.1-bin-hadoop2.7.tgz
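The copy-and-extract step can also be wrapped in a small shell function that checks for the two most common failure points (a missing archive and a target folder that is not writable) before calling tar. This is only a minimal sketch: the function name extract_spark and its two arguments are hypothetical and not part of Spark or of the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark): extract a Spark tarball
# into a target directory, defaulting to /opt.
extract_spark() {
  local archive="$1"
  local target="${2:-/opt}"

  # Fail early if the archive was not copied or downloaded correctly.
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi

  # Fail early if the current user cannot write to the target folder.
  if [ ! -w "$target" ]; then
    echo "no write permission on $target" >&2
    return 2
  fi

  # -x extract, -z gunzip, -f archive file, -C change to target first.
  tar -xzf "$archive" -C "$target"
}
```

For example, extract_spark /opt/spark-2.3.1-bin-hadoop2.7.tgz /opt would unpack the distribution into /opt/spark-2.3.1-bin-hadoop2.7. A return code of 2 indicates the permission problem described earlier.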
Set the environment variable in .bash_profile, as follows: