This guide describes how to install Apache Spark and run it through .NET Core.
If you just want the steps, skip straight to the commands below.
To install Spark, Java must be installed first. This is a prerequisite for any Hadoop installation; in this guide, all commands are run as a user named hadoop.
Note that Spark 2.4.2 is not currently supported by .NET for Apache Spark. The .NET Core 3.0 procedure will be covered in a separate post.
Install OpenJDK 8
sudo apt install openjdk-8-jdk
You can check whether it installed correctly with the java -version command. If multiple JDKs are installed and you want to select OpenJDK, run the following:
sudo update-alternatives --config java
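For example, a quick check; the exact patch version will vary with your package repository:
java -version
OpenJDK 8 should report a version string starting with 1.8.0.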
Install Apache Maven
To install Apache Maven, download the binary archive, extract it, and register it in your environment variables. Enter the following commands:
mkdir -p ~/bin/maven
cd ~/bin/maven
wget http://apache.tt.co.kr/maven/maven-3/3.6.2/binaries/apache-maven-3.6.2-bin.tar.gz
tar -xvzf apache-maven-3.6.2-bin.tar.gz
ln -s apache-maven-3.6.2 current
# add these two lines to ~/.bashrc so they persist, then reload the shell:
export M2_HOME=~/bin/maven/current
export PATH=$M2_HOME/bin:$PATH
source ~/.bashrc
If the mvn command runs, Maven is installed correctly.
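For example, the following prints the Maven version and the JDK it is using:
mvn -version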
Install Apache Spark
Now install Spark.
cd ~/bin/
wget http://apache.tt.co.kr/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
tar -xvzf spark-2.4.4-bin-hadoop2.7.tgz
# likewise, add these two lines to ~/.bashrc, then reload the shell:
export SPARK_HOME=~/bin/spark-2.4.4-bin-hadoop2.7
export PATH="$SPARK_HOME/bin:$PATH"
source ~/.bashrc
If the spark-shell command starts, Spark is installed correctly.
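To check without launching the interactive shell, you can print the version instead:
spark-submit --version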
Spark .NET Build
Now let’s clone the Spark .NET repository and build it.
git clone https://github.com/dotnet/spark.git ~/dotnet.spark
cd ~/dotnet.spark/src/scala
mvn clean package
After the build, you should have a JAR file under the src/scala subdirectory that supports Spark execution:
microsoft-spark-2.x.x/target/microsoft-spark-2.x.x-<version>.jar
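For example, with the Spark 2.4 build used later in this guide (the exact file name depends on the repository version you cloned):
ls ~/dotnet.spark/src/scala/microsoft-spark-2.4.x/target/microsoft-spark-2.4.x-*.jar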
Install .NET Core
Install .NET Core so you can build and run .NET Core programs. You can install it by selecting the appropriate version; the commands below install the 2.1 SDK on Ubuntu 18.04.
wget -q https://packages.microsoft.com/config/ubuntu/18.04/packages-microsoft-prod.deb -O packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb
sudo add-apt-repository universe
sudo apt-get update
sudo apt-get install apt-transport-https
sudo apt-get update
sudo apt-get install dotnet-sdk-2.1
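Confirm that the SDK is on your PATH:
dotnet --version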
.NET Program Build
Let’s build and run the .NET Core Spark Worker and Examples.
cd ~/dotnet.spark/src/csharp/Microsoft.Spark.Worker/
dotnet publish -f netcoreapp2.1 -r ubuntu.18.04-x64
cd ~/dotnet.spark/examples/Microsoft.Spark.CSharp.Examples/
dotnet publish -f netcoreapp2.1 -r ubuntu.18.04-x64
Once the build succeeds, you can run the program.
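Spark also needs to locate the published .NET worker at runtime. One way is to point the DOTNET_WORKER_DIR environment variable at the worker's publish directory; the path below is an assumption based on the publish command above (adjust Debug and netcoreapp2.1 if your configuration differs):
export DOTNET_WORKER_DIR=~/dotnet.spark/artifacts/bin/Microsoft.Spark.Worker/Debug/netcoreapp2.1/ubuntu.18.04-x64/publish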
spark-submit \
[--jars <any-jars-your-app-is-dependent-on>] \
--class org.apache.spark.deploy.dotnet.DotnetRunner \
--master <ip/local> \
<path-to-microsoft-spark-jar> \
<path-.netcore-app-binary>
The actual command is shown below; the long paths make it a bit hard to read.
spark-submit \
--class org.apache.spark.deploy.dotnet.DotnetRunner \
--master local \
~/dotnet.spark/src/scala/microsoft-spark-2.4.x/target/microsoft-spark-2.4.x-0.6.0.jar \
~/dotnet.spark/artifacts/bin/Microsoft.Spark.CSharp.Examples/Debug/netcoreapp2.1/ubuntu.18.04-x64/publish/Microsoft.Spark.CSharp.Examples \
Sql.Batch.Basic \
/home/hadoop/bin/spark-2.4.4-bin-hadoop2.7/examples/src/main/resources/people.json