Setting up PySpark development on VS Code (Mac)

Sometimes it is tricky to set up PySpark in VS Code. Today I will do exactly that and document the process for all of us. Below are the steps we need to follow.

Steps:

  • Installation of apache-spark
  • Installation of vscode
  • Setting of pyspark on vscode

Requirements:

  • Java 1.8+
  • Scala / Python 3

Installation of apache-spark

Below are the steps for automatic installation using brew. You can use this, or you can download Spark from here. Once you have downloaded it, untar it at any location and set up the Spark home.

# using brew
brew install scala
brew install apache-spark

# or download from the Apache archive and extract manually
# (the closer.lua mirror link returns an HTML page, not the tarball)
wget https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop2.7.tgz
tar -xvzf spark-3.0.0-bin-hadoop2.7.tgz -C /Users/vssrivastava/Install/

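Now you need to add the Spark home directory to your shell profile, such as ~/.bash_profile (or ~/.zshrc if you use zsh). Use only the block that matches how you installed Spark.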

# If you have manually installed it 
export SPARK_HOME=/Users/vssrivastava/Install/spark-3.0.0-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH

# If installed using brew
export SPARK_HOME=/usr/local/Cellar/apache-spark/3.0.0/libexec
export PATH=$SPARK_HOME/bin:$PATH
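
After reloading the shell, it can help to confirm that the variable really points at a Spark installation. The following is a minimal sketch, not part of the original setup: the helper name `check_spark_home` is hypothetical, and it assumes a standard Spark layout in which the launcher script lives at `bin/spark-submit` under the Spark home.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def check_spark_home(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a description of the problem, or None if SPARK_HOME looks usable."""
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        return "SPARK_HOME is not set"
    # Assumption: a Spark distribution ships its launcher scripts under bin/.
    launcher = Path(spark_home) / "bin" / "spark-submit"
    if not launcher.is_file():
        return f"no spark-submit found at {launcher}"
    return None


if __name__ == "__main__":
    problem = check_spark_home()
    print(problem or f"SPARK_HOME looks fine: {os.environ['SPARK_HOME']}")
```

Run it from a new terminal window so that the updated profile is picked up.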

Installation of vscode

Visual Studio Code is now one of the most widely used editors in the developer community. I personally use it for almost everything. It is developed by Microsoft and is free to use. You can download it from here.


Setting of pyspark on vscode

Generally you need to use findspark to locate the Spark installation, and you need to call it at the top of your code, before any pyspark import.

import findspark
findspark.init()

from pyspark.sql import SparkSession
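
If the same script has to run on machines where pyspark may or may not already be importable, the findspark call can be made conditional. This is only a sketch under assumptions: the helper names `pyspark_importable` and `ensure_pyspark_importable` are hypothetical, and findspark is assumed to be installed wherever the fallback is actually needed.

```python
import importlib.util


def pyspark_importable() -> bool:
    """True if `import pyspark` would succeed without extra path setup."""
    return importlib.util.find_spec("pyspark") is not None


def ensure_pyspark_importable() -> None:
    """Fall back to findspark only when pyspark is not already on sys.path."""
    if pyspark_importable():
        return
    import findspark  # assumption: installed separately, e.g. with pip

    findspark.init()  # uses SPARK_HOME (or common locations) to extend sys.path
```

Call `ensure_pyspark_importable()` once at the top of a script, before the `from pyspark.sql import SparkSession` line.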

But there is a better way to do this, one where you don’t need to install or import findspark at all: set the environment variable in VS Code itself. Let’s add the variable in VS Code.

Code -> Preferences -> Settings -> {search for 'Env: Osx'} -> Edit in settings.json

Add the lines below:

"terminal.integrated.env.osx": {
    "SPARK_HOME": "/usr/local/Cellar/apache-spark/3.0.0/libexec"
  }

Once you have added the lines above, restart VS Code and test it. Before writing any code, all you need to do is install the pyspark package:

conda install -c conda-forge pyspark

or

pip install pyspark
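
To test the setup from the VS Code integrated terminal, a small smoke test along the following lines could be used. It is a sketch rather than a definitive check: the function name `run_smoke_test` and the app name `vscode-smoke-test` are made up for illustration, and it assumes Java is available and that the interpreter selected in VS Code is the one where pyspark was installed.

```python
def run_smoke_test() -> int:
    """Start a local Spark session, build a tiny DataFrame, return its row count."""
    # Imported lazily so this file can be loaded even if pyspark is missing.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")            # one local thread, no cluster needed
        .appName("vscode-smoke-test")  # hypothetical app name
        .getOrCreate()
    )
    try:
        df = spark.createDataFrame([(1, "a"), (2, "b"), (3, "c")], ["id", "letter"])
        df.show()
        return df.count()
    finally:
        spark.stop()


if __name__ == "__main__":
    try:
        print("rows:", run_smoke_test())
    except ImportError:
        print("pyspark is not installed in the selected interpreter")
    except Exception as exc:  # e.g. Java missing or SPARK_HOME pointing elsewhere
        print("Spark failed to start:", exc)
```

If everything is wired up, the script prints a three-row table followed by `rows: 3`. Otherwise the message gives a hint about which step to revisit.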


I hope you have all successfully set up VS Code for Spark development. Let me know if you face any issues or have any questions.

Happy Learning !!

This post is licensed under CC BY 4.0 by the author.