PySpark Installation Steps on Windows – Standalone

Getting PySpark up and running on your system takes two steps:

  1. Install the required files and place them in the proper folders.
  2. Set the environment variables and path.

Four utilities need to be present on your system to run any Spark program. The download link for each package is given below.

  1. JDK Installation
    1. https://www.oracle.com/in/java/technologies/downloads/#jdk20-windows
  2. Python Installation
    1. https://www.python.org/downloads/
  3. Spark Installation
    1. https://spark.apache.org/downloads.html
  4. Winutils copy
    1. https://github.com/steveloughran/winutils

Step-A: Steps for Installation

JDK Installation:

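Go to the JDK download URL and download the latest version of the JDK. Place the downloaded archive somewhere on the C: drive [for example: C:\java\jdk20] and extract it using 7zip or any other zip/tar extractor. Check the Spark documentation for the Java versions your Spark release supports, since the newest JDK may not be among them.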

Python Installation:

Please follow the earlier blog post: do-you-know-we-can-keep-multiple-version-of-python-in-our-single-system/

Spark Installation:

Go to the Spark download URL and download the latest version of Spark. Place the downloaded archive somewhere on the C: drive [for example: C:\spark\spark-3.4.1-bin-hadoop3] and extract the tar file using 7zip or any other zip/tar extractor.

Winutils Installation:

Go to the Git repo URL and download the latest version of winutils.exe for the Hadoop setup. Place winutils.exe somewhere on the C: drive [for example: C:\hadoop\bin].
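Before moving on, it can help to confirm that the files ended up where you expect. The following is only a minimal sketch: the paths are hypothetical and built from the example folders in this post, and the exact folder names depend on the versions you downloaded and on how the archives were extracted.

```python
from pathlib import Path

# Hypothetical locations based on the example folders in this post.
# Adjust them to the versions and folders used on your machine.
EXPECTED_FILES = {
    "JDK": r"C:\java\jdk20\bin\java.exe",
    "Spark": r"C:\spark\spark-3.4.1-bin-hadoop3\bin\spark-submit.cmd",
    "winutils": r"C:\hadoop\bin\winutils.exe",
}


def find_missing(expected):
    """Return the names whose file is not present on disk."""
    return [name for name, path in expected.items() if not Path(path).is_file()]


if __name__ == "__main__":
    missing = find_missing(EXPECTED_FILES)
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("All expected files are in place.")
```

If anything is reported as not found, the usual cause is an extra nested folder created by the extractor.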

Step-B: Steps for Setting Up Environment Variables

Now set up the following environment variables:

  1. HADOOP_HOME
  2. SPARK_HOME
  3. JAVA_HOME
  4. PYSPARK_HOME

Then add them to the Path variable: double-click the Path variable, or select it and click EDIT, to open the edit window, then add the bin folder of each tool (typically %JAVA_HOME%\bin, %SPARK_HOME%\bin and %HADOOP_HOME%\bin).
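The variables above can be checked from Python once a new Command Prompt has been opened. This is a rough sketch under stated assumptions: it assumes that the bin folders of HADOOP_HOME, SPARK_HOME and JAVA_HOME are the entries that belong on Path, and it only checks that PYSPARK_HOME is set. The function name `check_environment` is illustrative and not part of any Spark tooling.

```python
import ntpath
import os

# Variables named in this post. Which of them need a bin entry on Path
# is an assumption of this sketch.
REQUIRED_VARS = ["HADOOP_HOME", "SPARK_HOME", "JAVA_HOME", "PYSPARK_HOME"]
BIN_ON_PATH = ["HADOOP_HOME", "SPARK_HOME", "JAVA_HOME"]


def _norm(path):
    # Windows paths are case-insensitive; also drop any trailing backslash.
    return ntpath.normcase(ntpath.normpath(path.strip()))


def check_environment(env):
    """Return a list of human-readable problems found in the given mapping."""
    problems = []
    path_entries = {_norm(p) for p in env.get("PATH", "").split(";") if p.strip()}
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(f"{name} is not set")
    for name in BIN_ON_PATH:
        home = env.get(name)
        if home and _norm(ntpath.join(home, "bin")) not in path_entries:
            problems.append(f"{name}\\bin is not on Path")
    return problems


if __name__ == "__main__":
    for problem in check_environment(os.environ):
        print(problem)
```

An empty result means every variable is set and each bin folder was found on Path. Environment changes only become visible in Command Prompt windows opened after the variables were saved.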

That’s all. You are now set to start writing Spark jobs.

To verify the setup, open the Command Prompt and type spark-submit.
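The same check can be scripted. The sketch below looks for spark-submit on the Path and, if it is found, runs it with the `--version` flag. The helper name `spark_submit_version` is made up for this example, and both output streams are combined because the version banner may not be written to standard output.

```python
import shutil
import subprocess


def spark_submit_version(executable="spark-submit"):
    """Return the output of `<executable> --version`, or None if it is not on Path."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Combine both streams, since the banner may be printed to either one.
    return (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    banner = spark_submit_version()
    if banner is None:
        print("spark-submit was not found on Path; recheck Step-B.")
    else:
        print(banner)
```

If the script reports that spark-submit was not found, the Spark bin folder is most likely missing from the Path variable.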
