
Link Spark with iPython Notebook

Ask Time:2015-10-11T18:39:31         Author:r4id4


I have followed some tutorials online, but they do not work with Spark 1.5.1 on OS X El Capitan (10.11).

Basically, I ran these commands to download apache-spark:

brew update
brew install scala
brew install apache-spark

updated the .bash_profile

# For a ipython notebook and pyspark integration
if which pyspark > /dev/null; then
  export SPARK_HOME="/usr/local/Cellar/apache-spark/1.5.1/libexec/"
  export PYSPARK_SUBMIT_ARGS="--master local[2]"
fi
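
To confirm that these variables are actually exported, a quick sanity check (my own sketch, not from the tutorials) can be run in any Python or IPython session started from a fresh terminal:

# Sketch: check that the Spark-related variables are visible to the Python process.
# If SPARK_HOME prints None, the shell that started the session never sourced .bash_profile.
import os

print("SPARK_HOME:", os.environ.get("SPARK_HOME"))
print("PYSPARK_SUBMIT_ARGS:", os.environ.get("PYSPARK_SUBMIT_ARGS"))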

ran

ipython profile create pyspark

created a startup file ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py configured as follows:

# Configure the necessary Spark environment
import os
import sys

# Spark home
spark_home = os.environ.get("SPARK_HOME")

# If Spark V1.4.x is detected, then add ' pyspark-shell' to
# the end of the 'PYSPARK_SUBMIT_ARGS' environment variable
spark_release_file = spark_home + "/RELEASE"
if os.path.exists(spark_release_file) and "Spark 1.4" in open(spark_release_file).read():
    pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
    if not "pyspark-shell" in pyspark_submit_args: pyspark_submit_args += " pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

# Add the spark python sub-directory to the path
sys.path.insert(0, spark_home + "/python")

# Add the py4j to the path.
# You may need to change the version number to match your install
sys.path.insert(0, os.path.join(spark_home, "python/lib/py4j-0.8.2.1-src.zip"))

# Initialize PySpark to predefine the SparkContext variable 'sc'
execfile(os.path.join(spark_home, "python/pyspark/shell.py"))

I then run ipython notebook --profile=pyspark and the notebook works fine, but sc (the Spark context) is not recognised.
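
My guess is that the problem is the version check in the startup file above: it only appends " pyspark-shell" when the RELEASE file mentions Spark 1.4, so with 1.5.1 PYSPARK_SUBMIT_ARGS stays incomplete and shell.py never creates sc. A rough sketch of a looser check (my own adjustment, untested) would be:

# Sketch: append ' pyspark-shell' whenever a RELEASE file exists (covers 1.4.x and 1.5.x)
import os

spark_home = os.environ.get("SPARK_HOME")
spark_release_file = os.path.join(spark_home, "RELEASE")
if os.path.exists(spark_release_file):
    pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
    if "pyspark-shell" not in pyspark_submit_args:
        pyspark_submit_args += " pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args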

Anyone managed to do this with Spark 1.5.1?

EDIT: you can follow this guide to have it working

https://gist.github.com/tommycarpi/f5a67c66a8f2170e263c

Author: r4id4, reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/33064031/link-spark-with-ipython-notebook
Alberto Bonsanto :

I have Jupyter installed, and indeed it is simpler than you think:

Install Anaconda for OS X.

Install jupyter by typing the next line in your terminal:

ilovejobs@mymac:~$ conda install jupyter

Update jupyter just in case:

ilovejobs@mymac:~$ conda update jupyter

Download Apache Spark and compile it, or download and uncompress Apache Spark 1.5.1 + Hadoop 2.6:

ilovejobs@mymac:~$ cd Downloads
ilovejobs@mymac:~/Downloads$ wget http://www.apache.org/dyn/closer.lua/spark/spark-1.5.1/spark-1.5.1-bin-hadoop2.6.tgz

Create an Apps folder in your home directory, e.g.:

ilovejobs@mymac:~/Downloads$ mkdir ~/Apps

Move the uncompressed folder spark-1.5.1 to the ~/Apps directory:

ilovejobs@mymac:~/Downloads$ mv spark-1.5.1/ ~/Apps

Move to the ~/Apps directory and verify that spark is there:

ilovejobs@mymac:~/Downloads$ cd ~/Apps
ilovejobs@mymac:~/Apps$ ls -l
drwxr-xr-x ?? ilovejobs ilovejobs 4096 ?? ?? ??:?? spark-1.5.1

Here is the first tricky part. Add the spark binaries to your $PATH:

ilovejobs@mymac:~/Apps$ cd
ilovejobs@mymac:~$ echo "export PATH=$HOME/Apps/spark-1.5.1/bin:$PATH" >> .profile

Here is the second tricky part. Add these environment variables as well:

ilovejobs@mymac:~$ echo "export PYSPARK_DRIVER_PYTHON=ipython" >> .profile
ilovejobs@mymac:~$ echo "export PYSPARK_DRIVER_PYTHON_OPTS='notebook'" >> .profile

Source the profile to make these variables available to this terminal:

ilovejobs@mymac:~$ source .profile

Create a ~/notebooks directory:

ilovejobs@mymac:~$ mkdir notebooks

Move to ~/notebooks and run pyspark:

ilovejobs@mymac:~$ cd notebooks
ilovejobs@mymac:~/notebooks$ pyspark

Notice that you can also add those variables to the .bashrc located in your home directory.

Now be happy: you should be able to run Jupyter with a PySpark kernel (it will show as Python 2, but it will use Spark).
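
Once pyspark opens the notebook, a quick smoke test in a cell (a minimal sketch, assuming the PySpark shell has predefined sc) is:

# Sketch: confirm the predefined SparkContext actually runs a job
print(sc.version)                # should print 1.5.1 for this setup
rdd = sc.parallelize(range(100))
print(rdd.count())               # expect 100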
2015-10-11T13:12:30
R.S L :

First, make sure you have a Spark environment on your machine.

Then, install the Python module findspark via pip:

$ sudo pip install findspark

And then in the Python shell:

import findspark
findspark.init()

import pyspark
sc = pyspark.SparkContext(appName="myAppName")

Now you can do what you want with pyspark in the Python shell (or in IPython).

In my view this is actually the easiest way to use a Spark kernel in Jupyter.
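
If you need more control over the context (master URL, app name, executor memory), the same findspark approach can take an explicit configuration; the option values below are just examples, not requirements:

import findspark
findspark.init()

from pyspark import SparkConf, SparkContext

# Sketch: build an explicit configuration instead of relying on the defaults
conf = (SparkConf()
        .setMaster("local[2]")
        .setAppName("myAppName")
        .set("spark.executor.memory", "1g"))
sc = SparkContext(conf=conf)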
2017-03-20T03:47:06
Senkwich :

FYI, you can now run Scala, PySpark, SparkR, and SQL with Spark on top of Jupyter via https://github.com/ibm-et/spark-kernel. The new interpreters were added (and marked experimental) in pull request https://github.com/ibm-et/spark-kernel/pull/146.

See the language support wiki page for more information.
2015-10-20T03:57:52