I am experimenting with Hadoop and Spark, as the company I work for is getting ready to spin up Hadoop and wants to use Spark and other tools to do a lot of machine learning on our data.
Most of that falls to me, so I am preparing by learning on my own.
I have a machine I have set up as a single-node Hadoop cluster.
Here is what I have:
- CentOS 7 (minimal server install, added XOrg and OpenBox for GUI)
- Python 2.7
- Hadoop 2.7.2
- Spark 2.0.0
I followed these guides to set this up:
When I attempt to run 'pyspark', I get the following:
IPYTHON and IPYTHON_OPTS are removed in Spark 2.0+. Remove these from the environment and set PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS instead.
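Reading the message literally, it sounds like the old Spark 1.x variables are set somewhere in my environment and just need to be swapped for the new names. This is my rough understanding of what that would look like in a shell session (I'm assuming I still want the IPython shell and that ipython is actually installed, neither of which I've confirmed):

```bash
# see whether the removed Spark 1.x variables are set in my shell
env | grep IPYTHON

# clear them for the current session
unset IPYTHON IPYTHON_OPTS

# set the Spark 2.0+ replacement named in the error message
# (assumes I want the IPython shell and that ipython is on the PATH)
export PYSPARK_DRIVER_PYTHON=ipython
```

That would only fix the current session, though; presumably the old exports live in a startup file somewhere.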
I opened up the pyspark file in vi and examined it.
There is a lot going on in there, and I don't know where to start making the changes I need.
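The closest thing I found to the message is a guard near the top of the script. It looks roughly like this (paraphrasing from what I remember, so the exact text in my copy may differ):

```bash
# from /opt/spark-latest/bin/pyspark (approximate): it only inspects the
# environment and bails out, which suggests the script itself isn't broken
if [[ -n "$IPYTHON" || -n "$IPYTHON_OPTS" ]]; then
  echo "IPYTHON and IPYTHON_OPTS are removed in Spark 2.0+. ..."
  exit 1
fi
```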
My Spark installation is under /opt/spark-latest, the pyspark script is under /opt/spark-latest/bin/, and my Hadoop installation (though I don't think it factors in here) is under /opt/hadoop/.
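For completeness, this is roughly how the guides I followed had me wire up the environment, with the paths adjusted to my layout (I'm reconstructing this from memory rather than from my actual files, so treat it as a guess):

```bash
# shell profile entries, roughly as the setup guides had me add them
export HADOOP_HOME=/opt/hadoop
export SPARK_HOME=/opt/spark-latest
export PATH=$PATH:$HADOOP_HOME/bin:$SPARK_HOME/bin
```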
I know there must be a change I need to make somewhere, whether in the pyspark file or in my own environment, but I don't know where to begin.
I did some googling and found references to similar problems, but nothing that laid out the steps to fix this one.
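My best guess from that reading is that the old exports are hiding in a shell startup file rather than in pyspark itself, so I was planning to hunt for them with something like this (the file list is just the usual suspects, not something specific I've confirmed):

```bash
# search the common startup files for the removed variables
grep -n "IPYTHON" ~/.bashrc ~/.bash_profile /etc/profile /etc/profile.d/*.sh 2>/dev/null
```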
Can anyone give me a push in the right direction?