Spark 1.0.0 setup for testing a Python program

At work, my company uses Storm and Esper, driven by Java code, for big-data analysis. A nice guy reminded me of Spark (which I had known about for a while). Since the Spark docs are not very straightforward, let me write down the procedure I followed on my Mac.

1. Download scala-2.10.3 and unpack it with tar -zxvf

2. In .bash_profile, add scala-2.10.3/bin to the PATH

3. Run source .bash_profile

4. Download the prebuilt spark-1.0.0-hadoop2 package and unpack it with tar -zxvf

5. In the Spark home directory, mkdir yao

6. In the yao directory, write a Python file (or copy the example from the Spark website):

from pyspark import SparkContext

logFile = "./"

# Run locally, reading the input file(s) into a cached RDD
sc = SparkContext("local", "Simple App")
logData = sc.textFile(logFile).cache()

# Count the lines containing 'a' and the lines containing 'b'
numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()

print "Lines with a: %i, lines with b: %i" % (numAs, numBs)

7. In the Spark home directory, run the job with spark-submit:

./bin/spark-submit --master local[4] ./yao/
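The environment setup in steps 2 and 3 amounts to something like the following in .bash_profile; the install location shown is an assumption, so adjust it to wherever you unpacked the tarball:

```shell
# Hypothetical install location -- change to where scala-2.10.3 actually lives.
export SCALA_HOME="$HOME/scala-2.10.3"
export PATH="$SCALA_HOME/bin:$PATH"
```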
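As a quick sanity check, the filter/count logic from the program above can be run in plain Python (no Spark) against a small in-memory sample; the sample lines below are made up for illustration:

```python
# Plain-Python equivalent of the Spark job's filter/count logic,
# useful for checking expected output on a tiny made-up sample.
lines = [
    "spark makes big data simple",
    "build on your laptop",
    "then submit to a cluster",
]

# Same predicates as the Spark filters: lines containing 'a', lines containing 'b'
num_as = sum(1 for s in lines if "a" in s)
num_bs = sum(1 for s in lines if "b" in s)

print("Lines with a: %i, lines with b: %i" % (num_as, num_bs))
# -> Lines with a: 3, lines with b: 3
```

The Spark version does exactly this, only distributed over partitions of the input file.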

My next step is to get PyCharm working with Spark and to write a program that analyzes TCP/UDP packets.
