
Dividing tasks among Spark workers

Ask Time: 2014-07-18T20:54:12    Author: Madhura Das


I am running my program on a Spark cluster, but when I look at the UI while the job is running, I see that only one worker does most of the tasks. My cluster has one master and four workers, where the master is also a worker.

I want my job to complete as quickly as possible, and I believe that if the tasks were divided equally among the workers, it would finish faster.

Is there any way I can customize this?

import org.apache.spark.SparkContext

System.setProperty("spark.default.parallelism", "20")
val sc = new SparkContext("spark://10.100.15.2:7077", "SimpleApp", "/home/madhura/spark", List("hdfs://master:54310/simple-project_2.10-1.0.jar"))
// Read the input with 10 minimum partitions, then shuffle it into 100 partitions before processing.
val dRDD = sc.textFile("hdfs://master:54310/in*", 10)
val keyval = dRDD.coalesce(100, shuffle = true).mapPartitionsWithIndex { (ind, iter) =>
  iter.map(x => process(ind, x.trim().split(' ').map(_.toDouble), q, m, r))
}

I tried this but it did not help.
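
For reference, here is an untested sketch of the variation I am considering next: setting spark.default.parallelism through a SparkConf instead of System.setProperty, and using repartition (which always shuffles) instead of coalesce. The partition count of 20 and the object name are just placeholders for illustration; process, q, m and r from my snippet above are left out here.

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative sketch only, not code from my actual job.
object ParallelismSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("spark://10.100.15.2:7077")
      .setAppName("SimpleApp")
      .set("spark.default.parallelism", "20") // instead of System.setProperty
    val sc = new SparkContext(conf)

    // Ask for more input splits up front, then repartition to force a shuffle
    // that spreads the partitions across the cluster.
    val dRDD = sc.textFile("hdfs://master:54310/in*", 20)
    val repartitioned = dRDD.repartition(20)

    println("partitions after repartition: " + repartitioned.partitions.length)
    sc.stop()
  }
}

My understanding is that repartition(n) always performs a shuffle, whereas coalesce only shuffles when shuffle = true, so I would like to confirm whether the resulting partitions actually end up spread over all four workers.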

Author: Madhura Das. Reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/24825557/dividing-tasks-among-spark-workers