I am running my program on a Spark cluster. But when I look at the UI while the job is running, I see that only one worker does most of the tasks. My cluster has one master and 4 workers where the master is also a worker.
I want my task to complete as quickly as possible and I believe that if the number of tasks were to be divided equally among the workers, the job will be completed faster.
Is there any way I can customize this?
System.setProperty("spark.default.parallelism","20")
val sc = new SparkContext("spark://10.100.15.2:7077","SimpleApp","/home/madhura/spark",List("hdfs://master:54310/simple-project_2.10-1.0.jar"))
val dRDD = sc.textFile("hdfs://master:54310/in*",10)
val keyval=dRDD.coalesce(100,true).mapPartitionsWithIndex{(ind,iter) => iter.map(x => process(ind,x.trim().split(' ').map(_.toDouble),q,m,r))}
I tried this but it did not help.