I have the following code for Spark:
package my.spark;

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class ExecutionTest {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("ExecutionTest")
                .getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        int slices = 2;
        int n = slices;
        List<String> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add("" + i);
        }

        JavaRDD<String> dataSet = jsc.parallelize(list, slices);

        dataSet.foreach(str -> {
            System.out.println("value: " + str);
            Thread.sleep(10000);
        });

        System.out.println("done");
        spark.stop();
    }
}
I started the master node and two workers (everything on localhost; Windows) using the commands:
bin\spark-class org.apache.spark.deploy.master.Master
and (twice):
bin\spark-class org.apache.spark.deploy.worker.Worker spark://<local-ip>:7077
Everything started correctly.
I then submitted my job with:
bin\spark-submit --class my.spark.ExecutionTest --master spark://<local-ip>:7077 file:///<pathToFatJar>/FatJar.jar
The command runs, but both outputs, value: 0 and value: 1, are written by the same worker (as shown under Logs > stdout on that worker's page). The second worker has nothing in its Logs > stdout. As far as I understand, this means both iterations are executed by the same worker.
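For reference, a variant of the foreach that would let me attribute each printed line to an executor (my own sketch; it assumes TaskContext.get() and SparkEnv.get() can be called from inside the lambda) could look like this:

// Illustration only: print the partition id and the executor id together with
// each value, so the lines in Logs > stdout can be matched to a worker.
dataSet.foreach(str -> {
    int partition = org.apache.spark.TaskContext.get().partitionId();
    String executor = org.apache.spark.SparkEnv.get().executorId();
    System.out.println("partition: " + partition
            + ", executor: " + executor + ", value: " + str);
    Thread.sleep(10000);
});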
How can I run these tasks on two different workers?