I have the following code for Spark:
package my.spark;

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class ExecutionTest {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("ExecutionTest")
                .getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        int slices = 2;
        int n = slices;
        List<String> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add("" + i);
        }

        JavaRDD<String> dataSet = jsc.parallelize(list, slices);

        dataSet.foreach(str -> {
            System.out.println("value: " + str);
            Thread.sleep(10000);
        });

        System.out.println("done");
        spark.stop();
    }
}
I started the master node and two workers (everything on localhost; Windows) using the commands:
bin\spark-class org.apache.spark.deploy.master.Master
and (twice):
bin\spark-class org.apache.spark.deploy.worker.Worker spark://<local-ip>:7077
Everything started correctly.
I then submitted my job with:
bin\spark-submit --class my.spark.ExecutionTest --master spark://<local-ip>:7077 file:///<pathToFatJar>/FatJar.jar
The command runs, but both outputs, value: 0 and value: 1, are written by the same worker (as shown under Logs > stdout on that worker's page). The second worker has nothing in its Logs > stdout. As far as I understand, this means both iterations are executed by the same worker.
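For reference, a variant of the foreach that would let me attribute each printed line to an executor (my own sketch; it assumes TaskContext.get() and SparkEnv.get() can be called from inside the lambda) could look like this:

// Illustration only: print the partition id and the executor id together with
// each value, so the lines in Logs > stdout can be matched to a worker.
dataSet.foreach(str -> {
    int partition = org.apache.spark.TaskContext.get().partitionId();
    String executor = org.apache.spark.SparkEnv.get().executorId();
    System.out.println("partition: " + partition
            + ", executor: " + executor + ", value: " + str);
    Thread.sleep(10000);
});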
How can I run these tasks on two different workers?