
Are multi-stage workers allowed in Apache Spark?

Asked: 2014-09-02T15:18:33    Author: Maulik


I need to know how Spark allows communication between worker nodes. All tasks are assigned to workers by the master program, but can one worker's output be sent directly to another worker so that it can carry out the further steps on it?

I am working on a case where there are multiple types of tasks to be carried out, say tasks A, B, and C. For task C to start, tasks A and B must be completed, but A and B can be done independently of each other. So I need a few workers for task A and a few for B, and they must hand off to the workers for task C without involving the master. Please give me some insight into how this can be achieved. Is this kind of feature available in YARN?
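For reference, the A/B/C dependency described above maps naturally onto Spark's lineage graph: the driver declares the dependencies once, and the DAG scheduler sequences the stages, so the workers never need to call each other directly. A minimal Scala sketch, assuming hypothetical HDFS paths and an arbitrary join as the step that combines A and B into C:

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._ // pair-RDD implicits on Spark 1.x

    object MultiStageDag {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("multi-stage-dag").setMaster("local[*]"))

        // Tasks A and B: independent lineages, so Spark is free to
        // schedule them in parallel across the cluster.
        val resultA = sc.textFile("hdfs:///input/a").map(l => (l.take(3), l))
        val resultB = sc.textFile("hdfs:///input/b").map(l => (l.take(3), l))

        // Task C: declared against both A and B. The DAG scheduler will not
        // launch C's tasks until the A and B partitions they need exist.
        val resultC = resultA.join(resultB).mapValues { case (a, b) => a + " | " + b }

        resultC.saveAsTextFile("hdfs:///output/c") // hypothetical output path
        sc.stop()
      }
    }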

Author: Maulik. Reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/25617860/is-multi-stage-workers-allowed-in-apache-spark
Mikel Urkia:

I am just throwing out a possible solution, although I have not tested it myself and I am not sure how likely it is to succeed.

What comes to mind is creating a kind of barrier between the B and C tasks by making use of an action such as count. This should force Spark to complete all previous steps, on all nodes, before starting with stage C (I am not entirely sure of this statement).

You could then use the broadcast functionality to cache a variable and make it available to all executors without having to communicate with the master.
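A minimal sketch of that pattern, with hypothetical input paths and an arbitrary small value (B's line count) standing in for whatever task C actually needs from B:

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext

    object BarrierAndBroadcast {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("barrier-broadcast").setMaster("local[*]"))

        val a = sc.textFile("hdfs:///input/a").cache() // task A (hypothetical path)
        val b = sc.textFile("hdfs:///input/b")         // task B (hypothetical path)

        // "Barrier": count() is an action, so it forces the lineages behind
        // `a` and `b` to be fully computed on the cluster before the driver
        // proceeds to build stage C.
        a.count()
        val bSize = b.count()

        // Broadcast the small value derived from B once; every executor that
        // runs a C task then reads bSizeBc.value locally.
        val bSizeBc = sc.broadcast(bSize)

        // Task C starts only after the actions above have finished.
        val c = a.map(line => line + " (B had " + bSizeBc.value + " lines)")
        c.saveAsTextFile("hdfs:///output/c")
        sc.stop()
      }
    }

Note that the broadcast value is still created by the driver; what it avoids is each task re-fetching the value on every use, since executors keep a local copy.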
Answered: 2014-09-10T15:25:50