
Adjust tasks distribution among workers in Spark Streaming

Ask Time: 2015-03-31T01:18:36    Author: luke


I am starting to develop my first Spark Streaming cluster, and I wonder whether it supports some sort of 'manual partitioning' of tasks among the workers.

As far as I understand --please correct me if I'm wrong--, an RDD is split into partitions (tasks), each one going to a different worker, basically in a fair fashion (by the way, is that even tunable?).
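To make the question concrete, here is a minimal sketch of the kind of control I know about so far, which only affects *how many* partitions there are, not *where* they run (the numbers and local master are just placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PartitionCountSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("partition-count-sketch").setMaster("local[4]"))

    // The number of partitions can be set when the RDD is created ...
    val rdd = sc.parallelize(1 to 1000, numSlices = 8)
    println(s"initial partitions: ${rdd.partitions.length}")

    // ... or changed afterwards, but neither call says which worker
    // each partition will end up on.
    val fewer = rdd.coalesce(4)
    val more  = rdd.repartition(16)
    println(s"after coalesce: ${fewer.partitions.length}, after repartition: ${more.partitions.length}")

    sc.stop()
  }
}
```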

Now, in other stream processing systems (like Apache Storm) there is something called fields grouping, which partitions the stream according to a specific field (i.e., a key), so that equal keys imply the same task, which implies the same operator instance.

Is something similar possible in Spark Streaming (i.e., partitioning an RDD among the workers according to something close to the tuples' keys)? I am asking since I could use such an approach, but I have some doubts about whether it is consistent with Spark's philosophy.
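Roughly, what I have in mind is something like the following sketch with the batch RDD API (the keys, values and partition count are just illustrative):

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object KeyPartitionSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("key-partition-sketch").setMaster("local[4]"))

    // A pair RDD of (key, value) tuples.
    val pairs = sc.parallelize(Seq(("user-1", 10), ("user-2", 7), ("user-1", 3)))

    // partitionBy puts all tuples with the same key into the same partition
    // (and hence the same task), similar in spirit to Storm's fields grouping.
    // It does not, however, pin a partition to a specific worker node.
    val byKey = pairs.partitionBy(new HashPartitioner(4))

    byKey.mapPartitionsWithIndex { (idx, it) =>
      it.map { case (k, v) => s"partition $idx -> ($k, $v)" }
    }.collect().foreach(println)

    sc.stop()
  }
}
```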

Any hint or clarification will be welcome! :-)

Have a nice day!

EDIT: does the method updateStateByKey() have anything to do with this? Something like updateStateByKey(updateFunc, new HashPartitioner(...))?
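I.e., something along these lines (a sketch only; the socket source, state type, checkpoint path and partition count are placeholders):

```scala
import org.apache.spark.{HashPartitioner, SparkConf}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object UpdateStateSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("update-state-sketch").setMaster("local[4]")
    val ssc  = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("/tmp/spark-checkpoint") // required by updateStateByKey

    // A running count per word; the state is just an Int here.
    val updateFunc: (Seq[Int], Option[Int]) => Option[Int] =
      (newValues, state) => Some(newValues.sum + state.getOrElse(0))

    val counts = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      // The overload taking a Partitioner decides which partition each key's
      // state lives in, so records with the same key are always handled together.
      .updateStateByKey(updateFunc, new HashPartitioner(4))

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```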

Author: luke. Reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/29352313/adjust-tasks-distribution-among-workers-in-spark-streaming