
Performance bottleneck of Spark

Ask Time:2015-05-15T16:14:01         Author:Xingjun Wang


The paper "Making Sense of Performance in Data Analytics Frameworks", published at NSDI 2015, concludes that CPU (not I/O or network) is the performance bottleneck of Spark. In the paper, Kay ran experiments on Spark, including BDbench, TPC-DS, and a production workload (was only Spark SQL used?). I wonder whether this conclusion also holds for frameworks built on Spark, such as Spark Streaming, where a continuous data stream is received over the network and both network I/O and disk come under heavy pressure.

Author: Xingjun Wang. Reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/30254668/performance-bottleneck-of-spark
Francois G:

Network and disk may come under less pressure in Spark Streaming because the streams are usually checkpointed, meaning that not all data is kept around forever.

But ultimately, this is a research question: the only way to settle it is to benchmark. Kay's code is open source.
2015-05-15T09:19:01