Should threads act on separate memory?

Ask Time:2018-07-06T17:00:51         Author:

I have a C++ program whose task is to analyse a stream of binary data (typically a file on disk) and extract some information. This task is "memory-less", meaning that the outcome of each step is independent of the previous one. Because of this, I thought of speeding it up by distributing the data across separate threads.

For now, the data is read in blocks of 1 GB at a time and saved in an array to avoid I/O bottlenecks. Should I split the data into n chunks/arrays (where n is the number of threads), or is a single array accessed by multiple threads not an issue?

EDIT 1: data and analysis specification

I realize that the wording of the problem might be too broad, as pointed out in one of the comments. I will try to go into a bit more detail.

The data being analysed is a series of unsigned 64-bit integers generated by a so-called "time-to-digital" converter (TDC), each storing timestamp information about an event it registers. My TDC has multiple channels, so each timestamp encodes which channel triggered (first 3 bits), whether it was a rising or falling edge trigger (4th bit), and the actual time (in clock ticks since powering up the TDC, last 60 bits).
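To make the layout concrete, here is a minimal sketch of unpacking one such word. It assumes that "first" means the most significant bits, so the channel sits in bits 63..61, the edge flag in bit 60, and the time in the low 60 bits. It also assumes that a set edge bit means a rising edge. The `Timestamp` struct and `decode` function are hypothetical names, not part of any TDC library.

```cpp
#include <cstdint>

// Hypothetical decoded view of one 64-bit TDC word.
struct Timestamp {
    std::uint8_t  channel;  // 0..7, taken from the 3 channel bits
    bool          rising;   // assumption: bit set means rising edge
    std::uint64_t ticks;    // clock ticks since power-up (60 bits)
};

// Assumption: "first 3 bits" are the most significant ones.
inline Timestamp decode(std::uint64_t raw) {
    const std::uint64_t time_mask = (std::uint64_t{1} << 60) - 1;
    Timestamp ts;
    ts.channel = static_cast<std::uint8_t>(raw >> 61);  // top 3 bits
    ts.rising  = ((raw >> 60) & 1u) != 0;               // 4th bit
    ts.ticks   = raw & time_mask;                       // low 60 bits
    return ts;
}
```

If the device actually packs the fields starting from the least significant bit, only the shifts and the mask would need to change.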

The timestamps, of course, are saved in the file chronologically. The task is to find coincidence events between channels within a time window set by the user. So you keep reading the timestamps, and when you find two in the channels of interest whose distance in time is less than the set window, you increase the number of coincidence events.
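The counting described above could be sketched roughly as follows. This is only one possible reading of "coincidence": each event on one channel of interest is compared with the most recent event on the other one, and the edge bit is ignored. The function name `count_coincidences` and its parameters are hypothetical, the bit layout is assumed to put the channel in the top 3 bits and the time in the low 60 bits, and the two channels are assumed to be different.

```cpp
#include <cstddef>
#include <cstdint>

// Counts events on chan_a / chan_b that fall within window_ticks of the most
// recent event on the other channel. Relies on the data being chronological.
inline std::uint64_t count_coincidences(const std::uint64_t* data, std::size_t n,
                                        unsigned chan_a, unsigned chan_b,
                                        std::uint64_t window_ticks) {
    const std::uint64_t time_mask = (std::uint64_t{1} << 60) - 1;
    std::uint64_t count = 0;
    bool seen_a = false, seen_b = false;   // has each channel fired yet?
    std::uint64_t last_a = 0, last_b = 0;  // latest time seen per channel

    for (std::size_t i = 0; i < n; ++i) {
        const unsigned chan = static_cast<unsigned>(data[i] >> 61);
        const std::uint64_t t = data[i] & time_mask;
        if (chan == chan_a) {
            if (seen_b && t - last_b < window_ticks) ++count;
            last_a = t;
            seen_a = true;
        } else if (chan == chan_b) {
            if (seen_a && t - last_a < window_ticks) ++count;
            last_b = t;
            seen_b = true;
        }
    }
    return count;
}
```

The function only reads from `data` and keeps all of its state in local variables, so nothing in it needs synchronisation.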

These files can be quite big (tens of GB) and the number of timestamps enormous (one clock tick is 80 picoseconds).

For now I only go through the whole file once, and the idea was to "cut it" into smaller pieces that would then be analysed by different threads. The possible loss of events at the cuts is acceptable to me, since it would be at most 2 out of hundreds of thousands.

Of course, the threads would only read data from the file/memory. I can write the coincidence counts to three separate variables and then sum them when all threads finish, if this helps avoid sync problems.
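A minimal sketch of that plan, assuming the block is already in memory as an array of 64-bit words: every thread reads its own contiguous slice of the one shared array, writes its result to its own slot, and the slots are summed after all threads have joined. In C++, concurrent reads of the same memory with no writer are not a data race, so the input does not have to be copied into separate arrays. The name `analyse_in_parallel` and the `analyse` callable are hypothetical, and `analyse` must be safe to call from several threads at once.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Splits [data, data + n) into one contiguous slice per thread and sums the
// per-slice results. `analyse` is any callable with the signature
// std::uint64_t(const std::uint64_t*, std::size_t).
template <class Analyse>
std::uint64_t analyse_in_parallel(const std::uint64_t* data, std::size_t n,
                                  unsigned n_threads, Analyse analyse) {
    if (n_threads == 0) n_threads = 1;
    std::vector<std::uint64_t> partial(n_threads, 0);  // one slot per thread
    std::vector<std::thread> workers;
    workers.reserve(n_threads);

    const std::size_t slice = n / n_threads;
    for (unsigned t = 0; t < n_threads; ++t) {
        const std::size_t begin = t * slice;
        // The last thread also takes the remainder of the division.
        const std::size_t end = (t + 1 == n_threads) ? n : begin + slice;
        workers.emplace_back([&partial, &analyse, data, begin, end, t] {
            // Each thread writes only its own slot, exactly once.
            partial[t] = analyse(data + begin, end - begin);
        });
    }
    for (auto& w : workers) w.join();  // wait before reading the slots

    return std::accumulate(partial.begin(), partial.end(), std::uint64_t{0});
}
```

Events that straddle a slice boundary are not seen by either thread in this sketch, which matches the loss at the cuts accepted above. On some toolchains the program has to be built with `-pthread`.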

I hope now things are clearer.

Reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/51206669/should-threads-act-on-separate-memory