Which of the elements in the labeled panels represent the operation performed for broadcast variables?
Larger image
2,3
Correct! Both panels 2 and 3 represent the operation performed for broadcast variables. While a broadcast operation may look like panel 3, with the driver being the bottleneck, it most probably
looks like panel 2.
This is because the torrent protocol sits behind Spark's broadcast implementation. In the torrent protocol, each executor will try to fetch missing broadcast variables from the driver or other nodes,
preventing the driver from being the bottleneck.
1,2
Wrong. While panel 2 may represent broadcasting, panel 1 shows bi-directional communication which does not occur in broadcast operations.
3
No. While broadcasting may materialize like shown in panel 3, its use of the torrent protocol also enables communciation as shown in panel 2 (see first explanation).
1,3,4
No. While panel 2 shows broadcasting, panel 1 shows bi-directional communication -- not a characteristic of broadcasting. Panel 4 shows uni-directional communication, but in the wrong direction.
Panel 4 resembles more an accumulator variable than a broadcast variable.
2,5
Incorrect. While panel 2 shows broadcasting, panel 5 includes bi-directional communication -- not a characteristic of broadcasting.
More info: Broadcast Join with Spark -- henning.kropponline.de
Currently there are no comments in this discussion, be the first to comment!