A join between a large dataset and a much smaller one can be performed by broadcasting the smaller dataset to every executor that holds a partition of the larger (left) dataset. Each executor can then join its partitions locally, without shuffling the larger dataset across the network. The following is an illustration of how a broadcast join works internally:
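The mechanism can also be sketched in plain Python. This is a simplified simulation, not Spark's actual implementation: the partitions are ordinary lists, the "broadcast" is a hash table passed to every partition, and all names (`build_hash_table`, `join_partition`, `broadcast_join`, the sample data) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_hash_table(small_rows, key):
    """Index the small dataset by join key; this is what gets 'broadcast'."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)
    return table


def join_partition(partition, broadcast_table, key):
    """Inner-join one partition of the large dataset against the broadcast copy.

    Runs independently per partition, so no rows of the large dataset move.
    """
    joined = []
    for row in partition:
        for match in broadcast_table.get(row[key], []):
            joined.append({**row, **match})
    return joined


def broadcast_join(large_partitions, small_rows, key):
    """Simulate a broadcast join: build once, probe locally in each partition."""
    table = build_hash_table(small_rows, key)  # built once from the small side
    return [join_partition(p, table, key) for p in large_partitions]


# Hypothetical data: the large side is split into two partitions,
# standing in for partitions held by different executors.
orders_partitions = [
    [{"country_id": 1, "order": "A"}, {"country_id": 2, "order": "B"}],
    [{"country_id": 1, "order": "C"}, {"country_id": 3, "order": "D"}],
]
countries = [
    {"country_id": 1, "name": "France"},
    {"country_id": 2, "name": "Japan"},
]

result = broadcast_join(orders_partitions, countries, "country_id")
for i, part in enumerate(result):
    print(f"partition {i}: {part}")
```

Each partition keeps its own rows and only consults the broadcast copy of the small dataset, which is why the approach depends on the smaller side fitting comfortably in each executor's memory. Order `D` has no matching country and is dropped, as expected for an inner join.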