Databricks Exam Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Topic 2 Question 15 Discussion

Actual exam question for Databricks's Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 exam
Question #: 15
Topic #: 2

Which of the following code blocks shuffles DataFrame transactionsDf, which has 8 partitions, so that it has 10 partitions?

Suggested Answer: B

transactionsDf.repartition(transactionsDf.rdd.getNumPartitions()+2)

Correct. repartition() is the right operator for increasing the number of partitions, and it always performs a full shuffle. Calling getNumPartitions() on DataFrame.rdd returns the current number of partitions (8 here), so the argument evaluates to 10.

transactionsDf.coalesce(10)

No. After this command, transactionsDf will still have only 8 partitions, because coalesce() can only decrease the number of partitions, not increase it.

transactionsDf.repartition(transactionsDf.getNumPartitions()+2)

Incorrect, there is no getNumPartitions() method for the DataFrame class.

transactionsDf.coalesce(transactionsDf.getNumPartitions()+2)

Wrong. coalesce() can only be used for reducing the number of partitions, and there is no getNumPartitions() method for the DataFrame class.

transactionsDf.repartition(transactionsDf._partitions+2)

No, DataFrame has no _partitions attribute. You can find out the current number of partitions of a DataFrame with the DataFrame.rdd.getNumPartitions() method.
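
The explanations above can be checked with a short PySpark sketch. It is only an illustration under stated assumptions: the helper name repartition_by_offset, the demo() function, and the 100-row DataFrame standing in for transactionsDf are hypothetical and not part of the original question. The calls it relies on (DataFrame.rdd, RDD.getNumPartitions(), DataFrame.repartition(), DataFrame.coalesce()) are the ones discussed in the answer options.

```python
def repartition_by_offset(df, extra):
    """Return df shuffled into (current partition count + extra) partitions."""
    # DataFrame has no getNumPartitions() of its own; go through the underlying RDD.
    current = df.rdd.getNumPartitions()
    # repartition() performs a full shuffle and can increase the partition count.
    return df.repartition(current + extra)


def demo():
    # Imported lazily so the helper above can be loaded without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[2]")
        .appName("repartition-demo")
        .getOrCreate()
    )
    # Hypothetical stand-in for transactionsDf: 100 rows forced into 8 partitions.
    transactionsDf = spark.range(100).repartition(8)

    shuffled = repartition_by_offset(transactionsDf, 2)
    # coalesce() only merges partitions, so asking for more than 8 has no effect.
    coalesced = transactionsDf.coalesce(10)

    counts = (
        transactionsDf.rdd.getNumPartitions(),
        shuffled.rdd.getNumPartitions(),
        coalesced.rdd.getNumPartitions(),
    )
    spark.stop()
    return counts
```

On a local Spark 3.x installation, demo() would be expected to return (8, 10, 8): the original count, the count after repartition(), and the unchanged count after coalesce(10).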

More info: pyspark.sql.DataFrame.repartition (PySpark 3.1.2 documentation), pyspark.RDD.getNumPartitions (PySpark 3.1.2 documentation)

Static notebook | Dynamic notebook: See test 3, Question: 23 (Databricks import instructions)

