Databricks Exam Databricks Certified Associate Developer for Apache Spark 3.0 Topic 1 Question 49 Discussion

Actual exam question for Databricks's Databricks Certified Associate Developer for Apache Spark 3.0 exam

Question #: 49
Topic #: 1

[All Databricks Certified Associate Developer for Apache Spark 3.0 Questions]

The code block displayed below contains an error. The code block should write DataFrame transactionsDf as a parquet file to location filePath after partitioning it on column storeId. Find the error.

Code block:

transactionsDf.write.partitionOn("storeId").parquet(filePath)

AThe partitioning column as well as the file path should be passed to the write() method of DataFrame transactionsDf directly and not as appended commands as in the code block.

BThe partitionOn method should be called before the write method.

CThe operator should use the mode() option to configure the DataFrameWriter so that it replaces any existing files at location filePath.

DColumn storeId should be wrapped in a col() operator.

ENo method partitionOn() exists for the DataFrame class, partitionBy() should be used instead.

Show Suggested Answer

Suggested Answer: E

No method partitionOn() exists for the DataFrame class, partitionBy() should be used instead.

Correct! Find out more about partitionBy() in the documentation (linked below).

The operator should use the mode() option to configure the DataFrameWriter so that it replaces any existing files at location filePath.

No. There is no information about whether files should be overwritten in the question.

The partitioning column as well as the file path should be passed to the write() method of DataFrame transactionsDf directly and not as appended commands as in the code block.

Incorrect. To write a DataFrame to disk, you need to work with a DataFrameWriter object which you get access to through the DataFrame.writer property - no parentheses involved.

Column storeId should be wrapped in a col() operator.

No, this is not necessary - the problem is in the partitionOn command (see above).

The partitionOn method should be called before the write method.

Wrong. First of all partitionOn is not a valid method of DataFrame. However, even assuming partitionOn would be replaced by partitionBy (which is a valid method), this method is a method of

DataFrameWriter and not of DataFrame. So, you would always have to first call DataFrame.write to get access to the DataFrameWriter object and afterwards call partitionBy.

More info: pyspark.sql.DataFrameWriter.partitionBy --- PySpark 3.1.2 documentation

Static notebook | Dynamic notebook: See test 3, Question: 33 (Databricks import instructions)

by Eden at Feb 07, 2024, 08:51 PM

Limited Time Offer

25%

1 years ago

I think the error is that the partitionOn method should be called before the write method.

upvoted 0 times

...