Databricks Exam Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Topic 2 Question 68 Discussion

Actual exam question for Databricks's Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 exam
Question #: 68
Topic #: 2

The code block displayed below contains an error. The code block should write DataFrame transactionsDf as a parquet file to location filePath after partitioning it on column storeId. Find the error.

Code block:

transactionsDf.write.partitionOn("storeId").parquet(filePath)

Suggested Answer: E

No method partitionOn() exists for the DataFrame class, partitionBy() should be used instead.

Correct! Find out more about partitionBy() in the documentation (linked below).

The operator should use the mode() option to configure the DataFrameWriter so that it replaces any existing files at location filePath.

No. There is no information about whether files should be overwritten in the question.
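For comparison, if replacing existing files were a stated requirement, the writer could be configured with mode("overwrite") before writing. The sketch below assumes exactly that; the helper name write_partitioned_overwrite is hypothetical, and the question itself does not ask for this behavior.

```python
def write_partitioned_overwrite(df, file_path, partition_col="storeId"):
    """Hypothetical helper (not part of PySpark), shown for illustration only."""
    # mode("overwrite") tells the DataFrameWriter to overwrite existing data
    # at file_path instead of raising an error when the path already exists.
    df.write.mode("overwrite").partitionBy(partition_col).parquet(file_path)


# Usage with the names from the question (assumes they are already defined):
# write_partitioned_overwrite(transactionsDf, filePath)
```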

The partitioning column as well as the file path should be passed to the write() method of DataFrame transactionsDf directly and not as appended commands as in the code block.

Incorrect. To write a DataFrame to disk, you need a DataFrameWriter object, which you access through the DataFrame.write property (no parentheses involved).

Column storeId should be wrapped in a col() operator.

No, this is not necessary - the problem is in the partitionOn command (see above).

The partitionOn method should be called before the write method.

Wrong. First of all, partitionOn is not a valid method of DataFrame. Moreover, even if partitionOn were replaced by partitionBy (which is a valid method), partitionBy is a method of DataFrameWriter and not of DataFrame. So you would always have to call DataFrame.write first to get access to the DataFrameWriter object and only then call partitionBy.
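
A minimal sketch of the corrected call chain, assuming transactionsDf is a PySpark DataFrame and filePath is a writable location. The helper name write_partitioned is hypothetical and used only for illustration; it works with any object that exposes the same write interface.

```python
def write_partitioned(df, file_path, partition_col="storeId"):
    """Write df as parquet to file_path, partitioned on partition_col.

    write_partitioned is a hypothetical helper name, not part of PySpark.
    """
    # DataFrame.write is a property (no parentheses) that returns a DataFrameWriter.
    writer = df.write
    # partitionBy() is a DataFrameWriter method, so it can only follow .write.
    # The partitioning column is passed by name as a plain string.
    writer.partitionBy(partition_col).parquet(file_path)


# Usage with the names from the question (assumes they are already defined):
# write_partitioned(transactionsDf, filePath)
# This is the same as the one-liner:
# transactionsDf.write.partitionBy("storeId").parquet(filePath)
```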

More info: pyspark.sql.DataFrameWriter.partitionBy (PySpark 3.1.2 documentation)

Static notebook | Dynamic notebook: See test 3, Question: 33 (Databricks import instructions)


Contribute your Thoughts:

Galen
21 days ago
Haha, the real error is that the developer forgot to add a 'save' command at the end. How else will the data be written to the file?
upvoted 0 times
...
Ivan
22 days ago
I think the error is in the partitioning column. It should be wrapped in a col() operator, so the correct answer is D.
upvoted 0 times
Kristel
4 days ago
A
upvoted 0 times
...
...
Darnell
24 days ago
The code block looks good to me. The partitionOn() method should work just fine.
upvoted 0 times
...
Rhea
25 days ago
Haha, I bet the person who wrote this code was a real parquet-y fellow! Get it? Parquet? Ah, never mind.
upvoted 0 times
...
Twila
26 days ago
This is a tricky one, but I'd say option A is the right answer. The partitioning column and file path should be passed directly to the write() method.
upvoted 0 times
Evangelina
1 day ago
Option A makes sense. It's important to pass the partitioning column and file path directly to the write() method.
upvoted 0 times
...
Carmela
4 days ago
Yes, I agree. The code block should be modified to pass the partitioning column and file path directly to the write() method.
upvoted 0 times
...
Lai
7 days ago
I think option A is correct. The partitioning column and file path should be passed directly to the write() method.
upvoted 0 times
...
...
Caprice
1 month ago
I think option D is the way to go. The column storeId should be wrapped in a col() operator.
upvoted 0 times
Cristal
15 days ago
Thanks for the clarification. I'll make sure to use the col() operator for column storeId.
upvoted 0 times
...
Tracie
18 days ago
Yes, that's correct. Option D is the right choice.
upvoted 0 times
...
Johnson
25 days ago
I think option D is the way to go. The column storeId should be wrapped in a col() operator.
upvoted 0 times
...
...
Felice
1 month ago
Option E is correct. The partitionOn method does not exist, and partitionBy should be used instead.
upvoted 0 times
Terrilyn
15 days ago
E) No method partitionOn() exists for the DataFrame class, partitionBy() should be used instead.
upvoted 0 times
...
Rodrigo
18 days ago
B) The partitionOn method should be called before the write method.
upvoted 0 times
...
Alline
24 days ago
A) The partitioning column as well as the file path should be passed to the write() method of DataFrame transactionsDf directly and not as appended commands as in the code block.
upvoted 0 times
...
...
Teddy
2 months ago
The error is in option B. The partitionOn method should be called before the write method.
upvoted 0 times
Corazon
1 month ago
C) The operator should use the mode() option to configure the DataFrameWriter so that it replaces any existing files at location filePath.
upvoted 0 times
...
Shaniqua
1 month ago
B) The partitionOn method should be called before the write method.
upvoted 0 times
...
Delbert
1 month ago
A) The partitioning column as well as the file path should be passed to the write() method of DataFrame transactionsDf directly and not as appended commands as in the code block.
upvoted 0 times
...
...
Albert
2 months ago
I think option C is also important, using mode() to configure the DataFrameWriter.
upvoted 0 times
...
Rene
2 months ago
I agree with Georgiana. Also, the partitioning column and file path should be passed directly to the write() method.
upvoted 0 times
...
Georgiana
2 months ago
I think the error is that partitionOn() should be replaced with partitionBy().
upvoted 0 times
...
