Databricks Exam Databricks Certified Associate Developer for Apache Spark 3.0 Topic 3 Question 11 Discussion

Actual exam question for Databricks's Databricks Certified Associate Developer for Apache Spark 3.0 exam

Question #: 11
Topic #: 3


The code block displayed below contains multiple errors. The code block should remove column transactionDate from DataFrame transactionsDf and add a column transactionTimestamp in which dates that are expressed as strings in column transactionDate of DataFrame transactionsDf are converted into unix timestamps. Find the errors.

Sample of DataFrame transactionsDf:

+-------------+---------+-----+-------+---------+----+----------------+
+-------------+---------+-----+-------+---------+----+----------------+
|            1|        3|    4|     25|        1|null|2020-04-26 15:35|
|            2|        6|    7|      2|        2|null|2020-04-13 22:01|
|            3|        3| null|     25|        3|null|2020-04-02 10:53|
+-------------+---------+-----+-------+---------+----+----------------+

Code block:

transactionsDf = transactionsDf.drop("transactionDate")
transactionsDf["transactionTimestamp"] = unix_timestamp("transactionDate", "yyyy-MM-dd")

A. Column transactionDate should be dropped after transactionTimestamp has been written. The string indicating the date format should be adjusted. The withColumn operator should be used instead of the existing column assignment. Operator to_unixtime() should be used instead of unix_timestamp().

B. Column transactionDate should be dropped after transactionTimestamp has been written. The withColumn operator should be used instead of the existing column assignment. Column transactionDate should be wrapped in a col() operator.

C. Column transactionDate should be wrapped in a col() operator.

D. The string indicating the date format should be adjusted. The withColumnReplaced operator should be used instead of the drop and assign pattern in the code block to replace column transactionDate with the new column transactionTimestamp.

E. Column transactionDate should be dropped after transactionTimestamp has been written. The string indicating the date format should be adjusted. The withColumn operator should be used instead of the existing column assignment.


Suggested Answer: E

This question requires a lot of thinking to get right. To solve it, you may take advantage of the digital notepad that is provided to you during the test. You have probably noticed that the code block includes multiple errors. In the test, you are usually confronted with a code block that contains only a single error. However, since you are practicing here, working through this challenging multi-error question will make it easier for you to deal with single-error questions in the real exam.

Column transactionDate should be dropped only after transactionTimestamp has been written, because Spark needs to read the values from column transactionDate to generate column transactionTimestamp. If the column is dropped first, any later reference to it cannot be resolved and Spark raises an AnalysisException.

Values in column transactionDate in the original transactionsDf DataFrame look like 2020-04-26 15:35, that is, a date followed by hours and minutes. To convert those correctly, you have to pass the pattern yyyy-MM-dd HH:mm, which covers the time of day as well. In other words, the string indicating the date format should be adjusted.
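The conversion that unix_timestamp() performs can be checked by hand. This plain-Python sketch is an analogy, not Spark code: it parses the sample values with strptime's %Y-%m-%d %H:%M, the counterpart of Spark's yyyy-MM-dd HH:mm, and assumes the strings are UTC times. Spark actually interprets them in the session time zone (spark.sql.session.timeZone), so real results shift with that setting. In Python, the date-only pattern is rejected because " 15:35" is left unparsed.

```python
from datetime import datetime, timezone

samples = ["2020-04-26 15:35", "2020-04-13 22:01", "2020-04-02 10:53"]


def to_unix_seconds(value: str, fmt: str) -> int:
    """Seconds since 1970-01-01 00:00 UTC, treating the parsed value as UTC (assumption)."""
    parsed = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


# Full pattern: date and time of day, like Spark's "yyyy-MM-dd HH:mm".
full_pattern = [to_unix_seconds(s, "%Y-%m-%d %H:%M") for s in samples]

# Date-only pattern, like "yyyy-MM-dd": the trailing " 15:35" is not covered.
try:
    to_unix_seconds(samples[0], "%Y-%m-%d")
    date_only_parsed = True
except ValueError:
    date_only_parsed = False

print(full_pattern)  # [1587915300, 1586815260, 1585824780]
```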

While you might be tempted to change unix_timestamp() to to_unixtime() (in line with the from_unixtime() operator), that function does not exist in Spark. unix_timestamp() is the correct operator to use here.

Also, there is no DataFrame.withColumnReplaced() operator. A similarly named operator that does exist is DataFrame.withColumnRenamed(), but it only renames a column and does not convert its values.

Whether you wrap transactionDate in col() or not is irrelevant for unix_timestamp(): it accepts either a column name string or a Column object, so the command works with both.

Finally, you cannot assign a column with transactionsDf['columnName'] = ... in Spark. This is pandas syntax (pandas is a popular Python package for data analysis), but Spark DataFrames are immutable and do not support item assignment. Instead, use DataFrame.withColumn(), which returns a new DataFrame with the added column.

More info: pyspark.sql.functions.unix_timestamp, PySpark 3.1.2 documentation

Static notebook | Dynamic notebook: see test 3, question 28 (Databricks import instructions)

by Donette at May 08, 2022, 02:36 AM

