
Databricks Exam Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Topic 3 Question 69 Discussion

Actual exam question for Databricks's Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 exam
Question #: 69
Topic #: 3

The code block displayed below contains multiple errors. The code block should remove column transactionDate from DataFrame transactionsDf and add a column transactionTimestamp in which dates that are expressed as strings in column transactionDate of DataFrame transactionsDf are converted into unix timestamps. Find the errors.

Sample of DataFrame transactionsDf:

+-------------+---------+-----+-------+---------+----+----------------+
|transactionId|predError|value|storeId|productId|   f| transactionDate|
+-------------+---------+-----+-------+---------+----+----------------+
|            1|        3|    4|     25|        1|null|2020-04-26 15:35|
|            2|        6|    7|      2|        2|null|2020-04-13 22:01|
|            3|        3| null|     25|        3|null|2020-04-02 10:53|
+-------------+---------+-----+-------+---------+----+----------------+

Code block:

transactionsDf = transactionsDf.drop("transactionDate")
transactionsDf["transactionTimestamp"] = unix_timestamp("transactionDate", "yyyy-MM-dd")

Suggested Answer: E

This question requires a lot of thinking to get right. When solving it, you may take advantage of the digital notepad that is provided to you during the test. You have probably noticed that the code block includes multiple errors. In the test, you are usually confronted with a code block that contains only a single error. Since you are practicing here, however, this challenging multi-error question will make it easier for you to deal with single-error questions in the real exam.

You can clearly see that column transactionDate should be dropped only after transactionTimestamp has been written. This is because to generate column transactionTimestamp, Spark needs to read the values from column transactionDate.

Values in column transactionDate in the original transactionsDf DataFrame look like 2020-04-26 15:35. So, to convert those correctly, you would have to pass yyyy-MM-dd HH:mm. In other words: the string indicating the date format should be adjusted.
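To see why the hour and minute portion must be included in the format string, here is a quick sanity check using only the Python standard library. Note this is an illustration, not Spark itself: Spark uses Java-style patterns like yyyy-MM-dd HH:mm, while Python's strptime uses the equivalent %Y-%m-%d %H:%M.

```python
from datetime import datetime, timezone

# Python equivalent of Spark's "yyyy-MM-dd HH:mm" pattern is "%Y-%m-%d %H:%M".
# Parsing the sample value succeeds only when the time part is in the pattern;
# a date-only pattern like "%Y-%m-%d" would fail on the trailing " 15:35".
sample = "2020-04-26 15:35"
dt = datetime.strptime(sample, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)
print(int(dt.timestamp()))  # unix timestamp (here interpreted as UTC)
```

In Spark, unix_timestamp() interprets the string in the session time zone, so the exact integer may differ from the UTC value printed here.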

While you might be tempted to change unix_timestamp() to to_unixtime() (in line with the from_unixtime() operator), that function does not exist in Spark. unix_timestamp() is the correct operator to use here.

Also, there is no DataFrame.withColumnReplaced() operator. A similar operator that exists is DataFrame.withColumnRenamed().

Whether you use col() or not is irrelevant with unix_timestamp() - the command is fine with both.

Finally, you cannot assign a column via transactionsDf['columnName'] = ... in Spark. This is Pandas syntax (Pandas is a popular Python package for data analysis), but it is not supported in Spark. You need to use Spark's DataFrame.withColumn() syntax instead.

More info: pyspark.sql.functions.unix_timestamp - PySpark 3.1.2 documentation

Static notebook | Dynamic notebook: See test 3, question 28 (Databricks import instructions)


Contribute your Thoughts:

Grover
16 days ago
I'm with Malcolm on this one. Option E seems to address all the issues - dropping the column after the conversion and adjusting the date format. Gotta love these coding challenges, they really keep you on your toes!
upvoted 0 times
Xochitl
5 days ago
I think Option E is the way to go. It covers all the necessary changes.
upvoted 0 times
Jacob
17 days ago
Haha, looks like the developers were a bit too eager to drop that column! They should have waited until after the conversion to unix timestamp. Option E is the way to go.
upvoted 0 times
Malcolm
18 days ago
I think option E is the correct answer. The code block is dropping the transactionDate column before converting it to a timestamp, which is incorrect. The string format should also be adjusted to match the format in the sample DataFrame.
upvoted 0 times
Celestina
19 days ago
Haha, I bet the person who wrote this code was trying to be a little too clever. Dropping the column before creating the new one? That's like trying to change a tire while the car's still moving!
upvoted 0 times
Mitsue
22 days ago
This looks like a tricky one! The date format string definitely needs to be adjusted, and the withColumnReplaced operator would be a better choice than the drop and assign pattern.
upvoted 0 times
Rosann
23 days ago
The code block is almost there, but the column transactionDate should be wrapped in a col() operator, and the withColumn operator should be used instead of the drop and assign pattern.
upvoted 0 times
Santos
5 days ago
A
upvoted 0 times
Jamey
2 months ago
The code block has a few issues. The date format string should be adjusted, and the withColumn operator should be used instead of the existing column assignment. Also, the to_unixtime() function should be used instead of unix_timestamp().
upvoted 0 times
Pearlie
16 days ago
E) Column transactionDate should be dropped after transactionTimestamp has been written. The string indicating the date format should be adjusted. The withColumn operator should be used instead of the existing column assignment.
upvoted 0 times
Rusty
23 days ago
B) Column transactionDate should be dropped after transactionTimestamp has been written. The withColumn operator should be used instead of the existing column assignment. Column transactionDate should be wrapped in a col() operator.
upvoted 0 times
Ettie
27 days ago
A) Column transactionDate should be dropped after transactionTimestamp has been written. The string indicating the date format should be adjusted. The withColumn operator should be used instead of the existing column assignment. Operator to_unixtime() should be used instead of unix_timestamp().
upvoted 0 times
Louisa
2 months ago
But A mentions using withColumn operator and to_unixtime() function, which seems more appropriate for this task.
upvoted 0 times
Ressie
2 months ago
I disagree, I believe the correct answer is B.
upvoted 0 times
Louisa
2 months ago
I think the correct answer is A.
upvoted 0 times
