Databricks Exam Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Topic 2 Question 66 Discussion

Actual exam question for Databricks's Databricks Certified Associate Developer for Apache Spark 3.0 exam
Question #: 66
Topic #: 2

The code block shown below should add column transactionDateForm to DataFrame transactionsDf. The column should express the unix-format timestamps in column transactionDate as string type like Apr 26 (Sunday). Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__, from_unixtime(__3__, __4__))

Suggested Answer: C

Correct code block:

transactionsDf.withColumn('transactionDateForm', from_unixtime('transactionDate', 'MMM d (EEEE)'))

The question specifically asks about 'adding' a column. Among the presented answers, DataFrame.withColumn() is the correct command for this. In theory, DataFrame.select() could also be used for this purpose, if all existing columns are selected and a new one is added. DataFrame.withColumnRenamed() is not the appropriate command, since it can only rename existing columns; it cannot add a new column or change the values of a column.

Once DataFrame.withColumn() is chosen, you can read in the documentation (see below) that the first input argument to the method should be the column name of the new column.

The final difficulty is the date format. The question indicates that the date format Apr 26 (Sunday) is desired. The answers give 'MMM d (EEEE)' and 'MM d (EEE)' as options. It can be hard to know the details of the date format patterns used in Spark. Specifically, the difference between MMM and MM is probably not something you deal with every day. But there is an easy way to remember it: M (one letter) is the shortest form: 4 for April. MM includes padding: 04 for April. MMM (three letters) is the three-letter month abbreviation: Apr for April. And MMMM is the longest form: April. The day of the week follows the same idea: EEE is the abbreviated name (Sun), while EEEE is the full name (Sunday), which is what the target format requires. Knowing this sequence helps you select the correct option here.
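The letter-count rule above can be illustrated with a small sketch in plain Python. This is not Spark code: the helper names month_forms and render_pattern_examples are hypothetical, the standard-library datetime and calendar modules stand in for Spark's formatter, and the sample timestamp (interpreted in UTC) is an assumption chosen because it falls on Sunday, April 26, 2020. In Spark itself, from_unixtime renders the timestamp in the session time zone, so results near midnight may differ.

```python
from datetime import datetime, timezone
import calendar


def month_forms(ts: int) -> dict:
    """Show what M, MM, MMM and MMMM would each display for a unix timestamp.

    Plain-Python imitation of the pattern letters, for illustration only.
    """
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return {
        "M": str(dt.month),                      # shortest numeric form
        "MM": f"{dt.month:02d}",                 # zero-padded numeric form
        "MMM": calendar.month_abbr[dt.month],    # three-letter abbreviation
        "MMMM": calendar.month_name[dt.month],   # full month name
    }


def render_pattern_examples(ts: int) -> dict:
    """Imitate the two candidate patterns from the answer options."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    weekday = dt.weekday()  # Monday == 0, matching calendar.day_name indexing
    return {
        "MMM d (EEEE)": f"{calendar.month_abbr[dt.month]} {dt.day} "
                        f"({calendar.day_name[weekday]})",
        "MM d (EEE)": f"{dt.month:02d} {dt.day} "
                      f"({calendar.day_abbr[weekday]})",
    }


# Assumed sample value: 2020-04-26 12:00:00 UTC, a Sunday.
sample_ts = 1587902400

forms = month_forms(sample_ts)
examples = render_pattern_examples(sample_ts)

for pattern, text in {**forms, **examples}.items():
    print(f"{pattern:>14} -> {text}")
```

Only the 'MMM d (EEEE)' imitation reproduces the requested Apr 26 (Sunday) shape; the 'MM d (EEE)' imitation yields a zero-padded month number and an abbreviated weekday instead.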

More info: pyspark.sql.DataFrame.withColumn --- PySpark 3.1.2 documentation

Static notebook | Dynamic notebook: See test 3, Question: 35 (Databricks import instructions)


Contribute your Thoughts:

Rolland
1 day ago
I think option D is the correct answer. The date format 'MM d (EEE)' seems more appropriate for the requirements.
upvoted 0 times
...
Salena
11 days ago
I agree with Margart. Option A seems to be the most logical choice for adding the new column.
upvoted 0 times
...
Ciara
11 days ago
I'm not sure, but I think option C could also work. It's a tough choice between A and C.
upvoted 0 times
...
Margart
19 days ago
I think the correct answer is A because we need to add a new column with the specified format.
upvoted 0 times
...
Carmelina
19 days ago
Option A looks good, but I'm not sure about the date format. Shouldn't it be 'MMM d (EEE)' instead of 'MMM d (EEEE)'?
upvoted 0 times
...
