
Databricks Exam Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Topic 2 Question 14 Discussion


Which of the following code blocks returns a one-column DataFrame for which every row contains an array of all integer numbers from 0 up to and including the number given in column predError of DataFrame transactionsDf, and null if predError is null?

Sample of DataFrame transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
|            4|     null| null|      3|        2|null|
|            5|     null| null|   null|        2|null|
|            6|        3|    2|     25|        2|null|
+-------------+---------+-----+-------+---------+----+
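To make the specification concrete, here is a plain-Python sketch (no Spark needed) of the value each row should get. It assumes the predError values arrive as Python ints, with None standing in for null; the list pred_errors is copied from the sample above and is only illustrative.

```python
# predError values from the sample rows of transactionsDf (None stands for null)
pred_errors = [3, 6, 3, None, None, 3]

# "From 0 up to and including n" means range(n + 1); a null input maps to a null output
expected = [None if n is None else list(range(n + 1)) for n in pred_errors]

for n, row in zip(pred_errors, expected):
    print(n, "->", row)
```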

Suggested Answer: C

Correct code block:

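from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, IntegerType

def count_to_target(target):
    # Spark passes a null predError to the Python function as None;
    # returning None yields null in the output column
    if target is None:
        return None
    # range() excludes its stop value, so add 1 to include target itself
    return list(range(target + 1))

# A DataType return type must be instantiated, hence IntegerType() with parentheses
count_to_target_udf = udf(count_to_target, ArrayType(IntegerType()))

transactionsDf.select(count_to_target_udf('predError'))

Output of correct code block (displayed with .show(truncate=False)):

+--------------------------+
|count_to_target(predError)|
+--------------------------+
|[0, 1, 2, 3]              |
|[0, 1, 2, 3, 4, 5, 6]     |
|[0, 1, 2, 3]              |
|null                      |
|null                      |
|[0, 1, 2, 3]              |
+--------------------------+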

This question is not exactly easy. You need to be familiar with the syntax around UDFs (user-defined functions). In particular, you must pass the correct return type to udf(): the function returns an array of integers rather than a single value, so the return type has to describe both the array and its element type, which means thinking harder about types than usual.

Remember that whenever you pass a Spark data type object, you pass an instance, such as ArrayType(IntegerType()), not the class, such as ArrayType(IntegerType). The parentheses () are the key here, so make sure you do not forget them.

Also make sure that you pass the UDF count_to_target_udf, not the plain Python function count_to_target, to the select() operator.
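Why passing the plain function fails: Python evaluates the argument of select() before Spark sees it, so count_to_target('predError') runs immediately on the string 'predError'. A minimal pure-Python sketch, assuming the inclusive range(target + 1) version of the function (a range(target) variant fails the same way, since range('predError') also raises a TypeError):

```python
def count_to_target(target):
    # Null (None) in, null out; otherwise every integer from 0 to target inclusive
    if target is None:
        return None
    return list(range(target + 1))

# select(count_to_target('predError')) would first evaluate this call,
# passing the column *name* (a str) instead of per-row values
try:
    count_to_target('predError')
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

print(error_name)  # TypeError
```

Wrapping the function with udf() changes this: calling count_to_target_udf('predError') returns a Column expression, and Spark applies the Python function to each row's value only when the query runs.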

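Finally, null values are a common pitfall with UDFs. Spark passes a null input to a Python UDF as None, so the function must check for None before using the value; otherwise the UDF raises a TypeError on null rows. Returning None from the UDF produces null in the output column.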
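A pure-Python sketch of why the None check matters. count_to_target_naive is a hypothetical variant without the check; inside a Spark job, the same TypeError would be raised in the Python worker for a null row and make the job fail once an action such as show() runs.

```python
def count_to_target_naive(target):
    # Hypothetical version without a null check: None + 1 raises TypeError
    return list(range(target + 1))

def count_to_target(target):
    # Null-safe version: Spark hands SQL nulls to Python UDFs as None
    if target is None:
        return None
    return list(range(target + 1))

print(count_to_target(None))  # None
print(count_to_target(3))     # [0, 1, 2, 3]

try:
    count_to_target_naive(None)
except TypeError as exc:
    print("naive version fails:", exc)
```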

More info: "How to Turn Python Functions into PySpark Functions (UDF)" by Chang Hsin Lee, on the blog "Committing my thoughts to words".

Notebooks (static and dynamic versions, with Databricks import instructions): see test 3, Question 24.

