Which of the following code blocks returns a one-column DataFrame in which every row contains an array of all integers from 0 up to, but not including, the number given in column predError of
DataFrame transactionsDf, and null if predError is null?
Sample of DataFrame transactionsDf:
+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
|            4|     null| null|      3|        2|null|
|            5|     null| null|   null|        2|null|
|            6|        3|    2|     25|        2|null|
+-------------+---------+-----+-------+---------+----+
Correct code block:
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, IntegerType

def count_to_target(target):
    if target is None:
        return None
    result = list(range(target))
    return result

count_to_target_udf = udf(count_to_target, ArrayType(IntegerType()))
transactionsDf.select(count_to_target_udf('predError'))
Output of correct code block:
+--------------------------+
|count_to_target(predError)|
+--------------------------+
| [0, 1, 2]|
| [0, 1, 2, 3, 4, 5]|
| [0, 1, 2]|
| null|
| null|
| [0, 1, 2]|
+--------------------------+
This question is not exactly easy. You need to be familiar with the syntax around UDFs (user-defined functions). Specifically, it is important to pass the correct types to the udf
method: returning an array of a specific type, rather than just a single type, means you need to think harder about type implications than usual.
Remember that in Spark, you always pass types in an instantiated way, like ArrayType(IntegerType()), not like ArrayType(IntegerType). The parentheses () are the key here, so make sure you do
not forget them.
Also make sure that you actually pass the UDF count_to_target_udf, and not the Python function count_to_target, to the select() operator.
Finally, null values are always a tricky case with UDFs. So, take care that the code can handle them correctly.
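The null handling happens entirely in plain Python: Spark passes a Python None into the UDF for a null cell, and a returned None shows up as null in the result column. The behavior of the wrapped function can therefore be checked without Spark at all:

```python
def count_to_target(target):
    # Spark hands the UDF a Python None for a null predError cell
    if target is None:
        return None
    # range(target) stops before target, matching the output shown above
    return list(range(target))

print(count_to_target(3))     # [0, 1, 2]
print(count_to_target(None))  # None
```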
More info: How to Turn Python Functions into PySpark Functions (UDF) -- Chang Hsin Lee -- Committing my thoughts to words.
Static notebook | Dynamic notebook: See test 3, Question 24 (Databricks import instructions)