Which of the following code blocks returns a one-column DataFrame in which every row contains an array of all integers from 0 up to, but not including, the number given in column predError of
DataFrame transactionsDf, and null if predError is null?
Sample of DataFrame transactionsDf:
+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
|            4|     null| null|      3|        2|null|
|            5|     null| null|   null|        2|null|
|            6|        3|    2|     25|        2|null|
+-------------+---------+-----+-------+---------+----+
Correct code block:
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, IntegerType

def count_to_target(target):
    if target is None:
        return None
    result = list(range(target))
    return result

count_to_target_udf = udf(count_to_target, ArrayType(IntegerType()))
transactionsDf.select(count_to_target_udf('predError'))
Output of correct code block:
+--------------------------+
|count_to_target(predError)|
+--------------------------+
| [0, 1, 2]|
| [0, 1, 2, 3, 4, 5]|
| [0, 1, 2]|
| null|
| null|
| [0, 1, 2]|
+--------------------------+
This question is not exactly easy. You need to be familiar with the syntax around UDFs (user-defined functions). Specifically, it is important to pass the correct types to the udf
method: returning an array of a specific type, rather than just a single type, means you need to think harder about type implications than usual.
Remember that in Spark, you always pass types in an instantiated way, like ArrayType(IntegerType()), not like ArrayType(IntegerType). The parentheses () are the key here, so make sure you do
not forget them.
Also make sure that you actually pass the UDF count_to_target_udf, and not the Python function count_to_target, to the select() operator.
Finally, null values are always a tricky case with UDFs. So, take care that the code can handle them correctly.
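The null handling happens entirely in plain Python: Spark passes a Python None into the UDF for a null cell, and a returned None shows up as null in the result column. The behavior of the wrapped function can therefore be checked without Spark at all:

```python
def count_to_target(target):
    # Spark hands the UDF a Python None for a null predError cell
    if target is None:
        return None
    # range(target) stops before target, matching the output shown above
    return list(range(target))

print(count_to_target(3))     # [0, 1, 2]
print(count_to_target(None))  # None
```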
More info: How to Turn Python Functions into PySpark Functions (UDF) -- Chang Hsin Lee -- Committing my thoughts to words.
Static notebook | Dynamic notebook: See test 3, Question 24 (Databricks import instructions)