Databricks Exam Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Topic 2 Question 71 Discussion

Actual exam question for Databricks's Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 exam
Question #: 71
Topic #: 2

Which of the following code blocks returns all unique values across all values in columns value and productId in DataFrame transactionsDf in a one-column DataFrame?

Suggested Answer: D

transactionsDf.select('value').union(transactionsDf.select('productId')).distinct()

Correct. This code block uses a common pattern for finding the unique values across multiple columns: union and distinct. In fact, it is so common that it is even mentioned in the Spark documentation for the union command (link below).
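
As a rough illustration of why this pattern yields a one-column result, here is a plain-Python model of union followed by distinct. It is not Spark code: rows are modeled as tuples, the sample data is made up, and the helpers `select`, `union` and `distinct` are hypothetical stand-ins for the DataFrame methods of the same name.

```python
# Plain-Python stand-in for select / union / distinct (not the Spark API).
# Hypothetical sample rows for transactionsDf: (value, productId).
transactions = [(4, 3), (7, 3), (4, 2), (None, 7)]
COLUMNS = ("value", "productId")

def select(rows, name):
    """Keep a single column; each result row is a 1-tuple."""
    idx = COLUMNS.index(name)
    return [(row[idx],) for row in rows]

def union(left, right):
    """Append rows by position and keep duplicates, like SQL UNION ALL."""
    return left + right

def distinct(rows):
    """Drop duplicate rows, keeping the first occurrence of each."""
    seen, out = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

stacked = union(select(transactions, "value"), select(transactions, "productId"))
unique_values = distinct(stacked)
print(unique_values)
```

In Spark itself, union matches columns by position and keeps duplicates, which is why distinct is still needed afterwards; the single result column takes its name from the first DataFrame. Unlike this model, Spark does not guarantee any row order in the result.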

transactionsDf.select('value', 'productId').distinct()

Wrong. This code block returns unique rows, but not unique values.
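
The difference between unique rows and unique values can be sketched in plain Python. This only models the semantics and is not Spark code; the sample rows are invented for illustration.

```python
# Hypothetical (value, productId) rows; plain Python, not the Spark API.
transactions = [(4, 3), (4, 2), (3, 4), (4, 3)]

# Row-level deduplication: models distinct() on a two-column selection.
unique_rows = sorted(set(transactions))

# Value-level deduplication across both columns: what the question asks for.
unique_values = sorted({cell for row in transactions for cell in row})

print(unique_rows)    # still two columns; 4 and 3 each appear in several rows
print(unique_values)  # one flat collection of values
```

Only the exact duplicate row (4, 3) is dropped by row-level deduplication, so the values 3 and 4 still occur more than once in the result.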

transactionsDf.agg({'value': 'collect_set', 'productId': 'collect_set'})

Incorrect. This code block returns a one-row, two-column DataFrame in which each cell holds an array of the unique values in the respective column (nulls are omitted).
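
The shape of that result can be modeled in plain Python. This is a sketch, not Spark code: the helper `collect_set` is a hypothetical stand-in for the Spark aggregate of the same name, `None` stands in for a null, and the sample rows are made up.

```python
# Hypothetical rows (value, productId); None models a Spark null.
transactions = [(4, 3), (None, 3), (4, 2)]

def collect_set(rows, idx):
    """Unique non-null values of one column, sorted here only for readable output."""
    return sorted({row[idx] for row in rows if row[idx] is not None})

# One row with two array cells, rather than one column of scalar values.
result = [(collect_set(transactions, 0), collect_set(transactions, 1))]
print(result)
```

The unique values are all present, but they are split across two array cells in a single row, so this does not match the one-column DataFrame the question asks for.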

transactionsDf.select(col('value'), col('productId')).agg({'*': 'count'})

No. This command will count the number of rows, but will not return unique values.

transactionsDf.select('value').join(transactionsDf.select('productId'), col('value')==col('productId'), 'outer')

Wrong. This command will perform an outer join of the value and productId columns. As such, it will return a two-column DataFrame. If you picked this answer, it might be a good idea to read up on the difference between union and join; a link is posted below.
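
To see why an outer join keeps two columns, consider this plain-Python model of a full outer join on equal values. It is a sketch under assumptions, not Spark code: `full_outer_join` is a hypothetical helper, `None` stands in for a null, and the input values are invented.

```python
# Hypothetical single-column inputs; None models a Spark null.
values = [4, 7, 4]
product_ids = [3, 4]

def full_outer_join(left, right):
    """Pair equal values; keep unmatched values from either side, padded with None."""
    rows = [(l, r) for l in left for r in right if l is not None and l == r]
    matched_left = {l for l, _ in rows}
    matched_right = {r for _, r in rows}
    rows += [(l, None) for l in left if l not in matched_left]
    rows += [(None, r) for r in right if r not in matched_right]
    return rows

joined = full_outer_join(values, product_ids)
print(joined)
```

A join places matching values side by side in separate columns, whereas a union stacks rows on top of each other in the same column.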

More info: pyspark.sql.DataFrame.union (PySpark 3.1.2 documentation); What is the difference between JOIN and UNION? (Stack Overflow)

Static notebook | Dynamic notebook: See test 3, Question: 21 (Databricks import instructions)


Contribute your Thoughts:

Sabina
7 days ago
Why do you think D is the correct answer?
upvoted 0 times
...
Billy
10 days ago
I disagree, I believe the answer is D.
upvoted 0 times
...
Lavera
13 days ago
Option D? Really? Why would you ever want to do a union and then a distinct? Seems like a lot of unnecessary steps. I'm going with C, it's the clear winner here.
upvoted 0 times
Margurite
1 day ago
I think C is the best option here.
upvoted 0 times
...
...
Sabina
20 days ago
I think the answer is C.
upvoted 0 times
...
Gail
25 days ago
Hmm, I'm not sure. Option E looks like it could work, but I don't want to get caught up in all those fancy collect_set functions. Let's keep it simple!
upvoted 0 times
Alyce
8 days ago
User2: Yeah, that sounds simple and straightforward. Let's go with option C.
upvoted 0 times
...
Golda
12 days ago
User1: I think option C is the way to go. Just select the columns and call distinct.
upvoted 0 times
...
...
Danilo
1 month ago
Option C is the way to go! It's simple and straightforward, no need to get fancy with all that other stuff.
upvoted 0 times
Nydia
2 days ago
User3: Option C is definitely the way to go.
upvoted 0 times
...
Shay
15 days ago
User2: Yeah, I think option C is the most straightforward.
upvoted 0 times
...
Edna
20 days ago
User1: I agree, option C is the simplest choice.
upvoted 0 times
...
...
