Databricks Exam Databricks Certified Associate Developer for Apache Spark 3.0 Topic 2 Question 71 Discussion

Actual exam question for Databricks's Databricks Certified Associate Developer for Apache Spark 3.0 exam

Question #: 71
Topic #: 2

Which of the following code blocks returns all unique values across all values in columns value and productId in DataFrame transactionsDf in a one-column DataFrame?

A. transactionsDf.select('value').join(transactionsDf.select('productId'), col('value')==col('productId'), 'outer')

B. transactionsDf.select(col('value'), col('productId')).agg({'*': 'count'})

C. transactionsDf.select('value', 'productId').distinct()

D. transactionsDf.select('value').union(transactionsDf.select('productId')).distinct()

E. transactionsDf.agg({'value': 'collect_set', 'productId': 'collect_set'})

Suggested Answer: D

transactionsDf.select('value').union(transactionsDf.select('productId')).distinct()

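Correct. This code block uses a common pattern for finding the unique values across multiple columns: union followed by distinct. union() stacks the rows of the two one-column DataFrames by position (like SQL UNION ALL, so duplicates are kept), and distinct() then removes the duplicates. The result is a one-column DataFrame whose column takes its name, value, from the first DataFrame. The pattern is so common that it is even mentioned in the Spark documentation for the union command (link below).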
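To make the union-and-distinct pattern concrete, here is a minimal plain-Python model of option D. It is not PySpark: rows are modeled as tuples, the helpers union and distinct are hypothetical stand-ins for the DataFrame methods, and the four sample rows of transactionsDf are made up for illustration.

```python
# Plain-Python model of option D (NOT the PySpark API):
# transactionsDf.select('value').union(transactionsDf.select('productId')).distinct()

def union(left, right):
    """Model of DataFrame.union: appends rows by position, keeping duplicates
    (like SQL UNION ALL)."""
    return left + right

def distinct(rows):
    """Model of DataFrame.distinct: keeps the first occurrence of each row.
    Spark does not guarantee row order; order is preserved here only for readability."""
    seen = set()
    out = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

# Hypothetical sample data: (value, productId) pairs.
transactions = [(4, 3), (7, 2), (3, 2), (4, 1)]

values = [(v,) for v, _ in transactions]       # select('value')
product_ids = [(p,) for _, p in transactions]  # select('productId')

stacked = union(values, product_ids)  # 8 one-column rows, duplicates included
result = distinct(stacked)            # unique values across both columns
print(result)  # [(4,), (7,), (3,), (2,), (1,)]
```

Note that the value 3 occurs once in each column but appears only once in the result, and every result row has exactly one column.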

transactionsDf.select('value', 'productId').distinct()

Wrong. This code block returns the unique (value, productId) rows in a two-column DataFrame, not the unique values across both columns.
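The difference between unique rows (option C) and unique values (option D) can be seen in a small plain-Python model. The sample data is hypothetical and the set operations only mimic what the Spark code blocks compute; this is not the PySpark API.

```python
# Plain-Python model contrasting option C with option D (NOT the PySpark API).
# Hypothetical (value, productId) rows; the last row duplicates the first.
transactions = [(4, 3), (7, 2), (3, 2), (4, 3)]

# Option C: deduplicate whole (value, productId) rows -> still two columns.
distinct_rows = sorted(set(transactions))

# Option D: stack both columns into one, then deduplicate single values.
distinct_values = sorted({v for row in transactions for v in row})

print(distinct_rows)    # [(3, 2), (4, 3), (7, 2)]
print(distinct_values)  # [2, 3, 4, 7]
```

In the option C model, 3 and 2 each still appear in both columns, because only identical rows are collapsed.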

transactionsDf.agg({'value': 'collect_set', 'productId': 'collect_set'})

Incorrect. This code block outputs a one-row, two-column DataFrame in which each cell holds an array of the unique values of the respective column (collect_set also omits any nulls).
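As a rough illustration of the shape option E produces, here is a plain-Python model of collect_set applied to each column. None stands in for a SQL null, the sample rows are hypothetical, and the collect_set helper is only a stand-in for the Spark aggregate, not its implementation.

```python
# Plain-Python model of option E (NOT the PySpark API):
# transactionsDf.agg({'value': 'collect_set', 'productId': 'collect_set'})
# Hypothetical (value, productId) rows; None plays the role of SQL NULL.
transactions = [(4, 3), (7, None), (4, 2), (None, 2)]

def collect_set(column):
    """Unique non-null values of one column. Spark does not guarantee
    the element order; the list is sorted here only for readability."""
    return sorted({v for v in column if v is not None})

values = [v for v, _ in transactions]
product_ids = [p for _, p in transactions]

# One row, two columns, each cell an array of unique values.
result = [(collect_set(values), collect_set(product_ids))]
print(result)  # [([4, 7], [2, 3])]
```

The values are never merged into a single column, so the question's requirement of a one-column DataFrame is not met.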

transactionsDf.select(col('value'), col('productId')).agg({'*': 'count'})

No. This command counts the number of rows and returns that count in a one-row DataFrame; it does not return unique values.

transactionsDf.select('value').join(transactionsDf.select('productId'), col('value')==col('productId'), 'outer')

Wrong. This command performs an outer join of the value and productId columns, so it returns a two-column DataFrame. If you picked this answer, it might be a good idea to read up on the difference between union and join; a link is posted below.
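For comparison with union, here is a plain-Python model of the full outer join in option A. The sample columns and the full_outer_join helper are hypothetical; the model assumes there are no null keys (in SQL a null never equals anything, whereas None == None is True in Python) and is not how Spark executes joins.

```python
# Plain-Python model of option A (NOT the PySpark API):
# transactionsDf.select('value').join(transactionsDf.select('productId'),
#                                     col('value') == col('productId'), 'outer')
values = [4, 7, 3]       # hypothetical column value
product_ids = [3, 2, 2]  # hypothetical column productId

def full_outer_join(left, right):
    """Pair equal keys; unmatched keys on either side are padded with None.
    Assumes no None keys in the input (see SQL NULL semantics)."""
    rows = []
    matched_right = set()
    for lv in left:
        hits = [i for i, rv in enumerate(right) if rv == lv]
        if hits:
            for i in hits:
                rows.append((lv, right[i]))
                matched_right.add(i)
        else:
            rows.append((lv, None))
    # Right-side rows that never matched are kept with a null on the left.
    for i, rv in enumerate(right):
        if i not in matched_right:
            rows.append((None, rv))
    return rows

result = full_outer_join(values, product_ids)
print(result)  # [(4, None), (7, None), (3, 3), (None, 2), (None, 2)]
```

A join places values side by side in two columns and can even repeat rows, while union stacks them into one column.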

More info: pyspark.sql.DataFrame.union (PySpark 3.1.2 documentation); "What is the difference between JOIN and UNION?" (Stack Overflow)

by Chandra at Nov 17, 2024, 06:21 AM

Valentin

6 months ago

Option C all the way! It's like the Goldilocks of solutions - not too hot, not too cold, just right. Plus, it's probably the only one that won't make the grader fall asleep while reading it.

7 months ago

Option C is the way to go! It's simple and straightforward, no need to get fancy with all that other stuff.

Glory

6 months ago

User4: I also think option C is the best solution.

Nydia

6 months ago

User3: Option C is definitely the way to go.

Shay

7 months ago

User2: Yeah, I think option C is the most straightforward.

Edna

7 months ago

User1: I agree, option C is the simplest choice.
