Databricks Exam Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Topic 3 Question 17 Discussion

Question #: 17
Topic #: 3

Which of the following code blocks returns all unique values found across columns value and productId of DataFrame transactionsDf as a one-column DataFrame?

A. transactionsDf.select('value').join(transactionsDf.select('productId'), col('value')==col('productId'), 'outer')

B. transactionsDf.select(col('value'), col('productId')).agg({'*': 'count'})

C. transactionsDf.select('value', 'productId').distinct()

D. transactionsDf.select('value').union(transactionsDf.select('productId')).distinct()

E. transactionsDf.agg({'value': 'collect_set', 'productId': 'collect_set'})

Suggested Answer: D

transactionsDf.select('value').union(transactionsDf.select('productId')).distinct()

Correct. This code block uses a common pattern for finding the unique values across multiple columns: union followed by distinct. The pattern is common enough that the Spark documentation for union mentions it: union keeps duplicates, like UNION ALL in SQL, so distinct has to follow it to deduplicate (link below).

transactionsDf.select('value', 'productId').distinct()

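Wrong. This code block returns the unique rows, that is, the unique combinations of value and productId in a two-column DataFrame, not the unique values across both columns.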

transactionsDf.agg({'value': 'collect_set', 'productId': 'collect_set'})

Incorrect. This code block will output a one-row, two-column DataFrame where each cell has an array of unique values in the respective column (even omitting any nulls).

transactionsDf.select(col('value'), col('productId')).agg({'*': 'count'})

No. This command will count the number of rows, but will not return unique values.

transactionsDf.select('value').join(transactionsDf.select('productId'), col('value')==col('productId'), 'outer')

Wrong. This command will perform an outer join of the value and productId columns. As such, it will return a two-column DataFrame. If you picked this answer, it might be a good idea to read up on the difference between union and join; a link is posted below.

More info: pyspark.sql.DataFrame.union (PySpark 3.1.2 documentation); "What is the difference between JOIN and UNION?" (Stack Overflow)

by Kristel at May 07, 2022, 11:53 AM
