Databricks Exam Databricks Certified Associate Developer for Apache Spark 3.0 Topic 2 Question 46 Discussion

Actual exam question for Databricks's Databricks Certified Associate Developer for Apache Spark 3.0 exam

Question #: 46
Topic #: 2


Which of the following code blocks returns all unique values of column storeId in DataFrame transactionsDf?

A. transactionsDf['storeId'].distinct()

B. transactionsDf.select('storeId').distinct()
(Correct)

C. transactionsDf.filter('storeId').distinct()

D. transactionsDf.select(col('storeId').distinct())

E. transactionsDf.distinct('storeId')


Suggested Answer: B

transactionsDf.select('storeId').dropDuplicates().count()

Correct! After dropping all duplicates from column storeId, the remaining rows get counted, representing the number of unique values in the column.

transactionsDf.select(count('storeId')).dropDuplicates()

No. transactionsDf.select(count('storeId')) just returns a single-row DataFrame showing the number of non-null rows. dropDuplicates() does not have any effect in this context.

transactionsDf.dropDuplicates().agg(count('storeId'))

Incorrect. While transactionsDf.dropDuplicates() removes duplicate rows from transactionsDf, it does not do so taking only column storeId into consideration, but eliminates full-row duplicates instead.

transactionsDf.distinct().select('storeId').count()

Wrong. transactionsDf.distinct() identifies unique rows across all columns, but not only unique rows with respect to column storeId. This may leave duplicate values in the column, making the count not represent the number of unique values in that column.
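
The difference between deduplicating whole rows and deduplicating a single column can be sketched in plain Python, without Spark. The sample tuples and the helper names `distinct` and `select_store_id` are made up for illustration and are not part of the original question. The sketch only models the row-level semantics described above, not how Spark executes them.

```python
# Hypothetical sample rows: (transactionId, storeId). Not from the original question.
rows = [
    (1, 25),
    (2, 25),    # same storeId as the first row, different transactionId
    (3, 3),
    (3, 3),     # exact duplicate of the previous row
    (4, None),  # null storeId
]


def distinct(records):
    """Drop exact duplicates, keeping first-seen order (models DataFrame.distinct())."""
    return list(dict.fromkeys(records))


def select_store_id(records):
    """Project each row to a one-field tuple (models select('storeId'))."""
    return [(store_id,) for _, store_id in records]


# Models transactionsDf.distinct().select('storeId').count():
# only full-row duplicates are removed, so storeId 25 is still counted twice.
rows_then_column = len(select_store_id(distinct(rows)))

# Models transactionsDf.select('storeId').distinct().count():
# duplicates are removed with respect to storeId only.
column_then_rows = len(distinct(select_store_id(rows)))

# Models transactionsDf.select(count('storeId')):
# counts non-null values, duplicates included.
non_null = sum(1 for _, store_id in rows if store_id is not None)

print(rows_then_column, column_then_rows, non_null)
```

In this model the null storeId counts as one distinct value, so the projected-then-deduplicated result has three entries, while the other two expressions both yield four for different reasons.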

transactionsDf.select(distinct('storeId')).count()

False. There is no distinct method in pyspark.sql.functions.

by Blair at Dec 14, 2023, 05:43 PM



Argelia

2 months ago

Come on, this is a piece of cake! B is the only option that makes sense. The other choices are about as useful as a chocolate teapot. *chuckles*

upvoted 0 times

Marleen

20 days ago

It's definitely B; the rest are like a chocolate teapot.

upvoted 0 times

Amber

20 days ago

Yeah, the other choices are not relevant.

upvoted 0 times

Myong

22 days ago

Agreed, B is the only option that makes sense.

upvoted 0 times

Elin

2 months ago

Agreed, the other options are pretty useless.

upvoted 0 times

Beckie

2 months ago

Yeah, B is the only one that makes sense.

upvoted 0 times

Gracia

2 months ago

I think B is the correct answer.

upvoted 0 times

Estrella

2 months ago

I think B is the correct answer.

upvoted 0 times

Reuben

2 months ago

I was torn between B and D, but I think B is the better option. Wouldn't want to accidentally include any duplicates, you know? Wait, is that a spider on the ceiling? *screams*

upvoted 0 times
