Databricks Exam Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Topic 3 Question 70 Discussion

Actual exam question for Databricks's Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 exam
Question #: 70
Topic #: 3

Which of the following code blocks returns the number of unique values in column storeId of DataFrame transactionsDf?

Suggested Answer: A

transactionsDf.select('storeId').dropDuplicates().count()

Correct! After dropping all duplicates from column storeId, the remaining rows get counted, representing the number of unique values in the column.

transactionsDf.select(count('storeId')).dropDuplicates()

No. transactionsDf.select(count('storeId')) returns a single-row DataFrame holding the number of non-null values in storeId, duplicates included. Calling dropDuplicates() on that single row has no effect, and the result is a DataFrame rather than a number.

transactionsDf.dropDuplicates().agg(count('storeId'))

Incorrect. transactionsDf.dropDuplicates() removes only rows that are duplicated across all columns; it does not de-duplicate on column storeId alone, so repeated storeId values can remain and still get counted.

transactionsDf.distinct().select('storeId').count()

Wrong. transactionsDf.distinct() identifies unique rows across all columns, but not only unique rows with respect to column storeId. This may leave duplicate values in the column, making the count not represent the number of unique values in that column.
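The difference between de-duplicating full rows and de-duplicating a single column can be modelled without Spark. The following is a plain-Python sketch with hypothetical (transactionId, storeId) tuples; it only mimics the semantics described above and is not PySpark code.

```python
# Hypothetical rows of (transactionId, storeId); not taken from the question.
rows = [(1, 25), (2, 25), (3, 3), (3, 3)]

# Full-row de-duplication, analogous to distinct() or dropDuplicates()
# called without a column subset: only the exact repeat (3, 3) is removed.
distinct_rows = set(rows)
store_ids_after_row_dedup = [store_id for _, store_id in distinct_rows]

# Column-first de-duplication, analogous to select('storeId').dropDuplicates().
unique_store_ids = {store_id for _, store_id in rows}

print(len(store_ids_after_row_dedup))  # 3: storeId 25 still appears twice
print(len(unique_store_ids))           # 2
```

Rows 1 and 2 differ in transactionId, so full-row de-duplication keeps both, and storeId 25 is counted twice.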

transactionsDf.select(distinct('storeId')).count()

False. There is no distinct function in pyspark.sql.functions; distinct() is a DataFrame method, so distinct('storeId') cannot be used inside select().


Contribute your Thoughts:

Lashunda
1 month ago
I'm pretty sure the answer is option A. It's the most straightforward and logical way to get the unique count of 'storeId'. Although, I do wonder if anyone has ever tried to count the number of unique store IDs by actually visiting each one in person. That would be an interesting approach!
upvoted 0 times
Crissy
9 days ago
User1: I think option A is the correct answer.
upvoted 0 times
...
...
Arleen
2 months ago
I disagree, I believe the correct answer is C.
upvoted 0 times
...
Silva
2 months ago
This question is a real head-scratcher! I'm going to have to go with option D and hope for the best. I guess I should have paid more attention in class when they were talking about DataFrame functions.
upvoted 0 times
Tonja
12 days ago
User4: I agree with User3, option E seems right.
upvoted 0 times
...
Linette
14 days ago
User3: I'm going with option E.
upvoted 0 times
...
Catarina
27 days ago
User2: No, I believe it's option C.
upvoted 0 times
...
Gracia
29 days ago
User1: I think option A is the correct one.
upvoted 0 times
...
...
Chauncey
2 months ago
I think the answer is A.
upvoted 0 times
...
Cordelia
2 months ago
Hmm, I'm not sure about these options. Shouldn't we be using something like 'countDistinct()' instead of just 'count()'? I'm leaning towards option C.
upvoted 0 times
...
Zack
2 months ago
I think option E is the way to go. Distinct() will remove the duplicates, and then we can just count the number of remaining rows in the 'storeId' column.
upvoted 0 times
Taryn
29 days ago
I would go with option C. Using distinct() directly on 'storeId' and then counting the unique values seems like a straightforward approach.
upvoted 0 times
...
Taryn
1 month ago
I think option A is better because it directly selects the 'storeId' column, drops duplicates, and then counts the remaining rows.
upvoted 0 times
...
Taryn
1 month ago
I agree, option E seems like the most efficient way to get the count of unique values in the 'storeId' column.
upvoted 0 times
...
...
Ciara
2 months ago
Option A looks good to me. Selecting the 'storeId' column and then dropping duplicates should give us the count of unique store IDs.
upvoted 0 times
Laurena
1 month ago
User 3: Yeah, that seems like the right approach.
upvoted 0 times
...
Hermila
1 month ago
User 2: I agree, selecting 'storeId' and dropping duplicates should work.
upvoted 0 times
...
Annett
1 month ago
I think option A is the correct one.
upvoted 0 times
...
...
