Which of the following code blocks performs an inner join of DataFrames transactionsDf and itemsDf on columns productId and itemId, respectively, excluding columns value and storeId from
DataFrame transactionsDf and column attributes from DataFrame itemsDf?
This question offers a wide variety of answers to a seemingly simple task. That variety reflects the many ways a join can be expressed in PySpark. You also need to understand some SQL syntax to identify the correct answer here.
transactionsDf.createOrReplaceTempView('transactionsDf')
itemsDf.createOrReplaceTempView('itemsDf')
statement = '''
SELECT * FROM transactionsDf
INNER JOIN itemsDf
ON transactionsDf.productId==itemsDf.itemId
'''
spark.sql(statement).drop('value', 'storeId', 'attributes')
Correct - this answer uses SQL to perform the inner join and afterwards drops the unwanted columns. This is totally fine. If you are unfamiliar with triple quotes (''') in Python: they allow you to write a string that spans multiple lines.
transactionsDf
.drop(col('value'), col('storeId'))
.join(itemsDf.drop(col('attributes')), col('productId')==col('itemId'))
No, this answer option is a trap: DataFrame.drop() does not accept a list of Column objects. You could use transactionsDf.drop('value', 'storeId') instead.
transactionsDf.drop('value', 'storeId').join(itemsDf.drop('attributes'), 'transactionsDf.productId==itemsDf.itemId')
Incorrect - Spark does not evaluate 'transactionsDf.productId==itemsDf.itemId' as a valid join expression, because it is passed as a string: Spark treats a string argument to join() as the name of a single column to join on. This would work if it were a Column expression instead of a string.
transactionsDf.drop('value', 'storeId').join(itemsDf.select('attributes'), transactionsDf.productId==itemsDf.itemId)
Wrong - this statement incorrectly uses itemsDf.select instead of itemsDf.drop. select('attributes') keeps only the attributes column, which is the opposite of what is asked.
transactionsDf.createOrReplaceTempView('transactionsDf')
itemsDf.createOrReplaceTempView('itemsDf')
spark.sql('SELECT -value, -storeId FROM transactionsDf INNER JOIN itemsDf ON productId==itemId').drop('attributes')
No, here the SQL syntax is incorrect. Prefixing a column name with a minus sign (-columnName) does not exclude that column in Spark SQL.
More info: pyspark.sql.DataFrame.join --- PySpark 3.1.2 documentation
Static notebook | Dynamic notebook: See test 3, Question 25 (Databricks import instructions)