Databricks Exam Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Topic 2 Question 3 Discussion

Actual exam question for Databricks's Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 exam
Question #: 3
Topic #: 2

The code block shown below should return an exact copy of DataFrame transactionsDf that does not include rows in which values in column storeId have the value 25. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Suggested Answer: B

transactionsDf.where(transactionsDf.storeId!=25)

Correct. DataFrame.where() is an alias for the DataFrame.filter() method. With the condition storeId != 25, it keeps only the rows whose storeId is not 25, which filters out the rows that have the value 25 in column storeId.

transactionsDf.select(transactionsDf.storeId!=25)

Wrong. The select operator allows you to build DataFrames column-wise. When used as shown, it returns a single boolean column holding the result of the comparison for every row, so no rows are filtered out.

transactionsDf.filter(transactionsDf.storeId==25)

Incorrect. Although DataFrame.filter() is the right method for filtering rows, the == in the condition keeps only the rows in which storeId is 25, which is the opposite of what is asked. It should be != instead.

transactionsDf.drop(transactionsDf.storeId==25)

No. DataFrame.drop() is used to remove specific columns, but not rows, from the DataFrame.

transactionsDf.remove(transactionsDf.storeId==25)

False. There is no DataFrame.remove() operator in PySpark.

More info: pyspark.sql.DataFrame.where (PySpark 3.1.2 documentation)

Static notebook | Dynamic notebook: See test 3, Question: 48 (Databricks import instructions)

