The code block shown below should return an exact copy of DataFrame transactionsDf, excluding all rows in which column storeId has the value 25. Choose the answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.where(transactionsDf.storeId!=25)
Correct. DataFrame.where() is an alias for the DataFrame.filter() method. Using this method, it is straightforward to filter out the rows that have the value 25 in column storeId, keeping all others.
transactionsDf.select(transactionsDf.storeId!=25)
Wrong. The select operator allows you to build DataFrames column-wise, but when used as shown, it does not filter out any rows. Instead, it evaluates the boolean expression for every row and returns a DataFrame with a single column of True/False values.
transactionsDf.filter(transactionsDf.storeId==25)
Incorrect. The filter operator is the right tool for removing rows, but the == in the filtering condition keeps exactly the rows with storeId equal to 25, the opposite of what is asked. It should be != instead.
transactionsDf.drop(transactionsDf.storeId==25)
No. DataFrame.drop() is used to remove specific columns, not rows, from a DataFrame.
transactionsDf.remove(transactionsDf.storeId==25)
False. There is no DataFrame.remove() operator in PySpark.
More info: pyspark.sql.DataFrame.where --- PySpark 3.1.2 documentation