Which of the following code blocks displays various aggregated statistics of all columns in DataFrame transactionsDf, including the standard deviation and minimum of values in each column?
DataFrame.summary() is a convenient way to compute statistics for a DataFrame. It returns a new DataFrame, so you need to call .show() to display the result. Called without arguments, it
computes count, mean, stddev, min, the approximate 25%, 50%, and 75% percentiles, and max (see documentation linked below), which covers the standard deviation and minimum. Note that the answer
option that lists many statistics inside the summary() parentheses does not include the minimum, which the question asks for: when statistics are passed explicitly, only those are computed.
Answer options that use agg() do not work as shown: DataFrame.agg() expects aggregate expressions (or a dictionary mapping column names to aggregate function names) that specify how each column should be aggregated.
More info:
- pyspark.sql.DataFrame.summary (PySpark 3.1.2 documentation)
- pyspark.sql.DataFrame.agg (PySpark 3.1.2 documentation)
Static notebook | Dynamic notebook: See test 3, Question: 46 (Databricks import instructions)