Which of the following code blocks returns a new DataFrame in which column attributes of DataFrame itemsDf is renamed to feature0 and column supplier is renamed to feature1?
itemsDf.withColumnRenamed('attributes', 'feature0').withColumnRenamed('supplier', 'feature1')
Correct! Spark's DataFrame.withColumnRenamed syntax makes it relatively easy to change the name of a column.
itemsDf.withColumnRenamed(attributes, feature0).withColumnRenamed(supplier, feature1)
Incorrect. In this code block, the Python interpreter will try to resolve attributes, feature0, supplier, and feature1 as variables. Since none of them is defined, the block fails with a NameError and does not run.
itemsDf.withColumnRenamed(col('attributes'), col('feature0'), col('supplier'), col('feature1'))
Wrong. DataFrame.withColumnRenamed() takes exactly two string arguments: the existing column name and the new column name. This answer is therefore doubly wrong: it wraps the names in col(), and it passes four arguments instead of two.
itemsDf.withColumnRenamed('attributes', 'feature0')
itemsDf.withColumnRenamed('supplier', 'feature1')
No. In this answer, the returned DataFrame will only have column supplier renamed, because the DataFrame returned by the first line is discarded: it is never assigned back to itemsDf or to any other variable.
itemsDf.withColumn('attributes', 'feature0').withColumn('supplier', 'feature1')
Incorrect. withColumn adds a new column (or replaces one with the same name) and expects a Column expression, not a string, as its second argument; it cannot be used to rename an existing column. As written, passing the string 'feature0' raises an error (an AssertionError or TypeError, depending on the PySpark version), so this block does not even run.
More info: pyspark.sql.DataFrame.withColumnRenamed (PySpark 3.1.2 documentation)
Static notebook | Dynamic notebook: See test 3, Question: 29 (Databricks import instructions)