Databricks Exam Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Topic 1 Question 25 Discussion

Actual exam question for Databricks's Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 exam
Question #: 25
Topic #: 1

Which of the following code blocks creates a new 6-column DataFrame by appending the rows of the 6-column DataFrame yesterdayTransactionsDf to the rows of the 6-column DataFrame todayTransactionsDf, ignoring that both DataFrames have different column names?

Suggested Answer: E

todayTransactionsDf.union(yesterdayTransactionsDf)

Correct. union resolves columns by position, not by name, so it appends the rows of yesterdayTransactionsDf to the rows of todayTransactionsDf even though the two DataFrames have different column names. Both DataFrames must have the same number of columns (6 here), and the resulting DataFrame takes its column names from todayTransactionsDf.
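The positional behaviour can be modelled in a few lines of plain Python. This is a minimal sketch, not Spark code: union_by_position and all column names are hypothetical, and a DataFrame is represented as a list of column names plus a list of row tuples.

```python
def union_by_position(left_cols, left_rows, right_cols, right_rows):
    """Plain-Python model of DataFrame.union: match by position, keep left names."""
    if len(left_cols) != len(right_cols):
        # Spark also rejects a union of DataFrames with different column counts;
        # ValueError is only a stand-in for Spark's own exception.
        raise ValueError("both inputs need the same number of columns")
    # The right-hand column names are never consulted.
    return list(left_cols), list(left_rows) + list(right_rows)


# Hypothetical 6-column schemas with different names.
today_cols = ["transactionId", "predError", "value", "storeId", "productId", "f"]
yesterday_cols = ["id", "error", "amount", "store", "product", "flag"]
today_rows = [(1, 3, 4, 25, 1, None)]
yesterday_rows = [(7, 0, 9, 3, 2, None)]

cols, rows = union_by_position(today_cols, today_rows, yesterday_cols, yesterday_rows)
print(cols)       # the column names of the left-hand side
print(len(rows))  # 2
```

In this model, as in the explanation above, the names on the right-hand side play no role; only the column count has to agree.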

todayTransactionsDf.unionByName(yesterdayTransactionsDf)

No. unionByName matches columns across the two DataFrames by name rather than by position. In the form shown above, it is a good fit for combining DataFrames that have exactly the same columns in a different order. In this case, though, the command fails because the two DataFrames have different column names.
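A small plain-Python model illustrates both outcomes: reordered columns are aligned by name, while mismatched names cause a failure. This is only a sketch under stated assumptions: union_by_name is a hypothetical helper, the column names are invented, and ValueError stands in for the exception Spark would raise.

```python
def union_by_name(left_cols, left_rows, right_cols, right_rows):
    """Plain-Python model of DataFrame.unionByName without allowMissingColumns."""
    if set(left_cols) != set(right_cols):
        # Stand-in for the error Spark reports when names cannot be resolved.
        raise ValueError("column names do not match")
    # Reorder each right-hand row to follow the left-hand column order.
    index = [right_cols.index(name) for name in left_cols]
    reordered = [tuple(row[i] for i in index) for row in right_rows]
    return list(left_cols), list(left_rows) + reordered


# Same columns in a different order: rows are aligned by name.
cols, rows = union_by_name(["a", "b"], [(1, 2)], ["b", "a"], [(20, 10)])
print(rows)  # [(1, 2), (10, 20)]

# Different column names: the call fails.
try:
    union_by_name(["a", "b"], [(1, 2)], ["x", "y"], [(3, 4)])
    failed = False
except ValueError:
    failed = True
print(failed)  # True
```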

todayTransactionsDf.unionByName(yesterdayTransactionsDf, allowMissingColumns=True)

No. unionByName is described in the previous explanation. With the allowMissingColumns argument set to True, differing column names no longer cause an error: any column that has no match in the other DataFrame is filled with null for the rows that come from that DataFrame. In the case at hand, however, the resulting DataFrame would have 7 or more columns, so this command is not the right answer.
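To see why the column count grows, the behaviour can be approximated in plain Python. This is a hedged sketch rather than Spark code: union_by_name_allow_missing and the column names are hypothetical, None plays the role of null, and the extreme case is assumed where none of the six names match.

```python
def union_by_name_allow_missing(left_cols, left_rows, right_cols, right_rows):
    """Plain-Python model of unionByName(..., allowMissingColumns=True)."""
    # Left-hand columns first, then the columns that exist only on the right.
    out_cols = list(left_cols) + [c for c in right_cols if c not in left_cols]

    def widen(cols, row):
        values = dict(zip(cols, row))
        # A column missing from this side is filled with None (null in Spark).
        return tuple(values.get(c) for c in out_cols)

    rows = [widen(left_cols, r) for r in left_rows]
    rows += [widen(right_cols, r) for r in right_rows]
    return out_cols, rows


# Hypothetical schemas: six columns each, no name in common.
today_cols = ["t1", "t2", "t3", "t4", "t5", "t6"]
yesterday_cols = ["y1", "y2", "y3", "y4", "y5", "y6"]
cols, rows = union_by_name_allow_missing(
    today_cols, [(1, 2, 3, 4, 5, 6)],
    yesterday_cols, [(7, 8, 9, 10, 11, 12)],
)
print(len(cols))  # 12
print(rows[0])    # six values followed by six None entries
```

Even a single unmatched name would already push the result to 7 columns, which is why this option cannot produce the 6-column DataFrame the question asks for.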

union(todayTransactionsDf, yesterdayTransactionsDf)

No. union is a method of the DataFrame class; there is no union function in pyspark.sql.functions.

todayTransactionsDf.concat(yesterdayTransactionsDf)

Wrong, the DataFrame class does not have a concat method.

More info: pyspark.sql.DataFrame.union and pyspark.sql.DataFrame.unionByName (PySpark 3.1.2 documentation)


