Databricks Exam Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Topic 2 Question 53 Discussion

Actual exam question for Databricks's Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 exam

Question #: 53
Topic #: 2


The code block shown below should show information about the data type that column storeId of DataFrame transactionsDf contains. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

transactionsDf.__1__(__2__).__3__

A.
1. select
2. 'storeId'
3. print_schema()

B.
1. limit
2. 1
3. columns

C.
1. select
2. 'storeId'
3. printSchema()

D.
1. limit
2. 'storeId'
3. printSchema()

E.
1. select
2. storeId
3. dtypes


Suggested Answer: C

Correct code block:

transactionsDf.select('storeId').printSchema()

The difficulty of this question is that it is hard to solve with the stepwise, first-to-last-gap approach that has worked well for similar questions, since the answer options are so different from one another. Instead, you might want to eliminate answers by looking for patterns of frequently wrong answers.

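A first pattern that you may recognize by now is wrong answers in which column names are not expressed in quotes. An unquoted storeId is interpreted as a Python variable name, not as a column name. For this reason, the answer that includes storeId without quotes should be eliminated.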

By now, you may have understood that DataFrame.limit() is useful for returning a specified number of rows. It has nothing to do with specific columns and expects an integer row count, not a column name. For this reason, the answer that resolves to limit('storeId') can be eliminated.

Given that we are interested in information about the data type, you should question whether the answer that resolves to limit(1).columns provides you with this information. While DataFrame.columns is a valid call, it only reports back column names, not column types. So, you can eliminate this option.

The two remaining options use either printSchema() or print_schema(). You may remember that DataFrame.printSchema() is the only valid method of the two. The select('storeId') part just returns the storeId column of transactionsDf, which works here since we are only interested in that column's type anyway.

More info: pyspark.sql.DataFrame.printSchema (PySpark 3.1.2 documentation)

Static notebook | Dynamic notebook: See test 3, question 57 (Databricks import instructions)

by Lacey at Apr 22, 2024, 12:06 PM


7 months ago

I think the answer is option C.


...