
Databricks Exam Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Topic 1 Question 19 Discussion

Actual exam question for Databricks' Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 exam
Question #: 19
Topic #: 1

Which of the following code blocks reads the parquet file stored at filePath into DataFrame itemsDf, using a valid schema for the sample of itemsDf shown below?

Sample of itemsDf:

+------+-----------------------------+-------------------+
|itemId|attributes                   |supplier           |
+------+-----------------------------+-------------------+
|1     |[blue, winter, cozy]         |Sports Company Inc.|
|2     |[red, summer, fresh, cooling]|YetiX              |
|3     |[green, summer, travel]     |Sports Company Inc.|
+------+-----------------------------+-------------------+

Suggested Answer: D

The challenge in this question comes from there being an array column in the schema. In addition, you should know how to pass a schema to the DataFrameReader that is invoked by spark.read.

The correct way to define an array of strings in a schema is through ArrayType(StringType()). A schema can be passed to the DataFrameReader by chaining schema(structType) onto spark.read before triggering the read. Alternatively, you can also define a schema as a DDL-formatted string. For the schema of itemsDf, the following string would work: itemId integer, attributes array<string>, supplier string.

A thing to keep in mind is that in schema definitions, you always need to instantiate the types, like so: StringType(). Passing the bare class StringType (without parentheses) does not work in PySpark and will fail.

Another concern with schemas is whether columns should be nullable, i.e., allowed to contain null values. In the case at hand, this is not a concern, however, since the question only asks for a 'valid' schema. Both non-nullable and nullable column schemas would be valid here, since no null values appear in the DataFrame sample.

More info: Learning Spark, 2nd Edition, Chapter 3

Static notebook | Dynamic notebook: See test 3, Question 19 (Databricks import instructions)

