Google Professional Data Engineer Exam - Topic 4 Question 44 Discussion

Actual exam question for Google's Professional Data Engineer exam

Question #: 44
Topic #: 4


You are designing a cloud-native historical data processing system to meet the following conditions:

The data being analyzed is in CSV, Avro, and PDF formats and will be accessed by multiple analysis tools including Cloud Dataproc, BigQuery, and Compute Engine.

A streaming data pipeline stores new data daily.

Performance is not a factor in the solution.

The solution design should maximize availability.

How should you design data storage for this solution?

A. Create a Cloud Dataproc cluster with high availability. Store the data in HDFS, and perform analysis as needed.

B. Store the data in BigQuery. Access the data using the BigQuery Connector on Cloud Dataproc and Compute Engine.

C. Store the data in a regional Cloud Storage bucket. Access the bucket directly using Cloud Dataproc, BigQuery, and Compute Engine.

D. Store the data in a multi-regional Cloud Storage bucket. Access the data directly using Cloud Dataproc, BigQuery, and Compute Engine.


Suggested Answer: D
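
One way to see why D comes out ahead is to treat the stated conditions as filters and then rank what is left by availability. The snippet below is only a rough sketch of that elimination: the `Option` class, its field names, and the numeric availability ranks are illustrative assumptions, not anything defined by Google Cloud or by the question itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    label: str
    stores_any_file_format: bool  # can hold CSV, Avro, and PDF files as-is
    readable_by_all_tools: bool   # Cloud Dataproc, BigQuery, and Compute Engine
    availability_rank: int        # assumed relative rank, higher = more available


# Flags and ranks are assumptions made for illustration only.
OPTIONS = [
    # HDFS lives on the Dataproc cluster; BigQuery cannot query it directly.
    Option("A", stores_any_file_format=True, readable_by_all_tools=False, availability_rank=1),
    # BigQuery is table storage; PDF files do not fit that model.
    Option("B", stores_any_file_format=False, readable_by_all_tools=True, availability_rank=3),
    # Regional bucket: any object format, reachable by all three tools.
    Option("C", stores_any_file_format=True, readable_by_all_tools=True, availability_rank=2),
    # Multi-regional bucket: same access model, replicated across regions.
    Option("D", stores_any_file_format=True, readable_by_all_tools=True, availability_rank=3),
]


def pick_storage(options):
    """Drop options that fail a hard requirement, then maximize availability."""
    viable = [
        o for o in options
        if o.stores_any_file_format and o.readable_by_all_tools
    ]
    return max(viable, key=lambda o: o.availability_rank).label


print(pick_storage(OPTIONS))
```

Because performance is explicitly not a factor, nothing in the sketch scores latency or cost; only the format, access, and availability conditions from the question are modelled.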

by Fatima at May 08, 2022, 11:02 AM


Contribute your Thoughts:


Johnetta

4 months ago

Multi-regional storage? That’s a bit overkill, isn’t it?

upvoted 0 times


Una

4 months ago

Wait, why not just use HDFS? Seems outdated.

upvoted 0 times


Dion

4 months ago

Storing in BigQuery (Option B) could be more efficient!

upvoted 0 times


Tashia

5 months ago

I think D is better for redundancy.

upvoted 0 times


Nguyet

5 months ago

Option C seems solid for availability.

upvoted 0 times


Dominga

5 months ago

I feel like a multi-regional Cloud Storage bucket could be overkill for this scenario, but it does sound like it would ensure high availability.

upvoted 0 times


Kandis

5 months ago

I practiced a similar question where we had to maximize availability, and I think using a regional Cloud Storage bucket might be a good balance, but I’m not completely confident.

upvoted 0 times


Carmelina

5 months ago

I think storing the data in BigQuery could simplify access for analysis tools, but I’m a bit uncertain about how it handles CSV and PDF formats directly.

upvoted 0 times


Reuben

5 months ago

I remember we discussed the importance of using Cloud Storage for different data formats, but I'm not sure if multi-regional is necessary since performance isn't a factor.

upvoted 0 times


Carlton

5 months ago

I'm pretty sure the answer is D. Discussing alternative plans and gauging reactions doesn't seem like it would be part of the needs analysis stage.

upvoted 0 times


Janine

5 months ago

I think VRRP advertisements are sent only from the master router, but I can't remember if the standby routers send them too.

upvoted 0 times


Evan

Okay, let me see. The question is asking about determining the breakeven point, so I think the key is finding the technique that looks at the costs and projected income over time. Cost-benefit analysis seems relevant, but discounted cash flow feels like the more precise answer here.

upvoted 0 times


Raul

5 months ago

I think asynchronous collaboration means working at different times, so online meetings probably aren't it. I'm leaning towards wikis and shared workspaces.

upvoted 0 times


