Microsoft Exam DP-203 Topic 8 Question 37 Discussion

Actual exam question for Microsoft's DP-203 exam
Question #: 37
Topic #: 8

You are designing an Azure Databricks table. The table will ingest an average of 20 million streaming events per day.

You need to persist the events in the table for use in incremental load pipeline jobs in Azure Databricks. The solution must minimize storage costs and incremental load times.

What should you include in the solution?

Suggested Answer: B

The Databricks ABS-AQS connector uses Azure Queue Storage (AQS) to provide an optimized file source that lets you find new files written to an Azure Blob storage (ABS) container without repeatedly listing all of the files.

This provides two major advantages:

Lower latency: no need to list nested directory structures on ABS, which is slow and resource intensive.

Lower costs: no more costly LIST API requests made to ABS.


https://docs.microsoft.com/en-us/azure/databricks/spark/latest/structured-streaming/aqs

Contribute your Thoughts:

James Sandman
3 years ago
I think the answer is also A, but with a different justification:

1) A Microsoft article states, "When creating partitions on clustered columnstore tables, it is important to consider how many rows belong to each partition. For optimal compression and performance of clustered columnstore tables, a minimum of 1 million rows per distribution and partition is needed. Before partitions are created, dedicated SQL pool already divides each table into 60 distributed databases." (https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-partition)

2) On calculating the number of partitions, another article states, "Having too many partitions can reduce the effectiveness of clustered columnstore indexes if each partition has fewer than 1 million rows. Dedicated SQL pools automatically partition your data into 60 databases. So, if you create a table with 100 partitions, the result will be 6000 partitions." (https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-dedicated-sql-pool)

Thus, if the condition for optimal compression and performance of the clustered columnstore index is [2.4 billion / (60 * partition range)] >= 1,000,000, then the answer is 40. Anything else results in a number less than 1,000,000.
upvoted 2 times
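The arithmetic in the comment above can be checked with a short Python sketch. It assumes the figures quoted there (2.4 billion rows, 60 distributions, a minimum of 1 million rows per distribution and partition); the function and constant names are hypothetical and chosen only for illustration.

```python
# Figures taken from the comment above; names are illustrative only.
DISTRIBUTIONS = 60                # a dedicated SQL pool spreads each table over 60 distributions
MIN_ROWS_PER_CELL = 1_000_000     # recommended minimum rows per distribution and partition


def rows_per_cell(total_rows: int, partitions: int) -> float:
    """Average rows in one (distribution, partition) cell."""
    return total_rows / (DISTRIBUTIONS * partitions)


def max_partitions(total_rows: int) -> int:
    """Largest partition count that still keeps every cell at or above the minimum."""
    return total_rows // (DISTRIBUTIONS * MIN_ROWS_PER_CELL)


total_rows = 2_400_000_000        # the 2.4 billion rows cited in the comment

print(max_partitions(total_rows))       # largest partition count meeting the threshold
print(rows_per_cell(total_rows, 40))    # rows per cell with 40 partitions
print(rows_per_cell(total_rows, 41))    # one more partition drops below the threshold
```

With these inputs, 40 partitions sits exactly at the 1,000,000-row threshold, and any larger partition count falls below it.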
Swapnil Pal
3 years ago
The answer is A. 2.4 B / 40 = 60 M, which is the most optimal.
upvoted 1 time
