You are designing an Azure Databricks table. The table will ingest an average of 20 million streaming events per day.
You need to persist the events in the table for use in incremental load pipeline jobs in Azure Databricks. The solution must minimize storage costs and incremental load times.
What should you include in the solution?
The Databricks ABS-AQS connector uses Azure Queue Storage (AQS) to provide an optimized file source that lets you find new files written to an Azure Blob storage (ABS) container without repeatedly listing all of the files.
This provides two major advantages:
Lower latency: no need to list nested directory structures on ABS, which is slow and resource intensive.
Lower costs: no more costly LIST API requests made to ABS.
https://docs.microsoft.com/en-us/azure/databricks/spark/latest/structured-streaming/aqs
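As a rough illustration of the file source described above, the sketch below collects the settings for an ABS-AQS stream and applies them to a Structured Streaming reader. The option names (fileFormat, queueName, connectionString) are given as I recall them from the linked documentation and should be checked against it. The helper names abs_aqs_options and read_new_files, and the queue name and connection string values, are hypothetical placeholders.

```python
def abs_aqs_options(queue_name, connection_string, file_format="json"):
    """Collect the reader options for the ABS-AQS source (names assumed from the docs)."""
    return {
        "fileFormat": file_format,              # format of the files landing in the container
        "queueName": queue_name,                # AQS queue that receives blob-created events
        "connectionString": connection_string,  # connection string used to reach the queue
    }


def read_new_files(spark, schema, options):
    """Build a streaming DataFrame that discovers new files through the queue.

    `spark` is an existing SparkSession on a Databricks cluster; it is passed in
    rather than created here, so this module imports nothing from pyspark.
    """
    reader = spark.readStream.format("abs-aqs").schema(schema)
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load()


# Placeholder values for illustration only; neither comes from the original question.
options = abs_aqs_options("events-queue", "<storage-connection-string>")
```

On a Databricks cluster, calling read_new_files(spark, schema, options) would return a streaming DataFrame that an incremental load job can write to the target table. Because new files are found through queue messages, the job does not need to list the container on each run.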