You are designing a storage solution for streaming data that is processed by Azure Databricks. The solution must meet the following requirements:
The data schema must be fluid.
The source data must have a high throughput.
The data must be available in multiple Azure regions as quickly as possible.
What should you include in the solution to meet the requirements?
Azure Cosmos DB is Microsoft's globally distributed, multi-model database. Azure Cosmos DB enables you to elastically and independently scale throughput and storage across any number of Azure's geographic regions. It offers throughput, latency, availability, and consistency guarantees with comprehensive service level agreements (SLAs).
You can read data from and write data to Azure Cosmos DB using Databricks.
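As a rough sketch of what this can look like from a Databricks notebook, the Python below assumes the Azure Cosmos DB Spark 3 OLTP connector (format name `cosmos.oltp`) is installed on the cluster. The endpoint, key, database name `telemetry`, and container name `events` are placeholders, and the helper function names are hypothetical, not part of any library.

```python
def cosmos_config(endpoint, key, database, container):
    """Build the option map for the Cosmos DB Spark 3 OLTP connector."""
    return {
        "spark.cosmos.accountEndpoint": endpoint,
        "spark.cosmos.accountKey": key,
        "spark.cosmos.database": database,
        "spark.cosmos.container": container,
    }


def read_container(spark, cfg):
    """Read a container into a DataFrame (spark is an active SparkSession)."""
    return (
        spark.read.format("cosmos.oltp")
        .options(**cfg)
        # Let the connector infer columns, since documents may differ in shape.
        .option("spark.cosmos.read.inferSchema.enabled", "true")
        .load()
    )


def append_to_container(df, cfg):
    """Write the rows of a DataFrame to the container as documents."""
    df.write.format("cosmos.oltp").options(**cfg).mode("append").save()


# Placeholder values; in practice the key would come from a secret scope.
cfg = cosmos_config(
    "https://<account>.documents.azure.com:443/",
    "<account-key>",
    "telemetry",
    "events",
)
```

In a notebook, `read_container(spark, cfg)` would return a DataFrame, and `append_to_container(df, cfg)` would write one back. Neither function is called here because both need a running cluster with the connector attached.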
Note on fluid schema:
If the structure of your data changes frequently, particularly when transactions come from external sources where conformity is difficult to enforce across the database, consider a schema-agnostic approach using a managed NoSQL database service such as Azure Cosmos DB.
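To illustrate what "schema-agnostic" means in practice, here is a minimal sketch in plain Python of two JSON documents with different shapes that could sit side by side in one container. The field names and the `field_names` helper are made up for illustration; the only assumption carried over from Cosmos DB is that each document has an `id`.

```python
import json

# Two events from different sources; the second carries fields the first lacks.
events = [
    {"id": "1", "deviceId": "a", "temperature": 21.5},
    {"id": "2", "deviceId": "b", "humidity": 0.4, "firmware": {"version": "2.1"}},
]


def field_names(docs):
    """Return the sorted union of top-level field names across all documents."""
    return sorted({key for doc in docs for key in doc})


# Each document serializes independently; no shared table schema is required.
serialized = [json.dumps(doc) for doc in events]
```

A relational table would need a schema change before accepting the second event. With a document store, the new fields simply arrive with the data, and readers decide how to handle fields they do not recognize.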
https://docs.databricks.com/data/data-sources/azure/cosmosdb-connector.html
https://docs.microsoft.com/en-us/azure/cosmos-db/relational-nosql