Your team is building several data pipelines that contain a collection of complex tasks and dependencies that you want to execute on a schedule, in a specific order. The tasks and dependencies consist of files in Cloud Storage, Apache Spark jobs, and data in BigQuery. You need to design a system that can schedule and automate these data processing tasks using a fully managed approach. What should you do?
Using Cloud Composer to create Directed Acyclic Graphs (DAGs) is the best solution because it is a fully managed, scalable workflow orchestration service based on Apache Airflow. Cloud Composer allows you to define complex task dependencies and schedules while integrating seamlessly with Google Cloud services such as Cloud Storage, BigQuery, and Dataproc for Apache Spark jobs. This approach minimizes operational overhead, supports scheduling and automation, and provides an efficient and fully managed way to orchestrate your data pipelines.
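As a rough sketch of what such a Cloud Composer DAG might look like (the project ID, region, cluster name, bucket, and table names below are placeholder assumptions, not values from the scenario):

from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG(
    dag_id="daily_retail_pipeline",
    schedule_interval="@daily",  # run once per day
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    # Run the Apache Spark transformation on an existing Dataproc cluster.
    spark_transform = DataprocSubmitJobOperator(
        task_id="spark_transform",
        project_id="my-project",
        region="us-central1",
        job={
            "placement": {"cluster_name": "etl-cluster"},
            "pyspark_job": {"main_python_file_uri": "gs://my-bucket/jobs/transform.py"},
        },
    )

    # Load the transformed files from Cloud Storage into BigQuery.
    load_to_bigquery = GCSToBigQueryOperator(
        task_id="load_to_bigquery",
        bucket="my-bucket",
        source_objects=["output/*.parquet"],
        source_format="PARQUET",
        destination_project_dataset_table="my-project.analytics.daily_sales",
        write_disposition="WRITE_TRUNCATE",
    )

    # Dependency: the Spark job must finish before the BigQuery load starts.
    spark_transform >> load_to_bigquery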
You are constructing a data pipeline to process sensitive customer data stored in a Cloud Storage bucket. You need to ensure that this data remains accessible, even in the event of a single-zone outage. What should you do?
Storing the data in a multi-region bucket ensures high availability and durability, even in the event of a single-zone outage. Multi-region buckets replicate data across multiple regions within a large geographic area (such as US or EU), providing resilience against zone-level and even region-level failures and ensuring that the data remains accessible. This approach is particularly suitable for sensitive customer data that must remain available without interruption.
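For illustration, a multi-region bucket can be created with the google-cloud-storage Python client; the project and bucket name here are assumptions:

from google.cloud import storage

client = storage.Client(project="my-project")
bucket = client.bucket("sensitive-customer-data")
# "US" is a multi-region location: objects are replicated across
# multiple regions, so a single-zone outage does not affect availability.
client.create_bucket(bucket, location="US")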
Your retail company collects customer data from various sources: online transactions, customer feedback as text files, and real-time social media activity. You are designing a data pipeline to extract this data. Which Google Cloud storage system(s) should you select for further analysis and ML model training?
Online transactions: Storing the transactional data in BigQuery is ideal because BigQuery is a serverless data warehouse optimized for querying and analyzing structured data at scale. It supports SQL queries and is suitable for structured transactional data.
Customer feedback: Storing customer feedback in Cloud Storage is appropriate as it allows you to store unstructured text files reliably and at a low cost. Cloud Storage also integrates well with data processing and ML tools for further analysis.
Social media activity: Storing real-time social media activity in BigQuery is optimal because BigQuery supports streaming inserts, enabling real-time ingestion and analysis of data. This allows immediate analysis and integration into dashboards or ML pipelines.
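To illustrate the streaming-insert point, here is a minimal sketch using the google-cloud-bigquery Python client (the table and field names are assumptions for this example):

from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.social.activity"  # assumed destination table

rows = [
    {"user_id": "u123", "platform": "x", "event": "mention", "ts": "2024-01-01T12:00:00Z"},
]

# Streaming insert: rows become available for queries within seconds.
errors = client.insert_rows_json(table_id, rows)
if errors:
    raise RuntimeError(f"Streaming insert failed: {errors}")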
Your company uses Looker as its primary business intelligence platform. You want to use LookML to visualize the profit margin for each of your company's products in your Looker Explores and dashboards. You need to implement a solution quickly and efficiently. What should you do?
Defining a new measure in LookML to calculate the profit margin from the existing revenue and cost fields is the most efficient and straightforward solution. This approach computes the profit margin dynamically within your Looker Explores and dashboards, without pre-calculating values or creating additional tables. Because a measure of type: number must be built from other measures, this assumes aggregate measures such as total_revenue and total_cost (for example, type: sum over the revenue and cost columns) are already defined:

measure: profit_margin {
  type: number
  # NULLIF guards against division by zero when revenue is 0.
  sql: (${total_revenue} - ${total_cost}) / NULLIF(${total_revenue}, 0) ;;
  value_format: "0.0%"
}
This method is quick to implement and integrates seamlessly into your existing Looker model, enabling accurate visualization of profit margins across your products.
You are a data analyst working with sensitive customer data in BigQuery. You need to ensure that only authorized personnel within your organization can query this data, while following the principle of least privilege. What should you do?
Using IAM roles to enable access control in BigQuery is the best approach to ensure that only authorized personnel can query the sensitive customer data. IAM allows you to define granular permissions at the project, dataset, or table level, ensuring that users have only the access they need in accordance with the principle of least privilege. For example, you can assign roles/bigquery.dataViewer for read-only access, or roles/bigquery.dataEditor where users also need to modify data. This approach provides centralized and manageable access control, which is critical for protecting sensitive data.
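As one example, read-only access can be granted at the dataset level with the google-cloud-bigquery Python client (the dataset and user email are placeholders); the legacy READER role on a dataset corresponds to roles/bigquery.dataViewer:

from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my-project.customer_data")  # assumed dataset

# Grant read-only access (equivalent to roles/bigquery.dataViewer on the dataset).
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",
        entity_type="userByEmail",
        entity_id="analyst@example.com",
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])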