A retail company uses an Amazon Redshift data warehouse and an Amazon S3 bucket. The company ingests retail order data into the S3 bucket every day.
The company stores all order data at a single path within the S3 bucket. The data has more than 100 columns. The company ingests the order data from a third-party application that generates more than 30 files in CSV format every day. Each CSV file is between 50 and 70 MB in size.
The company uses Amazon Redshift Spectrum to run queries that select sets of columns. Users aggregate metrics based on daily orders. Recently, users have reported that the performance of the queries has degraded. A data engineer must resolve the performance issues for the queries.
Which combination of steps will meet this requirement with the LEAST development effort? (Select TWO.)
The performance issue with the Amazon Redshift Spectrum queries arises from the nature of CSV files, which are a row-based storage format. Spectrum is optimized for columnar formats, which significantly improve performance by reducing the amount of data scanned. Partitioning the data on relevant columns, such as order date, further reduces the amount of data scanned because queries read only the necessary partitions.
A. Configure the third-party application to create the files in a columnar format:
Columnar formats (like Parquet or ORC) store data in a way that is optimized for analytical queries because they allow queries to scan only the columns required, rather than scanning all columns in a row-based format like CSV.
Amazon Redshift Spectrum works much more efficiently with columnar formats, reducing the amount of data that needs to be scanned, which improves query performance.
C. Partition the order data in the S3 bucket based on order date:
Partitioning the data on columns like order date allows Redshift Spectrum to skip scanning unnecessary partitions, leading to improved query performance.
By organizing data into partitions, you minimize the number of files Spectrum has to read, further optimizing performance.
Alternatives Considered:
B (Develop an AWS Glue ETL job): While consolidating files can improve performance by reducing the number of small files (which are inefficient to process), it adds ETL complexity. Switching to a columnar format (Option A) and partitioning (Option C) provide more significant performance improvements with less development effort.
D and E (JSON-related options): Using JSON format or the SUPER type in Redshift introduces complexity and isn't as efficient as the proposed solutions, especially since JSON is not a columnar format.
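As a rough sketch of option C, assuming the third-party application has already been switched to Parquet and drops its daily files under a staging/ prefix (the bucket and prefix names below are placeholders), a small script can copy each day's files into an order_date=YYYY-MM-DD prefix that Redshift Spectrum can prune on:

import datetime
import boto3

s3 = boto3.client("s3")
BUCKET = "example-orders-bucket"   # placeholder bucket name
STAGING_PREFIX = "staging/"        # where the application drops the daily files
ORDERS_PREFIX = "orders/"          # partitioned location queried by Redshift Spectrum

def partition_todays_files():
    """Copy today's files under an order_date=YYYY-MM-DD partition prefix."""
    partition = datetime.date.today().isoformat()
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=STAGING_PREFIX):
        for obj in page.get("Contents", []):
            file_name = obj["Key"].rsplit("/", 1)[-1]
            s3.copy_object(
                Bucket=BUCKET,
                CopySource={"Bucket": BUCKET, "Key": obj["Key"]},
                Key=f"{ORDERS_PREFIX}order_date={partition}/{file_name}",
            )

Once the external table's partitions are registered (for example by a Glue crawler or ALTER TABLE ... ADD PARTITION), queries that filter on order_date scan only the matching prefixes.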
A company uses Amazon S3 to store data and Amazon QuickSight to create visualizations.
The company has an S3 bucket in an AWS account named Hub-Account. The S3 bucket is encrypted by an AWS Key Management Service (AWS KMS) key. The company's QuickSight instance is in a separate account named BI-Account.
The company updates the S3 bucket policy to grant access to the QuickSight service role. The company wants to enable cross-account access to allow QuickSight to interact with the S3 bucket.
Which combination of steps will meet this requirement? (Select TWO.)
Problem Analysis:
The company needs cross-account access to allow QuickSight in BI-Account to interact with an S3 bucket in Hub-Account.
The bucket is encrypted with an AWS KMS key.
Appropriate permissions must be set for both S3 access and KMS decryption.
Key Considerations:
QuickSight requires IAM permissions to access S3 data and decrypt files using the KMS key.
Both S3 and KMS permissions need to be properly configured across accounts.
Solution Analysis:
Option A: Use Existing KMS Key for Encryption
While the existing KMS key is used for encryption, it must also grant decryption permissions to QuickSight.
Option B: Add S3 Bucket to QuickSight Role
Granting S3 bucket access to the QuickSight service role is necessary for cross-account access.
Option C: AWS RAM for Bucket Sharing
AWS RAM is not required; bucket policies and IAM roles suffice for granting cross-account access.
Option D: IAM Policy for KMS Access
QuickSight's service role in BI-Account needs explicit permissions to use the KMS key for decryption.
Option E: Add KMS Key as Resource for Role
The KMS key must explicitly list the QuickSight role as an entity that can access it.
Implementation Steps:
S3 Bucket Policy in Hub-Account: Add a policy to the S3 bucket granting the QuickSight service role access:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::<BI-Account-ID>:role/service-role/QuickSightRole" },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::<Bucket-Name>/*"
    }
  ]
}
KMS Key Policy in Hub-Account: Add permissions for the QuickSight role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::<BI-Account-ID>:role/service-role/QuickSightRole" },
      "Action": [
        "kms:Decrypt",
        "kms:DescribeKey"
      ],
      "Resource": "*"
    }
  ]
}
IAM Policy for QuickSight Role in BI-Account: Attach the following policy to the QuickSight service role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "kms:Decrypt"
      ],
      "Resource": [
        "arn:aws:s3:::<Bucket-Name>/*",
        "arn:aws:kms:<region>:<Hub-Account-ID>:key/<KMS-Key-ID>"
      ]
    }
  ]
}
A data engineer is building an automated extract, transform, and load (ETL) ingestion pipeline by using AWS Glue. The pipeline ingests compressed files that are in an Amazon S3 bucket. The ingestion pipeline must support incremental data processing.
Which AWS Glue feature should the data engineer use to meet this requirement?
Problem Analysis:
The pipeline processes compressed files in S3 and must support incremental data processing.
AWS Glue features must facilitate tracking progress to avoid reprocessing the same data.
Key Considerations:
Incremental data processing requires tracking which files or partitions have already been processed.
The solution must be automated and efficient for large-scale ETL jobs.
Solution Analysis:
Option A: Workflows
Workflows organize and orchestrate multiple Glue jobs but do not track progress for incremental data processing.
Option B: Triggers
Triggers initiate Glue jobs based on a schedule or events but do not track which data has been processed.
Option C: Job Bookmarks
Job bookmarks track the state of the data that has been processed, enabling incremental processing.
Bookmarks let Glue jobs automatically skip files or partitions that were processed in previous runs.
Option D: Classifiers
Classifiers determine the schema of incoming data but do not handle incremental processing.
Final Recommendation:
Job bookmarks are specifically designed to enable incremental data processing in AWS Glue ETL pipelines.
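For illustration, a minimal Glue ETL script with job bookmarks might look like the sketch below; the bucket names and the orders_source/orders_sink keys are placeholders. The transformation_ctx values are what the bookmark uses to remember which S3 objects have already been read, and job.commit() persists that state after a successful run.

import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read only objects that were not processed in earlier runs; the
# transformation_ctx key identifies this source to the job bookmark.
orders = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-bucket/incoming/"], "compressionType": "gzip"},
    format="csv",
    format_options={"withHeader": True},
    transformation_ctx="orders_source",
)

glue_context.write_dynamic_frame.from_options(
    frame=orders,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/processed/"},
    format="parquet",
    transformation_ctx="orders_sink",
)

# Committing the job advances the bookmark so the same files are skipped next run.
job.commit()

Job bookmarks also have to be enabled on the job itself, by setting the --job-bookmark-option job parameter to job-bookmark-enable.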
A company has a gaming application that stores data in Amazon DynamoDB tables. A data engineer needs to ingest the game data into an Amazon OpenSearch Service cluster. Data updates must occur in near real time.
Which solution will meet these requirements?
Problem Analysis:
The company uses DynamoDB for gaming data storage and needs to ingest data into Amazon OpenSearch Service in near real time.
Data updates must propagate quickly to OpenSearch for analytics or search purposes.
Key Considerations:
DynamoDB Streams provide near-real-time capture of table changes (inserts, updates, and deletes).
Integration with AWS Lambda allows seamless processing of these changes.
OpenSearch offers APIs for indexing and updating documents, which Lambda can invoke.
Solution Analysis:
Option A: Step Functions with Periodic Export
Not suitable for near-real-time updates; introduces significant latency.
Operationally complex to manage periodic exports and S3 data ingestion.
Option B: AWS Glue Job
AWS Glue is designed for ETL workloads but lacks real-time processing capabilities.
Option C: DynamoDB Streams + Lambda
DynamoDB Streams capture changes in near real time.
Lambda can process these streams and use the OpenSearch API to update the index.
This approach provides low latency and seamless integration with minimal operational overhead.
Option D: Custom OpenSearch Plugin
Writing a custom plugin adds complexity and is unnecessary with existing AWS integrations.
Implementation Steps:
Enable DynamoDB Streams for the relevant DynamoDB tables.
Create a Lambda function to process stream records:
Parse insert, update, and delete events.
Use OpenSearch APIs to index or update documents based on the event type.
Set up a trigger to invoke the Lambda function whenever there are changes in the DynamoDB Stream.
Monitor and log errors for debugging and operational health.
Amazon DynamoDB Streams Documentation
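A minimal Lambda handler along these lines is sketched below. The endpoint environment variable, index name, and game_id key attribute are assumptions, and authentication (for example SigV4 request signing) is omitted for brevity; it uses the opensearch-py client to index new or modified items and delete removed ones.

import os
from opensearchpy import OpenSearch

# Placeholder endpoint and index; auth setup omitted for brevity.
client = OpenSearch(hosts=[os.environ["OPENSEARCH_ENDPOINT"]])
INDEX = "game-data"

def handler(event, context):
    # Each record describes one item-level change captured by DynamoDB Streams.
    for record in event["Records"]:
        doc_id = record["dynamodb"]["Keys"]["game_id"]["S"]  # assumed key attribute
        if record["eventName"] in ("INSERT", "MODIFY"):
            # NewImage is in DynamoDB attribute-value format; a real function would
            # deserialize it first (for example with boto3's TypeDeserializer).
            client.index(index=INDEX, id=doc_id, body=record["dynamodb"]["NewImage"])
        elif record["eventName"] == "REMOVE":
            client.delete(index=INDEX, id=doc_id)

For NewImage to be present in the stream records, the table's stream must be configured with a view type that includes new images, such as NEW_AND_OLD_IMAGES.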
A mobile gaming company wants to capture data from its gaming app. The company wants to make the data available to three internal consumers of the data. The data records are approximately 20 KB in size.
The company wants to achieve optimal throughput from each device that runs the gaming app. Additionally, the company wants to develop an application to process data streams. The stream-processing application must have dedicated throughput for each internal consumer.
Which solution will meet these requirements?
Problem Analysis:
Input Requirements: Gaming app generates approximately 20 KB data records, which must be ingested and made available to three internal consumers with dedicated throughput.
Key Requirements:
High throughput for ingestion from each device.
Dedicated processing bandwidth for each consumer.
Key Considerations:
Amazon Kinesis Data Streams supports high-throughput ingestion with the PutRecords API for batch writes.
The Enhanced Fan-Out feature provides dedicated throughput to each consumer, avoiding bandwidth contention.
This solution avoids bottlenecks and ensures optimal throughput for the gaming application and consumers.
Solution Analysis:
Option A: Kinesis Data Streams + Enhanced Fan-Out
The PutRecords API is designed for batch writes, improving ingestion performance.
Enhanced Fan-Out allows each consumer to process the stream independently with dedicated throughput.
Option B: Data Firehose + Dedicated Throughput Request
Firehose is not designed for real-time stream processing or fan-out. It delivers data to destinations like S3, Redshift, or OpenSearch, not multiple independent consumers.
Option C: Data Firehose + Enhanced Fan-Out
Firehose does not support enhanced fan-out. This option is invalid.
Option D: Kinesis Data Streams + EC2 Instances
Hosting stream-processing applications on EC2 increases operational overhead compared to native enhanced fan-out.
Final Recommendation:
Use Kinesis Data Streams with Enhanced Fan-Out for high-throughput ingestion and dedicated consumer bandwidth.
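As a sketch of both halves, assuming a stream named game-events and a player_id field in each record (placeholder names): the producer batches records with PutRecords, and each consuming application registers an enhanced fan-out consumer so it receives its own dedicated read throughput per shard.

import json
import boto3

kinesis = boto3.client("kinesis")
STREAM_NAME = "game-events"  # placeholder stream name
STREAM_ARN = "arn:aws:kinesis:<region>:<account-id>:stream/game-events"  # placeholder ARN

def send_batch(events):
    # Batch up to 500 records per call to maximize throughput from each device.
    records = [
        {"Data": json.dumps(e).encode("utf-8"), "PartitionKey": e["player_id"]}
        for e in events
    ]
    response = kinesis.put_records(StreamName=STREAM_NAME, Records=records)
    return response["FailedRecordCount"]  # a real producer would retry failed records

def register_consumer(name):
    # Each registered consumer gets dedicated per-shard read throughput via
    # enhanced fan-out, independent of the other two internal consumers.
    response = kinesis.register_stream_consumer(StreamARN=STREAM_ARN, ConsumerName=name)
    return response["Consumer"]["ConsumerARN"]

The returned consumer ARN is then used with SubscribeToShard (or by a Kinesis Client Library 2.x application) to receive records pushed over the dedicated connection.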