Microsoft Exam DP-201 Topic 4 Question 27 Discussion

Actual exam question for Microsoft's DP-201 exam

Question #: 27
Topic #: 4

You manage a process that performs analysis of daily web traffic logs on an HDInsight cluster. Each of the 250 web servers generates approximately 10 megabytes (MB) of log data each day. All log data is stored in a single folder in Microsoft Azure Data Lake Storage Gen2.

You need to improve the performance of the process.

Which two changes should you make? Each correct answer presents a complete solution.

NOTE: Each correct selection is worth one point.

A. Combine the daily log files for all servers into one file

B. Increase the value of the mapreduce map memory parameter

C. Move the log files into folders so that each day's logs are in their own folder

D. Increase the number of worker nodes

E. Increase the value of the hive.tez.container.size parameter

Suggested Answer: A, C

A: Typically, analytics engines such as HDInsight and Azure Data Lake Analytics have a per-file overhead. If you store your data as many small files, this can negatively affect performance. In general, organize your data into larger files for better performance (256 MB to 100 GB in size). Some engines and applications might have trouble efficiently processing files that are larger than 100 GB.
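To make the size argument concrete, here is a rough arithmetic check in Python. It only uses the figures from the question (250 servers, about 10 MB each per day) and the 256 MB to 100 GB guideline quoted above; the function names are made up for illustration, and 1 GB is assumed to be 1024 MB.

```python
# Figures taken from the question and the guidance quoted above.
SERVERS = 250
MB_PER_SERVER_PER_DAY = 10
MIN_RECOMMENDED_MB = 256
MAX_RECOMMENDED_MB = 100 * 1024  # 100 GB, assuming 1 GB = 1024 MB


def combined_daily_size_mb(servers: int, mb_per_server: int) -> int:
    """Size of one day's logs if all servers' files are merged into one file."""
    return servers * mb_per_server


def in_recommended_range(size_mb: int) -> bool:
    """True if a file size falls inside the quoted 256 MB to 100 GB guideline."""
    return MIN_RECOMMENDED_MB <= size_mb <= MAX_RECOMMENDED_MB


daily_mb = combined_daily_size_mb(SERVERS, MB_PER_SERVER_PER_DAY)
print(daily_mb)                                      # 2500
print(in_recommended_range(daily_mb))                # True
print(in_recommended_range(MB_PER_SERVER_PER_DAY))   # False
```

A single 10 MB per-server file is well under the guideline, while one combined daily file of roughly 2,500 MB sits inside it, which is why option A helps.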

C: For Hive workloads, partition pruning of time-series data can help some queries read only a subset of the data, which improves performance.

Pipelines that ingest time-series data often place their files in a highly structured naming scheme for files and folders. Below is a very common example of data structured by date:

\DataSet\YYYY\MM\DD\datafile_YYYY_MM_DD.tsv

Notice that the datetime information appears both as folders and in the filename.
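As a small sketch of how such a date-based layout could be generated, the Python helper below builds a path with year, month, and day folders plus a dated filename. The function name `partitioned_path`, the dataset name, and the choice of `/` as separator are assumptions for illustration, not part of any Azure SDK.

```python
from datetime import date


def partitioned_path(dataset: str, day: date, sep: str = "/") -> str:
    """Build a hypothetical date-partitioned path: year/month/day folders,
    with the same date repeated in the filename."""
    parts = [
        dataset,
        f"{day:%Y}",                      # year folder, e.g. 2022
        f"{day:%m}",                      # zero-padded month folder
        f"{day:%d}",                      # zero-padded day folder
        f"datafile_{day:%Y_%m_%d}.tsv",   # date repeated in the filename
    ]
    return sep + sep.join(parts)


print(partitioned_path("DataSet", date(2022, 5, 4)))
# /DataSet/2022/05/04/datafile_2022_05_04.tsv
```

With one folder per day, a query that filters on a single date only needs to read that day's folder instead of scanning every log file.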

https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-performance-tuning-guidance

by Wilda at May 04, 2022, 03:56 AM
