A Generative AI Engineer is developing an agent system using a popular agent-authoring library. The agent comprises multiple parallel and sequential chains. The agent fails at one of the steps, and the engineer finds it difficult to identify the root cause. They need an appropriate approach to investigate the issue and discover the cause of the failure. Which approach should they choose?
For complex agentic systems (such as those built with LangGraph or AutoGen), standard logging is often insufficient because the agent's state changes dynamically. MLflow Tracing is the recommended tool for debugging these systems. Tracing provides a visual, hierarchical timeline of every call made during an agent's execution, including internal LLM reasoning, tool calls, and data transformations. When a step fails, the trace lets the engineer click into that specific node to see the exact input sent to the LLM and the raw output received. This is much faster and more comprehensive than manually deconstructing the agent (D) or adding manual logs (C). While mlflow.evaluate (B) is useful for measuring performance across a whole dataset, it is not a tool for debugging a single failed execution in real time.
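For example, enabling tracing for a LangChain-based agent can be as simple as turning on autologging. A minimal sketch, assuming MLflow 2.14+ with the tracing feature enabled; the `agent` object is hypothetical:

```python
import mlflow

# Automatically capture every LangChain component call (LLM invocations,
# tool calls, retriever steps) as spans in a hierarchical trace.
mlflow.langchain.autolog()

# Custom steps outside the framework can be traced explicitly.
@mlflow.trace(name="post_process")
def post_process(raw_output: str) -> str:
    return raw_output.strip()

# Invoking the agent now records a trace; a failed step shows up as a
# failed span with its exact input and raw output in the Trace UI.
# result = agent.invoke({"input": "..."})  # `agent` is a hypothetical chain
```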
All of the following are Python APIs used to query Databricks foundation models. When running in an interactive notebook, which of the following libraries does not automatically use the current session credentials?
When working within a Databricks notebook, several high-level SDKs are 'Databricks-aware.' The MLflow Deployments SDK (C) and the Databricks Python SDK (D) are designed to automatically look for the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables provided by the notebook context. The OpenAI client (A), when configured for Databricks via Mosaic AI Gateway, also typically handles authentication via workspace integration in recent versions. However, the REST API via the requests library (B) is a generic Python HTTP client. It has no intrinsic knowledge of the Databricks environment. To use it, an engineer must manually extract the token (e.g., via dbutils.notebook.entry_point...) and explicitly pass it in the Authorization: Bearer <token> header of the request. Without this manual step, the requests library will fail with a 401 Unauthorized error.
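A short sketch of the manual step the requests path requires. This assumes it runs inside a Databricks notebook, where `dbutils` is predefined; the serving endpoint name is a placeholder, and the context-extraction calls shown are the common dbutils pattern:

```python
import requests

# Pull host and token from the notebook context; the generic HTTP client
# will not discover these on its own.
ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
host = ctx.apiUrl().get()
token = ctx.apiToken().get()

response = requests.post(
    f"{host}/serving-endpoints/databricks-dbrx-instruct/invocations",  # placeholder endpoint
    headers={"Authorization": f"Bearer {token}"},  # omit this -> 401 Unauthorized
    json={"messages": [{"role": "user", "content": "Hello"}]},
)
print(response.json())
```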
A Generative AI Engineer is creating an LLM-based application. The documents for its retriever have been chunked to a maximum of 512 tokens each. The Generative AI Engineer knows that cost and latency are more important than quality for this application. They have several context length levels to choose from.
Which will fulfill their need?
When prioritizing cost and latency over quality in a Large Language Model (LLM)-based application, it is crucial to select a configuration that minimizes both computational resources and latency while still providing reasonable performance. Here's why D is the best choice:
Context length: The context length of 512 tokens aligns with the chunk size used for the documents (maximum of 512 tokens per chunk). This is sufficient for capturing the needed information and generating responses without unnecessary overhead.
Smallest model size: The model with a size of 0.13GB is significantly smaller than the other options. This small footprint ensures faster inference times and lower memory usage, which directly reduces both latency and cost.
Embedding dimension: While the embedding dimension of 384 is smaller than the other options, it is still adequate for tasks where cost and speed are more important than precision and depth of understanding.
This setup achieves the desired balance between cost-efficiency and reasonable performance in a latency-sensitive, cost-conscious application.
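To illustrate, a compact 384-dimension model can be loaded and used directly. A sketch assuming the sentence-transformers library; all-MiniLM-L6-v2 is one widely used model that roughly matches this small-footprint, 384-dimension profile:

```python
from sentence_transformers import SentenceTransformer

# A small 384-dimension model keeps inference fast and memory usage low,
# matching the cost/latency priority in this scenario.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

chunks = [
    "First document chunk (up to 512 tokens)...",
    "Second document chunk...",
]
embeddings = model.encode(chunks)
print(embeddings.shape)  # (2, 384)
```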
A Generative AI Engineer is developing a RAG application and would like to experiment with different embedding models to improve the application performance.
Which strategy for picking an embedding model should they choose?
The task involves improving a Retrieval-Augmented Generation (RAG) application's performance by experimenting with embedding models. The choice of embedding model impacts retrieval accuracy, which is critical for RAG systems. Let's evaluate the options based on Databricks Generative AI Engineer best practices.
Option A: Pick an embedding model trained on related domain knowledge
Embedding models trained on domain-specific data (e.g., industry-specific corpora) produce vectors that better capture the semantics of the application's context, improving retrieval relevance. For RAG, this is a key strategy to enhance performance.
Databricks Reference: 'For optimal retrieval in RAG systems, select embedding models aligned with the domain of your data' ('Building LLM Applications with Databricks,' 2023).
Option B: Pick the most recent and most performant open LLM released at the time
LLMs are not embedding models; they generate text, not embeddings for retrieval. While recent LLMs may be performant for generation, this doesn't address the embedding step in RAG. This option misunderstands the component being selected.
Databricks Reference: Embedding models and LLMs are distinct in RAG workflows: 'Embedding models convert text to vectors, while LLMs generate responses' ('Generative AI Cookbook').
Option C: Pick the embedding model ranked highest on the Massive Text Embedding Benchmark (MTEB) leaderboard hosted by HuggingFace
The MTEB leaderboard ranks models across general tasks, but high overall performance doesn't guarantee suitability for a specific domain. A top-ranked model might excel in generic contexts but underperform on the engineer's unique data.
Databricks Reference: General performance is less critical than domain fit: 'Benchmark rankings provide a starting point, but domain-specific evaluation is recommended' ('Databricks Generative AI Engineer Guide').
Option D: Pick an embedding model with multilingual support to support potential multilingual user questions
Multilingual support is useful only if the application explicitly requires it. Without evidence of multilingual needs, this adds complexity without guaranteed performance gains for the current use case.
Databricks Reference: 'Choose features like multilingual support based on application requirements' ('Building LLM-Powered Applications').
Conclusion: Option A is the best strategy because it prioritizes domain relevance, directly improving retrieval accuracy in a RAG system, aligning with Databricks' emphasis on tailoring models to specific use cases.
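In practice, domain fit can be checked directly: embed a small set of in-domain query/document pairs with each candidate model and compare retrieval hit rates. A minimal sketch, where the model names and the toy dataset are illustrative rather than recommendations:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Toy domain evaluation set: each query should retrieve its paired document.
queries = ["What is the warranty period for the X100 pump?"]
docs = [
    "The X100 pump ships with a 24-month limited warranty.",
    "Our office is closed on public holidays.",
]
expected = [0]  # index of the correct document for each query

def hit_rate(model_name: str) -> float:
    model = SentenceTransformer(model_name)
    q = model.encode(queries, normalize_embeddings=True)
    d = model.encode(docs, normalize_embeddings=True)
    top1 = (q @ d.T).argmax(axis=1)  # cosine similarity via dot product
    return float((top1 == np.array(expected)).mean())

for name in ["sentence-transformers/all-MiniLM-L6-v2", "BAAI/bge-small-en-v1.5"]:
    print(name, hit_rate(name))
```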
A Generative AI Engineer needs to design an LLM pipeline to conduct multi-stage reasoning that leverages external tools. To be effective at this, the LLM will need to plan and adapt actions while performing complex reasoning tasks.
Which approach will do this?
The task requires an LLM pipeline for multi-stage reasoning with external tools, necessitating planning, adaptability, and complex reasoning. Let's evaluate the options based on Databricks' recommendations for advanced LLM workflows.
Option A: Train the LLM to generate a single, comprehensive response without interacting with any external tools, relying solely on its pre-trained knowledge
This approach limits the LLM to its static knowledge base, excluding external tools and multi-stage reasoning. It can't adapt or plan actions dynamically, failing the requirements.
Databricks Reference: 'External tools enhance LLM capabilities beyond pre-trained knowledge' ('Building LLM Applications with Databricks,' 2023).
Option B: Implement a framework like ReAct which allows the LLM to generate reasoning traces and perform task-specific actions that leverage external tools if necessary
ReAct (Reasoning + Acting) combines reasoning traces (step-by-step logic) with actions (e.g., tool calls), enabling the LLM to plan, adapt, and execute complex tasks iteratively. This meets all of the requirements: multi-stage reasoning, tool use, and adaptability (a minimal sketch of the loop follows the conclusion below).
Databricks Reference: 'Frameworks like ReAct enable LLMs to interleave reasoning and external tool interactions for complex problem-solving' ('Generative AI Cookbook,' 2023).
Option C: Encourage the LLM to make multiple API calls in sequence without planning or structuring the calls, allowing the LLM to decide when and how to use external tools spontaneously
Unstructured, spontaneous API calls lack planning and may lead to inefficient or incorrect tool usage. This doesn't ensure effective multi-stage reasoning or adaptability.
Databricks Reference: Structured frameworks are preferred: 'Ad-hoc tool calls can reduce reliability in complex tasks' ('Building LLM-Powered Applications').
Option D: Use a Chain-of-Thought (CoT) prompting technique to guide the LLM through a series of reasoning steps, then manually input the results from external tools for the final answer
CoT improves reasoning but relies on manual tool interaction, breaking automation and adaptability. It's not a scalable pipeline solution.
Databricks Reference: 'Manual intervention is impractical for production LLM pipelines' ('Databricks Generative AI Engineer Guide').
Conclusion: Option B (ReAct) is the best approach, as it integrates reasoning and tool use in a structured, adaptive framework, aligning with Databricks' guidance for complex LLM workflows.
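To make the ReAct pattern concrete, here is a minimal self-contained sketch of the reason/act loop. The LLM is stubbed with a scripted function so the example runs without a model endpoint; a real pipeline would call a served model and typically use a framework's agent executor:

```python
import re

# Toy tools the agent can invoke.
TOOLS = {
    "search": lambda q: "Databricks was founded in 2013.",
}

def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call, scripted to emit the ReAct format."""
    if "Observation:" not in prompt:
        return "Thought: I need the founding year.\nAction: search[Databricks founding year]"
    return "Thought: I have what I need.\nFinal Answer: Databricks was founded in 2013."

def react_loop(question: str, max_steps: int = 5) -> str:
    prompt = f"Question: {question}"
    for _ in range(max_steps):
        output = fake_llm(prompt)
        if "Final Answer:" in output:
            return output.split("Final Answer:")[1].strip()
        # Parse "Action: tool[input]", execute the tool, and feed the
        # observation back so the model can adapt its next step.
        match = re.search(r"Action: (\w+)\[(.*)\]", output)
        tool_name, tool_input = match.group(1), match.group(2)
        observation = TOOLS[tool_name](tool_input)
        prompt += f"\n{output}\nObservation: {observation}"
    return "No answer within the step budget."

print(react_loop("When was Databricks founded?"))
```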