
Building Guardrails Against Hallucinations in AI SRE Agents

Hallucinations are the failure mode that keeps AI developers up at night. For practitioners who worked in ML before the LLM boom, non-determinism has always been part of the job, expressed as error margins, confidence intervals, or false positive rates. What has changed is scale and visibility: LLM failures are now labeled as “hallucinations” and treated as exceptional, even though they are a natural outcome of probabilistic models.

The problem becomes more severe because modern applications rarely use a single LLM call. We are increasingly building agentic systems, composed of multiple LLM invocations with chaining, iteration, tool use, and self-reflection. In such systems, even a small upstream error, such as an incorrect timestamp or a malformed identifier, can propagate through subsequent steps and result in a completely incorrect final response.

This risk is particularly high in AI SRE agents. If an LLM incorrectly identifies the service name, error code, or resource to investigate, downstream reasoning can confidently converge on the wrong diagnosis. In operational systems, this makes hallucination mitigation a correctness and reliability concern, not just a quality issue.

Why hallucinations occur is outside the scope of this article, but in short they are influenced by several factors: the underlying model, token count, token composition, instruction ambiguity, decoding parameters such as temperature, and even variability in model-serving infrastructure. The important conclusion is that hallucinations cannot be fully eliminated at the model level.

That said, while individual LLM calls are non-deterministic, the overall system does not have to be. These are still software systems with probabilistic components, and established engineering practices apply.

1. Treat LLM systems as production software

Non-deterministic components do not remove the need for rigor. Write evaluation suites, unit tests, and end-to-end tests that explicitly measure system behavior under realistic failure modes. Testing remains the primary mechanism for reducing uncertainty.

The challenge in testing agentic AI applications usually lies in the input space: real-world inputs can vary significantly from the handful of cases developers tested the system with. Classical ML models, by contrast, are evaluated by measuring performance on large curated datasets. Though it is hard to curate a dataset that covers every possible input to the system, it is important to start small and steadily increase the diversity of the examples in your dataset.
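
As a starting point, such an evaluation suite can be as simple as a list of labeled cases run against the agent. The sketch below is illustrative only: `classify_incident` is a hypothetical stub standing in for a real agent or LLM call, and the example cases are invented.

```python
# Minimal evaluation harness sketch. `classify_incident` is a stand-in
# for a real agent call; replace it with your own entry point.
from dataclasses import dataclass

@dataclass
class EvalCase:
    incident_text: str
    expected_service: str

# Start small: a handful of realistic cases, then grow the dataset's diversity.
EVAL_SET = [
    EvalCase("Service validation failing with 503 in us-west1", "validation"),
    EvalCase("checkout pods OOMKilled after deploy 42", "checkout"),
    EvalCase("auth latency p99 spiked to 4s", "auth"),
]

def classify_incident(text: str) -> str:
    # Stub: a real implementation would invoke the agent/LLM here.
    for case in EVAL_SET:
        if case.expected_service in text:
            return case.expected_service
    return "unknown"

def run_eval() -> float:
    """Fraction of eval cases where the agent identified the right service."""
    correct = sum(
        classify_incident(c.incident_text) == c.expected_service for c in EVAL_SET
    )
    return correct / len(EVAL_SET)

if __name__ == "__main__":
    print(f"accuracy: {run_eval():.2f}")
```

The point is the harness shape, not the stub: once an entry point like this exists, every new failure mode observed in production becomes one more `EvalCase`.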

2. Use structured and typed outputs

The more structured the LLM output, the easier it is to validate. Enforcing schemas, typed fields, and constrained formats significantly reduces silent failures and makes downstream checks deterministic. Many popular model providers support API calls with detailed structured output schemas.

The key properties of structured outputs are:
  • Explicit schema: Required fields, allowed values, and data types are defined upfront.
  • Deterministic parsing: The response can be programmatically parsed without heuristics.
  • Immediate validation: Invalid or incomplete outputs can be rejected before downstream use.
  • Reduced hallucination surface: The model is constrained to fill known fields rather than invent narrative text.

In effect, structured outputs turn an LLM call into something closer to a typed function call than a text-generation task.

Example: Structured output in an API call

Below is an example using a schema-enforced response where the model must identify an SRE investigation target.


POST /v1/chat/completions
  {
    "model": "gpt-4.1",
    "messages": [
      {
        "role": "system",
        "content": "You are an AI SRE assistant. Extract investigation details from the incident."
      },
      {
        "role": "user",
        "content": "Service validation is failing with error code 503 in us-west1 since 10:42 UTC."
      }
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "incident_investigation",
        "schema": {
          "type": "object",
          "required": ["service_name", "error_code", "region", "start_time"],
          "properties": {
            "service_name": {
              "type": "string",
              "description": "Canonical service identifier"
            },
            "error_code": {
              "type": "string",
              "enum": ["500", "502", "503", "504"]
            },
            "region": {
              "type": "string",
              "pattern": "^[a-z]+-[a-z]+[0-9]+$"
            },
            "start_time": {
              "type": "string",
              "format": "date-time"
            }
          }
        }
      }
    }
  }
A valid response from the model would look like:


{
  "service_name": "validation",
  "error_code": "503",
  "region": "us-west1",
  "start_time": "2026-01-22T10:42:00Z"
}
If the model:

  • Omits a required field
  • Returns an invalid enum value
  • Produces malformed JSON

the response can be deterministically rejected and retried.
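
That rejection logic can be made fully deterministic. Below is a standard-library-only sketch that hand-codes the constraints from the example schema; in practice you might use a library such as `jsonschema` instead, and the caller would retry the LLM call whenever `ValueError` is raised.

```python
import json
import re
from datetime import datetime

# Constraints mirroring the json_schema in the API example above.
ALLOWED_ERROR_CODES = {"500", "502", "503", "504"}
REGION_RE = re.compile(r"^[a-z]+-[a-z]+[0-9]+$")
REQUIRED = ("service_name", "error_code", "region", "start_time")

def validate_incident(raw: str) -> dict:
    """Parse the model's JSON output; raise ValueError on any violation."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"malformed JSON: {e}") from e
    for field in REQUIRED:
        if field not in data:
            raise ValueError(f"missing required field: {field}")
    if data["error_code"] not in ALLOWED_ERROR_CODES:
        raise ValueError(f"invalid error_code: {data['error_code']}")
    if not REGION_RE.match(data["region"]):
        raise ValueError(f"invalid region: {data['region']}")
    # Accept RFC 3339 timestamps; normalize the trailing 'Z' for fromisoformat.
    datetime.fromisoformat(data["start_time"].replace("Z", "+00:00"))
    return data
```

Because every failure surfaces as a typed exception, the retry policy lives in ordinary code rather than in the prompt.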

3. Use models to verify models

Although LLMs are non-deterministic, techniques such as consistency checks, cross-validation, and self-reflection have proven effective in practice. Redundancy can mitigate variance.

The first technique is consistency checking: generate multiple independent responses (via re-sampling, prompt variation, or temperature sweeps) and compare them for agreement on critical fields. Disagreement is a strong signal of uncertainty and can be used to trigger retries, fallback logic, or human review.

Another effective technique is cross-validation using specialized prompts or models. For example, one model (or prompt) produces an answer, while another is tasked solely with verification—checking factual correctness, schema adherence, or alignment with known constraints. Importantly, the verifier’s scope should be narrower and more deterministic than the generator’s.

Self-reflection and critique loops further improve reliability. In this pattern, the model is explicitly asked to inspect its own output for errors, missing assumptions, or violations of constraints. While not foolproof, this often catches obvious inconsistencies, incorrect identifiers, and logical gaps before results propagate downstream.
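
The generate-critique-retry pattern can be captured in a small control loop. Here `generate` and `critique` are hypothetical stand-ins for two separate LLM calls; the key design point is that the critic returns a concrete list of problems, keeping its scope narrower than the generator's.

```python
from typing import Callable

def generate_with_critique(
    generate: Callable[[], dict],
    critique: Callable[[dict], list[str]],
    max_attempts: int = 3,
) -> dict:
    """Retry generation until the critique step reports no problems."""
    last_problems: list[str] = []
    for _ in range(max_attempts):
        candidate = generate()
        last_problems = critique(candidate)
        if not last_problems:
            return candidate
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {last_problems}")

# Usage with stubs: the first draft omits a field, the retry fixes it.
drafts = iter([{"service_name": ""}, {"service_name": "validation"}])
result = generate_with_critique(
    generate=lambda: next(drafts),
    critique=lambda c: [] if c.get("service_name") else ["missing service_name"],
)
```

Bounding the loop with `max_attempts` matters: an agent that retries forever on a genuinely ambiguous input is its own reliability problem.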

4. Control context window size aggressively

Modern LLMs support extremely large context windows, which invites overconfidence: the temptation to pass in large volumes of data. Beyond a task- and model-specific threshold, stuffing more data into the context window can introduce additional non-determinism, increasing variance rather than accuracy, and raises hallucination risk.

The optimal token budget depends on the model and the task. The only reliable way to determine this limit is benchmarking: incrementally increase context size on a representative dataset and observe accuracy degradation.
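
The sweep itself is mechanical. In the sketch below, `evaluate_at_budget` is a hypothetical stub with made-up accuracy numbers purely to illustrate the shape of the benchmark; in practice it would run your real evaluation suite with the context truncated to the given token budget.

```python
def evaluate_at_budget(budget: int) -> float:
    # Stub with synthetic numbers for illustration only; a real version
    # runs the full eval suite at this token budget.
    synthetic = {2_000: 0.81, 8_000: 0.88, 32_000: 0.86, 128_000: 0.74}
    return synthetic[budget]

def find_optimal_budget(budgets: list[int]) -> int:
    """Pick the budget with the best measured accuracy, not the largest window."""
    results = {b: evaluate_at_budget(b) for b in budgets}
    for b, acc in results.items():
        print(f"{b:>7} tokens -> accuracy {acc:.2f}")
    return max(results, key=results.get)

best = find_optimal_budget([2_000, 8_000, 32_000, 128_000])
```

The illustrative numbers show the typical pattern: accuracy peaks well below the model's maximum window, which is exactly why the limit must be measured rather than assumed.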

Not all available information should be included. Minimizing irrelevant or weakly related context improves determinism and reduces the probability of spurious correlations.

Hallucination rate is one of the key accuracy metrics in our AI SRE evaluation framework.

Taming Non-Determinism in Agentic AI for Production Reliability

In summary, hallucinations are not anomalies; they are an expected property of probabilistic systems. The solution is not to hope for perfect models, but to apply disciplined systems engineering: testing, validation, structure, redundancy, and controlled inputs.

Using MCP With Cursor to Automate Incident Resolution

We announced in our previous blog that we are “meeting the developers in their backyard” with the launch of the Hawkeye MCP Server. In this blog, I want to show you a step-by-step demo of how to connect Hawkeye with Cursor IDE to dramatically improve the experience of responding to production issues.

Hawkeye is our AI SRE Agent that connects to endpoints using read-only API calls. This includes AWS, Azure, Grafana, GitHub, Datadog, and more. The agent autonomously investigates any alerts and incidents, then conducts a deep analysis of the connected metrics, events, logs, and traces. The result is accurate Root Cause Analysis (RCA) and remediation steps. 

The agent is available to access using MCP, and it gives engineers a single point of entry for reasoning with deep contextual insights from the entire infrastructure and application stack. Let’s take a look at how this works in practice using Cursor.

Connect Hawkeye with Cursor

Feel free to follow along with my steps by getting a free trial of Hawkeye here: https://signup.registration.neubird.ai/

Once Hawkeye is configured, take note of your username, password, and URL. Then, set them as environment variables in a terminal, replacing the respective strings with your own. 

export HAWKEYE_EMAIL="your-email"
export HAWKEYE_PASSWORD="your-pw"
export HAWKEYE_BASE_URL="https://<env>.app.neubird.ai/api"

Launch Cursor from the same terminal so that it inherits the variables. In this example, I launch the Cursor IDE by running the ‘code’ command.

code

Note: If Cursor was open previously, you must fully exit the application and relaunch it from the command line to inherit the environment variables.

In the Cursor UI, go to Settings > Tools & MCP > Add Custom MCP (or + New MCP Server). Then, paste the following ‘hawkeye’ configuration to run a local MCP server.

{
  "mcpServers": {
    "hawkeye": {
      "command": "npx",
      "args": ["-y", "hawkeye-mcp-server@latest"],
      "env": {
        "HAWKEYE_EMAIL": "${env:HAWKEYE_EMAIL}",
        "HAWKEYE_PASSWORD": "${env:HAWKEYE_PASSWORD}",
        "HAWKEYE_BASE_URL": "${env:HAWKEYE_BASE_URL}"
      }
    }
  }
}


Note: Remote MCP server with token-based authentication is coming soon.

In the Cursor MCP configuration under Installed MCP Servers, you should see a green circle with the number of tools listed for Hawkeye, which indicates a successful connection.

Investigate and Remediate Alerts

The following queries are examples of how you can interact with Hawkeye using MCP to retrieve alerts from connected sources, run an investigation, and fix issues using Hawkeye’s RCA and remediation steps. 

  • List any uninvestigated alerts in Hawkeye and identify which one to address first

This prompt retrieves alerts from the chosen project in Hawkeye and references the metadata of the alerts to order them from highest to lowest priority. 

  • Run an investigation on the critical alert and retrieve the RCA

This prompt runs an investigation on Hawkeye, which produces the RCA. The remediation steps and recommendations are now loaded into Cursor’s context.

  • Refer to the remediation steps from Hawkeye and create a Pull Request (PR) in GitHub with suggested changes


Cursor uses LLMs to generate a code change based on the RCA provided by Hawkeye.

With just these three prompts, I am able to get a detailed root cause analysis, remediation steps to fix the issue, an improved alerting mechanism to catch more details, and a first commit towards a PR.

Read more: the case against dashboard-native platforms

Next Steps

Now that you have access to our AI SRE Agent’s powerful context engineering across multiple environments, you can leverage it to automatically investigate issues such as:

  • What changed in the last 30 minutes across all environments that correlates with this latency spike?
  • Correlate this customer-reported issue with backend traces, logs, and infrastructure metrics for their request.
  • What’s the blast radius if this node/zone/region fails right now?

Once Hawkeye investigates the issue, you will get detailed remediation steps and recommendations that can be referenced by the LLM to create code and config changes. This brings you closer than ever to a fully automated lifecycle of incident creation to incident resolution, right in your development environment.

How Context Engineering Separates Enterprise AI from Toy AI SRE Agents

In the age of LLMs, it’s easy to build an AI agent that can do something. Wire up a prompt, call an API, and the demo looks great. It feels impressive – until you try to run it inside a real enterprise environment.

Because in enterprise environments, doing something isn’t enough. These environments are chaotic: telemetry is noisy, systems are fragmented, and stakes are high. In this world, most AI agents collapse under pressure.

Why? Because they lack one critical capability: contextual understanding. The difference between a toy agent and one that performs under pressure comes down to a single idea: context. 

And in production, context doesn’t come from a prompt. It comes from context engineering – a discipline most teams are only beginning to understand.

Most Agents Don’t Fail Because the Model Is Weak

They fail because the context is. LLMs are capable of advanced reasoning – but only when you feed them the right inputs. In enterprise IT, that’s no small feat. You’re dealing with:

  • Unstructured, deeply nested logs
  • High-volume, high-dimensional metrics
  • Distributed traces across async systems
  • Alerts based on thresholds, not causality
  • Constantly changing configs

Feed that into a model naively, and you don’t get insight. You get noise.

What Is Context Engineering?

Context engineering is the discipline of transforming raw data into structured, relevant, and task-specific input for an AI agent.

It’s about designing what the agent should see – and more importantly, what it shouldn’t. It’s about curating the right data, in the right format, at the right time.

That includes:

  • Pulling the right logs, not just the most recent ones
  • Extracting signal from high-volume metrics without flooding the model
  • Aligning traces, configs, and events into coherent timelines
  • Framing input to reflect how real engineers debug incidents

It’s surgical. And it’s what enables agents to reason instead of react.
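
One of the steps above, aligning events into a coherent timeline, can be sketched in a few lines. This is a toy illustration, not Hawkeye's implementation; all source names and fields are invented.

```python
from datetime import datetime, timedelta

def build_timeline(sources: dict[str, list[dict]], incident_start: datetime,
                   lookback: timedelta = timedelta(minutes=30)) -> list[dict]:
    """Merge events from heterogeneous telemetry sources, keep only those
    inside the incident lookback window, and time-order the result."""
    window_start = incident_start - lookback
    merged = [
        {"source": name, **event}
        for name, events in sources.items()
        for event in events
        if window_start <= event["ts"] <= incident_start
    ]
    return sorted(merged, key=lambda e: e["ts"])

t0 = datetime(2026, 1, 22, 10, 42)
timeline = build_timeline(
    {
        "deploys": [{"ts": t0 - timedelta(minutes=5), "msg": "checkout v42 rolled out"}],
        "logs": [{"ts": t0 - timedelta(hours=3), "msg": "unrelated warning"},
                 {"ts": t0 - timedelta(minutes=1), "msg": "503 from upstream"}],
    },
    incident_start=t0,
)
```

Even this trivial version demonstrates the principle: the three-hour-old log line never reaches the model, while the deploy that preceded the errors lands first in the timeline.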

Why This Matters for Enterprise AI Agents

Agents that operate in production environments need more than LLM wrappers and chatbot UIs.

They need the ability to trace causality across signals, awareness of architecture and deployment shifts, and contextual reasoning that mirrors how SREs think under pressure.

This requires more than a model. It requires a context engine—a system that filters, aligns, and sequences telemetry to make reasoning possible.

At NeuBird, this principle shaped how we built Hawkeye, our AI SRE agent. And it’s the foundation for what we believe all enterprise-grade agents will need to evolve: with context as the first-class input, not an afterthought.

Context Engineers: The Builders Behind the Curtain

This shift demands a new kind of builder: the context engineer.

Context engineers aren’t prompt tuners or ML ops specialists – they’re systems thinkers.
For enterprise IT operations, these would be people who’ve been on-call, traced issues across stacks, and know what real production debugging looks like. They build pipelines that:

  • Curate high-value context from raw telemetry
  • Normalize inputs across heterogeneous systems
  • Model the investigative paths that SREs actually follow
  • Translate infrastructure knowledge into structured, machine-readable form

They’re not just enabling the agent to respond. They’re teaching it to think.

Enterprise AI Will Be Built on Context

Everyone’s racing to build agents. But very few are building ones that truly understand the enterprise.

If your agent isn’t context-aware, it’s just reacting to symptoms.
If it can’t reason across telemetry, it’s not doing RCA.
If it doesn’t have engineered context, it’s not ready for production.

The future of enterprise AI will be shaped not by who uses the biggest model – but by who delivers the most relevant, structured, and actionable context to it. 

That’s context engineering. And that’s what makes the difference between a great demo and an enterprise grade agent.

Read more: Once context is engineered, reasoning graphs document the full decision chain, creating institutional knowledge from every investigation.

Transforming Confluent Operations with GenAI: How NeuBird’s Hawkeye Automates Incident Resolution in Confluent Cloud

A joint post from the teams at NeuBird and Confluent

For organizations that run Confluent Cloud as a central nervous system for their data, ensuring smooth operations is mission-critical. While Confluent Cloud, the fully managed service from the company behind Apache Kafka®, eliminates much of the operational burden of managing Kafka clusters, application teams still need to monitor and troubleshoot client applications connecting to these clusters.

Traditionally, when issues arise—whether it’s unexpected consumer lag, authorization errors, or connectivity problems—engineers must manually piece together information from multiple observability tools, logs, and metrics to identify root causes. This process is time-consuming, requires specialized expertise, and often extends resolution times.

Today, we’re excited to share how NeuBird’s Hawkeye, a GenAI-powered SRE assistant, is transforming this experience by automating the investigation and resolution of Confluent Cloud incidents—allowing your team to focus on innovation rather than firefighting.

The Foundation: Kafka Client Observability with Confluent

Confluent’s observability setup provides a strong foundation for monitoring Kafka clients connected to Confluent Cloud. It leverages:

  • A time-series database (Prometheus) for metrics collection
  • Client metrics from Java consumers and producers
  • Visualization through Grafana dashboards
  • Failure scenarios to learn from and troubleshoot

Confluent’s observability demo is incredibly valuable for understanding how to monitor Kafka clients and diagnose common issues, but it still relies on human expertise to interpret the data and determine root causes.

Enhancing the Experience with Kubernetes and AI-driven Automated Incident Response

NeuBird builds on Confluent’s robust observability foundation by integrating Hawkeye, our GenAI-powered SRE, directly into the Kafka monitoring ecosystem. This combination goes beyond monitoring to introduce intelligent, automated incident response, significantly reducing Mean Time to Resolution (MTTR).

Here’s how NeuBird augments Confluent’s observability with three significant improvements:

  1. Kubernetes Deployment: We’ve containerized the entire setup and made it deployable on Kubernetes (EKS), making it more representative of production environments and easier to deploy.
  2. Alert Manager Integration: We’ve added Prometheus Alert Manager rules that trigger PagerDuty incidents, creating a complete alerting pipeline.
  3. Audit Logging: We’ve expanded the telemetry scope to include both metrics and logs in CloudWatch, giving a more comprehensive view of the environment.

Most importantly, we’ve integrated Hawkeye—NeuBird’s GenAI-powered SRE—to automatically investigate and resolve incidents as they occur.

Seeing it in Action: Authorization Revocation Scenario

Let’s walk through a real-world scenario from the Confluent demo: the “Authorization Revoked” case, where a producer’s permission to write to a topic is unexpectedly revoked.

The Traditional Troubleshooting Workflow

In the original demo workflow, here’s what typically happens:

  1. An engineer receives an alert about producer errors
  2. They log into Grafana to check producer metrics
  3. They notice the Record error rate has increased
  4. They check Confluent Cloud metrics and see inbound traffic but no new retained bytes
  5. They examine producer logs and find TopicAuthorizationException errors
  6. They investigate ACLs and find the producer’s permissions were revoked
  7. They restore the correct ACLs to resolve the issue

This manual process might take 15-30 minutes for an experienced Kafka engineer, assuming they’re immediately available when the alert triggers.

The Hawkeye-Automated Workflow

With our enhanced setup including Hawkeye, the workflow is transformed:

  1. Prometheus Alert Manager detects increased error rates and triggers a PagerDuty incident
  2. Hawkeye automatically begins investigating the issue by:
    • Retrieving and analyzing producer metrics from Prometheus
    • Correlating with Confluent Cloud metrics
    • Examining producer logs for error patterns
    • Checking AWS CloudWatch for audit logs showing ACL changes
  3. Within minutes, Hawkeye identifies the TopicAuthorizationException and links it to recent ACL changes
  4. Hawkeye generates a detailed root cause analysis with specific remediation steps
  5. An engineer reviews Hawkeye’s findings and applies the recommended fix (or optionally, approves Hawkeye to implement the fix automatically)

The entire process is reduced to minutes, even when the issue occurs outside business hours. More importantly, your specialized Kafka engineers can focus on more strategic work rather than routine troubleshooting.

Demo Video

In this video, we demonstrate the complete workflow:

  1. How we deploy the enhanced Confluent observability solution to Kubernetes
  2. Triggering the authorization revocation scenario
  3. Watching Hawkeye automatically detect, investigate, and diagnose the issue
  4. Reviewing Hawkeye’s detailed analysis and remediation recommendations
  5. Implementing the fix and verifying the resolution

The Technical Architecture

Our enhanced solution builds upon Confluent’s observability foundation with several key components:

  • Kubernetes Deployment: All components are packaged as containers and deployed to EKS using Helm charts, making the setup reproducible and scalable.
  • Prometheus and Alert Manager: We’ve added custom alerting rules specifically designed for Confluent Cloud metrics and common failure patterns.
  • AWS CloudWatch Integration: Both metrics and logs are forwarded to CloudWatch, providing a centralized location for all telemetry data.
  • Hawkeye Integration: Hawkeye connects securely to your telemetry sources with read-only permissions, leveraging GenAI to understand patterns, correlate events, and recommend precise solutions.

The architecture respects all security best practices—Hawkeye never stores your telemetry data, operates with minimal permissions, and all analysis happens in ephemeral, isolated environments.

Real-World Impact

Organizations using Hawkeye with Confluent Cloud have seen significant operational improvements:

  • Reduced MTTR: Issues that previously took hours to diagnose are now resolved in minutes
  • Decreased Alert Fatigue: Engineers are only engaged when human intervention is truly needed
  • Knowledge Democratization: Teams less familiar with Kafka can confidently operate complex Confluent Cloud environments
  • Improved SLAs: With faster resolution times, application availability and performance metrics improve

As one example, an enterprise IT storage company reduced their MTTR for DevOps pipeline failures by implementing Hawkeye. When experiencing a crash loop with one of their applications causing production downtime, Hawkeye automatically picked up the alert from PagerDuty, investigated the issue, and determined that the crashes were happening due to a recent application deployment. Hawkeye recommended which specific application and process needed to be rolled back, dramatically reducing resolution time.

Getting Started

Want to try this enhanced observability setup with your own Confluent Cloud environment? Here’s how to get started:

  1. Start with the original Confluent observability demo to understand the components
  2. Check out our GitHub repository for the Kubernetes-ready version with Prometheus Alert Manager rules
  3. Schedule a demo to see Hawkeye in action with your Confluent Cloud environment

Conclusion

The combination of Confluent Cloud and NeuBird’s Hawkeye represents a powerful shift in how organizations operate Kafka environments. By leveraging Confluent’s rich telemetry data and Hawkeye’s GenAI-powered automation, teams can significantly reduce operational overhead, improve reliability, and focus on delivering value rather than troubleshooting infrastructure.

As data streaming becomes increasingly central to modern applications and with availability of fully managed Kafka and Flink solutions in Confluent Cloud, this type of intelligent automation will be essential for scaling operations teams effectively—letting them support larger, more complex deployments without proportionally increasing headcount or sacrificing reliability.

We’re excited to continue innovating at the intersection of observability, AI, and data streaming. Let us know in the comments how you’re approaching observability for your Confluent Cloud environments!

DevOpsCon 2025: Where AI Moved From Hype to Hard Enterprise Problems

At DevOpsCon San Diego this year the energy was electric and the message was loud and clear: DevOps teams are navigating relentless operational complexity—and they’re looking for AI that actually works in their world. Not AI that lives in a demo, but intelligent automation that fits securely into hybrid environments, accelerates incident response, and helps engineers focus on what matters most.

Across sessions and conversations, the sentiment was strikingly consistent: teams don’t need more dashboards or alerts—they need fewer manual steps and faster root cause clarity.

AI Is Everywhere—But Pragmatism Is Back

AI agents and GenAI were everywhere at the conference, but the buzz was grounded in real-world need. Sessions underscored a shift in mindset: visibility is important—but insight and action are what actually move the needle.

DevOps professionals weren’t chasing the latest AI trend—they were seeking solutions to their most pressing operational challenges. The conversations I had at our booth consistently returned to one theme: how can AI help us work smarter, not harder?

On-Call Burnout Is Boiling Over

Incident response continues to drain DevOps teams. From late-night pings to hours spent tracing pipelines and logs, on-call has become more tedious and time-consuming—even as tooling has improved.

Teams are exhausted from stitching together fragmented telemetry. What they want is AI that understands their stack, integrates into existing systems, and helps get to the root cause faster—without adding another portal or platform to manage.

From Curiosity to Critical Path

Many teams shared past experiments with AI—mostly chatbots or copilots for ticketing or knowledge lookups. Useful, but shallow. Now, the question is different: “Can AI investigate incidents in our production environment without exposing our data?”

Security was a recurring theme. Multiple teams had tried sending telemetry into public LLMs and quickly rolled it back.

One CTO summed it up perfectly: “Dumping production logs into a public LLM isn’t innovation—it’s a liability.”

Sessions that explored successful AI implementation, like Justin Griffin’s real-world story of speeding up deployment investigations with an AI agent, sparked important discussions. During the Q&A, a recurring theme emerged from the audience: teams desperately want AI that can connect the dots between different failure points without requiring them to manually correlate data across multiple tools. As the session demonstrated, the value comes from combining reasoning with context—and doing it securely.

The Security-First AI Revolution

What struck me most about DevOpsCon 2025 was how security considerations are driving better AI adoption, not hindering it. Organizations have learned from early missteps and are now demanding enterprise-grade solutions.

Teams shared cautionary tales of experimenting with general-purpose LLMs—from hallucinated recommendations that caused production outages to security breaches from exposing sensitive telemetry data. The lesson is clear: enterprise operations require purpose-built AI agents, not retrofitted consumer tools.

The Path Forward: Secure, Embedded, Purpose-Built AI

DevOps teams aren’t looking for bolt-on bots or generic copilots. They’re demanding intelligent agents that can integrate deeply with their observability and CI/CD systems, run securely in hybrid environments, and reason through telemetry rather than just summarize it.

That’s why interest in Hawkeye surged at our booth. Teams saw how it can operate in enterprise environments – cloud-native, on-prem, and hybrid cloud – using chain-of-thought workflows to surface root causes from real telemetry, without ever exposing sensitive data outside of their control.

DevOps Isn’t Getting Simpler—But Your Workflow Can

DevOpsCon 2025 made one thing clear: tool fatigue is real, alert overload is unsustainable, and AI has a critical role to play in restoring signal, trust, and speed.

Engineers aren’t asking AI to replace them. They’re asking for AI that thinks like an expert, works with them, and reduces the operational noise.

If that’s what your team is ready for, let’s connect. 👉 Book a demo to see how Hawkeye helps reduce MTTR, eliminate redundant work, and bring calm back to your on-call.

Enhancing Contextual Intelligence in AI Agents with MCP

In my previous article, I explored the delicate balance between speed, quality, and cost in AI agent design. Today, I want to dive deeper into how we’re enhancing our agentic AI SRE, Hawkeye, through the Model Context Protocol (MCP) – and why it’s a cornerstone for scalable, intelligent agentic workflows in enterprise environments.

The Enterprise Telemetry Challenge

As my co-founder Gou Rao recently noted, “In the world of Site Reliability Engineering (SRE) and IT operations, problems rarely come with clean, structured answers.” Enterprise IT teams have access to a wide range of telemetry through observability platforms, incident management tools, and internal dashboards. And in some cases, SREs still end up manually combing through logs to piece the puzzle together.

But the core challenge isn’t just access to data – it’s connecting relevant context in a way that makes the data actionable. A CPU spike means little without the surrounding environment: recent deployments, config changes, or past anomalies.

Why Contextual Knowledge Is Essential

For an AI Agent to act autonomously—like a seasoned SRE—it must reason through complexity, not just surface patterns. That means asking follow-up questions, testing hypotheses, and adapting based on what it finds. This type of reasoning demands more than data ingestion. It requires contextual bridges—connections across systems that provide a unified operational understanding.

Enter the Model Context Protocol (MCP)

MCP connects AI agents to enterprise systems in a structured, dynamic way. MCP enables Hawkeye to navigate environments intelligently—pulling only what’s relevant, when it matters.

When an SRE asks, “Why are users experiencing delays when trying to log in — is the authentication service slower than usual?”, Hawkeye draws information from its existing connections to your tech stack, as well as from your MCP resources and tools:

  • CI/CD pipelines to retrieve deployment history
  • Source control systems like Git to track and identify changes
  • Docs, architectural diagrams, runbooks, and other sources of tribal knowledge
  • Historical incidents that match current patterns

These connections span monitoring tools, code repositories, ticketing platforms, and internal wikis—creating contextual bridges that break down silos. Hawkeye synthesizes inputs from each source to build a coherent, real-time understanding of the issue.

From there, it activates its dynamic runbooks—or Hawkeye’s “chain of thought”—to move from symptom to root cause to remediation. This isn’t just access to data. It’s contextual reasoning in motion.

Practical Implementation

We’ve designed Hawkeye’s MCP integration with real-world production environments in mind:

  • Runtime flexibility: New connections can be added dynamically
  • Security-aware design: Scoped permissions protect boundaries
  • Cross-system correlation: Structured context allows pattern recognition across tools

Together, these capabilities support iterative, self-reflective reasoning—enabling Hawkeye to pursue hypotheses, revisit assumptions, and adapt its course like a human SRE would.
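The security-aware design above can be illustrated with a small policy check: each connection carries scoped permissions and is read-only by default, so a mutating request is denied even when the scope exists. The policy shape and scope names here are assumptions for illustration, not Hawkeye’s configuration format.

```python
# Hypothetical scoped, read-only connection policies.
POLICIES = {
    "cloudwatch": {"scopes": {"logs:read", "metrics:read"}, "read_only": True},
    "servicenow": {"scopes": {"tickets:read"}, "read_only": True},
}

def authorize(connection: str, scope: str, mutating: bool) -> bool:
    """Deny unknown connections, writes to read-only ones, and unlisted scopes."""
    policy = POLICIES.get(connection)
    if policy is None:
        return False
    if mutating and policy["read_only"]:
        return False
    return scope in policy["scopes"]

assert authorize("cloudwatch", "logs:read", mutating=False)
assert not authorize("cloudwatch", "logs:delete", mutating=True)
```

Revoking a connection then amounts to removing its policy entry, which makes the “can be revoked instantly” property cheap to enforce.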

The Road Ahead for Agentic Systems

As enterprise environments grow more complex, the contextual awareness that MCP enables won’t just be useful—it will be essential. With rich environmental intelligence at the core, we’re advancing toward more autonomous and effective problem-solving.

This shift redefines what agents can do—elevating them from narrow, task-based tools to systems that reason across silos and act with precision.

At NeuBird, our mission is to build agents that think and adapt like real engineers. With context as their compass, we’re bringing that vision to life—and redefining what agentic AI can deliver for enterprise IT.

 

Building Trust in AI Operations: Hawkeye’s Approach to Transparency

In the rapidly evolving landscape of IT operations, artificial intelligence has emerged as a powerful force for managing complex systems. However, with this power comes a critical challenge: building and maintaining trust. At NeuBird, we recognize that trust isn’t just about powerful technology—it’s about transparency, accountability, and consistent results. Let’s explore how Hawkeye’s approach to transparent AI operations is setting new standards in the industry.

For a full breakdown of how Hawkeye works, check out our deep dive blog.

The Trust Challenge in AI Operations

Traditional IT operations rely on human-readable logs, clear audit trails, and well-documented processes. When introducing AI into this environment, maintaining this transparency becomes both more crucial and more challenging. Engineers need to understand not just what actions were taken, but why they were chosen and how decisions were made.

Hawkeye’s Pillars of Transparent Operations

  • Explainable Decision Making

At every step of an investigation, Hawkeye maintains clear documentation of its reasoning process. Unlike black-box AI systems that simply provide conclusions, Hawkeye shows its work:

– Detailed investigation plans based on historical patterns

– Clear documentation of data sources consulted

– Step-by-step reasoning for conclusions drawn

– Evidence-based recommendations with supporting data

  • Comprehensive Audit Trails

In IT operations, accountability is non-negotiable. Hawkeye maintains detailed audit trails that track:

– Every investigation step taken

– Data sources accessed and queries executed

– Decision points and their rationale

– Recommended actions and their expected outcomes

These audit trails serve multiple purposes: they provide accountability, enable learning from past incidents, and help teams understand how Hawkeye adapts its approach over time.
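An audit trail like the one described above can be modeled as an append-only log where every record captures the step, the source touched, and the rationale. This is a minimal sketch under that assumption — the record fields and class name are illustrative, not Hawkeye’s schema.

```python
import json
import time

class AuditTrail:
    """Append-only record of investigation steps, sources, and rationale."""

    def __init__(self):
        self.records = []

    def log(self, step: str, source: str, rationale: str):
        self.records.append({
            "ts": time.time(),
            "step": step,
            "source": source,
            "rationale": rationale,
        })

    def export(self) -> str:
        # JSON keeps the trail both human-readable and machine-auditable.
        return json.dumps(self.records, indent=2)

trail = AuditTrail()
trail.log("query_metrics", "cloudwatch", "check CPU on auth hosts")
trail.log("fetch_deploys", "cicd", "rule out a recent release")
print(trail.export())
```

Because each record pairs an action with its rationale, the same log serves accountability, post-incident learning, and review of how the investigation strategy evolves.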

  • Human Oversight and Control

While Hawkeye is powerful, it’s designed to augment human expertise, not replace it. Key aspects of this approach include:

– Customer-controlled access policies that can be revoked instantly

– Read-only operations by default, ensuring system safety

– Clear presentation of evidence for human validation

– Ability to adjust investigation parameters based on human input

The Role of Architecture in Trust

Hawkeye’s commitment to transparency isn’t just about features—it’s embedded in its architecture:

Secure by Design

– Zero data storage policy ensures privacy

– Ephemeral processing protects sensitive information

– Read-only access prevents unauthorized changes

Verifiable Processing

The system’s telemetry program generation creates a clear chain of evidence:

– Programs are generated using controlled, fine-tuned LLMs

– Processing occurs in isolated memory spaces

– Results are consistently formatted and verifiable

– All data handling is traceable and auditable
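One way to make a “chain of evidence” concrete is hash chaining: each result record embeds the hash of the previous one, so tampering with any record breaks verification of everything after it. The record structure below is an assumption for illustration, not Hawkeye’s actual format.

```python
import hashlib
import json

def append_record(chain: list, payload: dict) -> None:
    """Append a record whose hash covers the payload and the previous hash."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps({"prev": prev, "payload": payload}, sort_keys=True)
    digest = hashlib.sha256(body.encode()).hexdigest()
    chain.append({"prev": prev, "payload": payload, "hash": digest})

def verify(chain: list) -> bool:
    """Recompute every hash; any edit anywhere invalidates the chain."""
    prev = "0" * 64
    for rec in chain:
        body = json.dumps({"prev": rec["prev"], "payload": rec["payload"]},
                          sort_keys=True)
        if rec["prev"] != prev:
            return False
        if rec["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev = rec["hash"]
    return True

chain = []
append_record(chain, {"step": "generate_program", "model": "fine-tuned-llm"})
append_record(chain, {"step": "run_isolated", "result": "p99 latency 2.3s"})
assert verify(chain)
```

This is the same property that makes the trail auditable after the fact: a verifier needs only the chain itself, not trust in whoever stored it.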

Real-World Impact: From Trust to Value

The transparency built into Hawkeye creates a virtuous cycle:

  1. Clear evidence builds confidence in AI-driven decisions
  2. Understanding leads to better collaboration between AI and engineers
  3. Traceable outcomes enable continuous improvement
  4. Trust enables broader adoption and more valuable automation

The Future of Transparent AI Operations

As AI continues to transform IT operations, transparency will become even more critical. Hawkeye’s approach demonstrates how AI can be both powerful and trustworthy, setting a new standard for the industry.

Traditional IT workflows are time-consuming and involve constant context switching. Engineers spend hours manually investigating alerts and correlating events before taking action.

  • Traditional SRE Workflow:
  1. Alert fires
  2. Check CloudWatch
  3. Open ServiceNow
  4. Investigate logs
  5. Correlate events
  6. Document findings
  7. Take action

Time spent: Hours
🔄 Context switches: 15+

With Hawkeye, this workflow is transformed into an AI-driven process that reduces manual effort while maintaining transparency and accountability.

  • Modern SRE Workflow with GenAI:
  1. AI correlates data
  2. Reviews root cause
  3. Implements solution

Time spent: Minutes
🔄 Context switches: 1

By using generative AI, Hawkeye reduces operational noise, streamlines investigations, and allows teams to focus on higher-level strategic tasks instead of repetitive workflows.

Read more: Power-up your AWS CloudWatch and ServiceNow SRE workflows

Conclusion

In today’s business environment, trust isn’t optional; it’s essential. Hawkeye’s commitment to transparency, from its architecture to its outputs, ensures that teams can confidently embrace AI-driven operations while maintaining the accountability their organizations require.

The future of IT operations will be defined not just by what AI can do, but by how well it can be understood and trusted. Through its innovative approach to transparency, Hawkeye is helping shape that future today, enabling teams to build reliable, scalable, and trustworthy AI-powered operations.

To see Hawkeye in action and understand how it can elevate your IT operations, book a demo today and experience the future of trustworthy AI firsthand.

Unleashing Diagnostic Pack Intelligence With GenAI

Diagnostic packages are treasure troves of critical system insights—often trapped behind hours of manual analysis. Hawkeye liberates this valuable data, transforming tedious log investigations into rapid, precise problem-solving. What if you could turn complex diagnostic packages into actionable AI-powered intelligence in minutes?

The Hidden Goldmine in Your IT Operations

Every IT operations team knows the scenario: Your monitoring dashboards are green, but something still isn’t right. The real story often lies buried in diagnostic packages – packed with stack traces, system configs, and those detailed performance metrics – that teams have traditionally had to analyze manually. Until now, this valuable data has remained isolated from modern observability workflows, creating blind spots in incident investigation and resolution.

Bringing GenAI Intelligence to Diagnostic Packs

Support engineers and SREs, here’s to better days ahead! Hawkeye now applies its GenAI-powered intelligence to diagnostic packages, transforming tedious manual analysis into rapid, automated insights. Now you can finally say goodbye to hours of log parsing and hello to quick, precise problem resolution. This capability automates one of your most time-consuming tasks so you can focus on what matters most – both at work and beyond.

With this new capability, Hawkeye now:

  • Takes those dreaded diagnostic packages off your plate – let GenAI do the heavy lifting
  • Makes sense of the chaos by connecting the dots between your ticket context and diagnostic data
  • Delivers answers you can trust, backed by comprehensive analysis across all your data sources

See It In Action: Diagnostic Pack Issue Decoded 

Here’s a common support scenario: A “users can’t submit orders” ticket arrives with a massive diagnostic package attached. Our very own Grant Griffith demonstrates how this typical investigation transforms from a potential day-ruiner into a quick win. In the video below, watch how Hawkeye turns what used to be hours of log-diving into minutes of precise analysis. No more lost evenings, no more context-switching headaches – just precise, actionable insights from your diagnostic data.

Watch how Hawkeye:

  • Quickly identifies the interaction between two services – orders and billing
  • Pinpoints the root cause: a billing service memory error triggered by excessive retries 
  • Generates a comprehensive RCA document in seconds, complete with a detailed incident report and technical recommendations to prevent recurrence

The entire process – from uploading the diagnostic package to having a complete RCA ready for the ticket – takes just minutes, transforming what could have been hours of log analysis.

Welcome to Better Days in IT Operations

IT teams know the scenario all too well – poring over massive diagnostic packages, knowing the answer is in there somewhere. Hawkeye turns those moments of frustration into quick wins. With their GenAI teammate onboard, ITOps teams gain instant insights from every investigation, resolving incidents faster and focusing their expertise on strategic initiatives.

Book a demo to learn how Hawkeye can transform those diagnostic package challenges into opportunities to shine.

Transforming VDI Management and Monitoring with GenAI, ControlUp and Hawkeye Integration

How VDI Teams Are Shaking Up Their VDI Management & Operations with AI-Powered Analysis

Monday morning hits, and your VDI team is staring down dozens of ControlUp alerts about crummy user experience scores. Admins start digging into metrics, comparing performance data across virtual desktop sessions, trying to figure out if the root cause is the host, the network, or something else entirely.

Meanwhile, users are complaining about sluggish performance, and productivity takes a nosedive.

This happens all the time as VDI teams try to keep desktop performance solid while juggling increasingly complex virtual setups. ControlUp gives you deep visibility, sure, but as environments grow, the sheer amount of data can swamp anyone.

The VDI Monitoring Headache

Today’s virtual desktop environments are trickier than ever, supporting huge numbers of remote users who need a consistently good desktop experience. VDI monitoring tools like ControlUp capture this complexity with detailed metrics across many layers:

  • User experience scores
  • Application performance metrics
  • Resource utilization statistics
  • Network latency measurements
  • Host and hypervisor metrics
  • Login times and session data

While ControlUp is great at collecting and showing this data, VDI teams often find themselves bouncing between views, manually connecting metrics, and spending valuable time just trying to piece together the story behind performance hiccups.

The problem isn’t a lack of data. It’s making sense of it all when there’s so much.

The Limitations of Traditional VDI Monitoring

Even with good VDI monitoring solutions, organizations still hit roadblocks:

  • Too many disconnected alerts make it hard to know what’s important.
  • Performance data is stuck in silos, needing manual correlation.
  • Issues often only get attention after they affect users.
  • VDI admins burn hours investigating tricky problems.
  • Know-how about specific environment quirks stays stuck in individual admins’ heads.

These limits just get worse as VDI environments grow. When you’re supporting thousands of virtual desktops across different locations, even seasoned admins can get buried in monitoring data.

Meet Hawkeye: Your GenAI-Powered ControlUp VDI Monitoring Analyst

Think about a different way to handle VDI operations. Instead of people trying to swim through this data flood, Hawkeye acts like a smart agent that understands the tangled relationships in your virtual desktop environment. By connecting directly with ControlUp, Hawkeye changes how teams monitor, analyze, and tune their VDI infrastructure.

Hawkeye doesn’t replace your VDI monitoring tools. It makes them way more valuable by applying AI to the data they already gather.

Beyond Just VDI Management

When looking into a VDI incident, Hawkeye does more than just basic metric checks:

  • It understands the relationships between hosts, sessions, and applications.
  • It correlates performance metrics across different infrastructure layers.
  • It recognizes patterns in user behavior and resource use.
  • It figures out how infrastructure changes affect user experience.
  • It spots chances for optimization before users feel the pain.
  • It learns from every investigation, building a deep understanding of your specific VDI setup.

This analysis happens in seconds, not the minutes or hours it would take an admin to pull together and process the same info.
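The cross-layer correlation described above can be sketched with toy data: match slow-login sessions against host CPU at login time to surface likely causes. The data shapes, host names, and thresholds here are all assumptions for illustration, not ControlUp’s or Hawkeye’s data model.

```python
# Toy session and host metrics spanning two VDI infrastructure layers.
sessions = [
    {"user": "ana",  "host": "vdi-01", "login_s": 62},
    {"user": "ben",  "host": "vdi-01", "login_s": 58},
    {"user": "carl", "host": "vdi-02", "login_s": 9},
]
host_cpu = {"vdi-01": 97, "vdi-02": 31}  # percent busy at login time

def correlate(sessions, host_cpu, slow_s=30, busy_pct=85):
    """Flag slow logins that coincide with CPU pressure on their host."""
    findings = []
    for s in sessions:
        if s["login_s"] >= slow_s and host_cpu[s["host"]] >= busy_pct:
            findings.append(f"{s['user']}: slow login likely tied to "
                            f"CPU pressure on {s['host']}")
    return findings

for finding in correlate(sessions, host_cpu):
    print(finding)
```

Even this trivial join across two layers is the kind of manual correlation admins do by eye across dashboards; the real version spans many more metrics and layers.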

The New VDI Performance Management Workflow

Old-school VDI monitoring means admins have to:

  • Watch multiple ControlUp dashboards.
  • Switch between different metric views constantly.
  • Manually correlate performance data.
  • Document findings and what they did.
  • Hunt down related changes and configs.

With Hawkeye boosting your VDI workflow, admins start with a single view of the problem and all the relevant info needed to fix it. Routine issues come with clear, actionable advice, while complex problems get detailed investigation summaries that already include data from across your VDI environment.

Check out the video below or read more on how to transform VDI login performance with AI.

From Reactive to Proactive VDI Management

Pairing Hawkeye with ControlUp tackles core operational headaches. Finding experienced VDI administrators is tough and costly, and organizations are always under pressure to keep desktop performance up while managing costs.

Hawkeye changes the game by:

  • Automating routine investigations and providing intelligent analysis.
  • Identifying potential issues before they impact user experience.
  • Suggesting proactive optimizations based on usage patterns.
  • Building a knowledge base of environment-specific insights.
  • Helping new team members get up to speed faster.

The Path Forward: Smarter VDI Optimization

As Hawkeye learns your VDI environment, it moves beyond just fixing problems to proactively optimizing things by:

  • Predicting potential performance degradation before users are affected
  • Recommending resource allocation adjustments based on usage patterns
  • Suggesting configuration improvements for optimal user experience
  • Identifying opportunities for infrastructure optimization
  • Providing trend analysis for capacity planning

Getting Started

Adding Hawkeye alongside ControlUp is simple. The range of our AI SRE integrations means you can connect it to your entire observability stack, creating a unified intelligence layer across all your tools.

  1. Connect Hawkeye to your ControlUp environment
  2. Configure access to relevant metrics and logs
  3. Begin receiving AI-powered insights and recommendations
  4. Watch as Hawkeye learns and adapts to your specific environment


Take the Next Step

Ready to improve how you manage your virtual desktop infrastructure? Check our demo or contact us to learn how Hawkeye can become your team’s AI-powered VDI analyst and help your organization handle the complexity of modern virtual desktop environments.

 

FAQ

How is VDI different from VM?

A Virtual Machine (VM) is a single virtual computer – it has its own OS, CPU, memory, the works, all running separately on a bigger server. VDI (Virtual Desktop Infrastructure) is the whole system that uses a bunch of those VMs specifically to give users their own virtual desktops over the network. So, a VM is the basic building block; VDI is the setup using those blocks to deliver desktops, managed centrally.

Is a VDI a VPN?

Nope, they’re different tools for different jobs. VDI gives you a complete virtual desktop that lives on a server somewhere else; you just access it remotely. All your data and apps stay on that central server. A VPN, on the other hand, just creates a secure connection (like a private tunnel) from your computer back to your company’s network, letting you access internal stuff as if you were in the office, but using your own device’s OS and apps.

What is the difference between VDI and Citrix?

VDI is the general concept or technology for delivering virtual desktops. Lots of companies offer VDI solutions. Citrix is one of those companies – they make specific products (like Citrix Virtual Apps and Desktops) that are VDI solutions, often adding their own special features for things like app virtualization or making the connection feel smoother.

Is VDI the same as remote desktop?

Not quite. Remote Desktop (like Microsoft’s RDP) often lets you connect to a computer (physical or virtual) that might be shared by multiple people at the same time. Resources get shared, and there’s less isolation. VDI typically gives each user their own dedicated virtual desktop running inside its own VM, usually hosted in a data center. This means it’s more isolated, you can often customize it more, and it’s built for managing lots of users securely.

What is ControlUp?

ControlUp is a monitoring tool used by IT teams to manage virtual desktop environments (VDI), physical desktops, and servers. It gives them a real-time look at performance stuff like how users are experiencing their sessions, how resources are being used, and network lag. It uses agents and monitors to gather this data, feeding it into dashboards and alerts so admins can troubleshoot problems, like why someone’s login is taking forever. Learn more.

What is a ControlUp monitor?

A ControlUp monitor is a component of the ControlUp platform. Its job is to constantly collect performance data in real-time from your VDI, physical desktops, or servers – things like user experience scores or resource usage. It usually runs 24/7 on a dedicated machine and sends all that info back to the main ControlUp dashboards and alerting system so admins can keep an eye on system health. It gives you lots of visibility, but seeing the whole picture from all that data still takes manual work and correlation, unlike AI-driven tools like Hawkeye that automate insights.

What is the ControlUp agent used for?

The ControlUp agent is a small piece of software you install on the actual endpoints – the virtual desktops, physical machines, or servers being monitored. It gathers detailed performance info right from the source (like CPU usage, how apps are running, network latency, etc.) and sends it back to the ControlUp console or monitor. This lets admins troubleshoot problems in real-time, figure out issues like slow logins, and even perform remote actions on the machine.

The Silent Treatment: Diagnosing VPN Interface Black Holes

How SRE teams are transforming VPN troubleshooting with AI

It’s 3 AM, and your monitoring system lights up with alerts about application connectivity issues. The initial investigation shows that traffic is flowing to your VPN interface, but seemingly vanishing into thin air before reaching its destination. Sound familiar? For network engineers and SRE teams, this “black hole” scenario is both common and frustratingly complex to diagnose.

The VPN Black Hole Challenge

Consider this recent scenario: A large e-commerce platform suddenly experienced order processing delays. Their payment service, running in AWS, couldn’t reach the payment processor’s API through a site-to-site VPN. Traffic appeared normal leaving the AWS environment, but never arrived at the destination. The monitoring dashboards showed green – the VPN tunnel was up, routes were in place, and security groups were correctly configured.

Yet the problem persisted. The traditional approach meant multiple teams manually checking:

  • VPN tunnel status and metrics
  • Route table configurations
  • Security group and NACL rules
  • BGP session states
  • MTU settings across the path
  • IPSec phase 1 and 2 configurations
  • Dead peer detection (DPD) timeouts

Each team had their own monitoring tools, none of which could correlate data across the entire path. Hours passed before someone noticed that a recent security patch had modified the IPSec transform set on one side of the tunnel, creating a mismatch that dropped packets silently.
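The configuration drift that caused this incident can be caught by a simple diff of the negotiated IPSec parameters on each side of the tunnel. Below is a hedged sketch of that check; the parameter names and values are illustrative, not pulled from any specific vendor’s config.

```python
# Hypothetical transform-set parameters from each tunnel endpoint.
side_a = {"encryption": "aes-256-gcm", "integrity": "sha384",
          "dh_group": "group20", "dpd_timeout_s": 30}
side_b = {"encryption": "aes-128-cbc", "integrity": "sha384",  # patched side
          "dh_group": "group20", "dpd_timeout_s": 30}

def transform_set_drift(a: dict, b: dict) -> dict:
    """Return every parameter that differs between the two endpoints."""
    keys = a.keys() | b.keys()
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

drift = transform_set_drift(side_a, side_b)
print(drift)
```

A mismatch like this drops packets silently because both tunnels report “up”; only comparing the two ends side by side exposes it.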

Beyond Traditional Monitoring

The challenge isn’t lack of monitoring – it’s that traditional tools can’t connect the dots across complex network paths. Each dashboard shows its piece of the puzzle, but assembling the complete picture requires extensive manual correlation and deep networking expertise.

This is where AI-powered investigation transforms the game. When this same company encountered a similar issue two months later, Hawkeye immediately:

  • Correlated VPN metrics from both endpoints
  • Detected the asymmetric traffic pattern
  • Identified configuration drift between tunnel endpoints
  • Pinpointed the exact parameter mismatch
  • Provided a clear remediation plan

What previously took hours of manual investigation across multiple teams was resolved in minutes.

The Power of Context-Aware Analysis

Hawkeye’s approach goes beyond simple metric monitoring. By understanding the relationships between network components, it can:

  • Track configuration changes across both ends of VPN tunnels
  • Correlate routing updates with traffic patterns
  • Monitor encryption parameters for mismatches
  • Detect subtle patterns in packet loss and latency
  • Identify asymmetric routing issues

More importantly, Hawkeye learns from each investigation, building a knowledge base of VPN failure patterns specific to your environment. This means faster resolution times and often, prevention of issues before they impact services.

From Reactive to Proactive

For network teams, this transformation means:

  • Fewer middle-of-night emergencies
  • Reduced mean time to resolution (MTTR)
  • Automated correlation of networking data
  • Early warning of potential VPN issues
  • More time for strategic network planning

Getting Started

Ready to transform your VPN troubleshooting? Hawkeye integrates with your existing network monitoring tools, including CloudWatch, Azure Monitor, and traditional NMS platforms. By connecting these data sources, you create a unified view of your network infrastructure with intelligent, AI-powered analysis.

Contact us to learn how Hawkeye can become your team’s AI-powered networking expert and help prevent VPN black holes from disrupting your services.
