
[LangGenius Internal Case] Self-Serve Production Queries from the IDE: LangGenius’s In-House Ops Smart Assistant

Introduction

A quick note on naming: the "Ops" in Ops Smart Assistant is short for Operations, the team that owns production infrastructure (servers, monitoring, logs, Kubernetes). For the rest of this article we'll write "Ops engineers" for the role and "Ops Smart Assistant" for the product.

When something goes wrong in production, every minute costs something. But the engineers actually writing the code usually don’t have query access to Grafana, Sentry, or Kubernetes, and don’t know PromQL. So the same script plays every time: open a ticket → page someone in the Ops channel → wait. Meanwhile, Ops engineers spend most of their day answering the same handful of questions, and their real infrastructure work gets pushed further down the queue.

We hit this wall at LangGenius too. So we built an internal “Ops Smart Assistant” on Dify. Users ask in plain language; the assistant routes the question to the right backend tool and returns a dashboard screenshot together with an AI-generated explanation, delivered right inside whatever environment the developer is already in (Cursor / VS Code / Web).

This write-up is a record of what we actually run: the usage patterns and design trade-offs of a real, still-live LangGenius internal use case.


Background and Problem

The “waiting tax” is real

| Role | What hurts |
| --- | --- |
| Developer | Checking a simple service metric turns into opening tickets and switching between multiple dashboards; half a day gone. |
| Ops engineer | Bandwidth is consumed by dozens of repeat questions a day, and real infra work keeps getting diluted. |
| The team as a whole | The first-minute response to an incident is gated by two walls: access permissions and dashboard familiarity. |

Why classic ChatOps wasn’t enough

Classic ChatOps bots are typically command-driven: things like `/metrics billing cpu 24h`. For developers who only touch production occasionally, memorizing those commands is itself a barrier. More critically, what comes back is usually a raw chart or JSON blob with no interpretation, so the developer still ends up pinging someone.

What we wanted was a frontend that understands natural language, can route, and can explain. Whatever the developer would type in Slack to ask a human, they should be able to type here and get an answer.


Why Dify Fits This Shape

A few things about Dify line up naturally with this kind of assistant:

  • Natural-language frontend + tool orchestration backend. Developers face a single natural-language entry point; routing, data fetching, and synthesis all happen behind it.
  • One permission boundary. Tool calls live on the Dify side, so backend credentials never leak to the client. Easier to audit, easier to shrink the blast radius.
  • One backend, many surfaces. The same Dify app powers a Cursor plugin, a VS Code extension, and a web page; no need to reimplement logic per surface (see the sketch after this list).
  • Composable outputs. We can return “dashboard screenshot + AI interpretation” as a single answer, rather than handing the user a chart and making them do the second hop themselves.
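
To make "one backend, many surfaces" and "one permission boundary" concrete, here is a minimal sketch of the single call every surface makes, assuming Dify's service API (POST /v1/chat-messages) in blocking mode. The base URL, environment variable names, and user id are placeholders for our deployment's values; the only credential the client ever holds is the app-level Dify key, while Grafana/Sentry/Kubernetes credentials stay inside Dify's tool configuration.

```python
"""Minimal sketch: the one call every surface (Cursor / VS Code / Web) makes.

Assumptions: DIFY_BASE_URL and DIFY_APP_KEY are placeholders for our
deployment's values, and the endpoint shape follows Dify's service API.
Backend credentials (Grafana, Sentry, Kubernetes) stay inside Dify;
the client only ever holds the app-level key.
"""
import os

import requests

DIFY_BASE_URL = os.environ.get("DIFY_BASE_URL", "https://dify.internal.example")
DIFY_APP_KEY = os.environ["DIFY_APP_KEY"]  # app-level key, not a backend credential


def ask_ops_assistant(query: str, user_id: str, conversation_id: str = "") -> dict:
    """Send a natural-language question to the shared Dify app and return its answer."""
    resp = requests.post(
        f"{DIFY_BASE_URL}/v1/chat-messages",
        headers={"Authorization": f"Bearer {DIFY_APP_KEY}"},
        json={
            "inputs": {},
            "query": query,                # whatever the developer would ask a human
            "response_mode": "blocking",   # simplest mode for a sketch; we stream in practice
            "conversation_id": conversation_id,
            "user": user_id,               # stable id so Dify can keep per-user context
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()  # contains "answer" plus message/conversation metadata


if __name__ == "__main__":
    result = ask_ops_assistant(
        "How's CPU on the billing service over the last 24 hours?", "dev-123"
    )
    print(result["answer"])
```

The Cursor plugin, VS Code extension, and web page all reduce to this one function; everything interesting happens behind the endpoint.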

What It Looks Like to a User

  1. Ask in plain language. Type “How’s CPU on the billing service over the last 24 hours?” right inside the IDE. No commands, no window switching.
  2. The system routes automatically. The assistant recognizes this as a metrics question and calls the metrics toolchain, not the logs toolchain (a sketch of this step follows the list).
  3. Answer comes back as a screenshot plus an explanation. A Grafana dashboard snapshot, alongside a short AI-generated summary of what it shows. Seconds, not minutes.
  4. Delivered where the developer already is. No forced tool switch. Cursor, VS Code, and Web all hit the same backend.
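
For flavor, here is a toy sketch of steps 2 and 3: route the question, then fetch a panel snapshot. It is illustrative rather than our production code. The keyword router stands in for the assistant's LLM-based intent routing, the dashboard UID and panel id are made up, and the /render/d-solo endpoint requires Grafana's image-renderer plugin to be installed.

```python
"""Toy sketch of the route-then-screenshot path (steps 2 and 3 above).

Illustrative only: the keyword router, dashboard UID, and panel id are
hypothetical. In the real app, Dify's orchestration does the routing and
the tool node holds the Grafana token.
"""
import os

import requests

GRAFANA_URL = os.environ.get("GRAFANA_URL", "https://grafana.internal.example")
GRAFANA_TOKEN = os.environ["GRAFANA_TOKEN"]  # lives server-side, never on the client


def route(question: str) -> str:
    """Naive stand-in for the assistant's intent routing (ours is LLM-based)."""
    q = question.lower()
    if any(w in q for w in ("cpu", "memory", "latency", "qps")):
        return "metrics"
    if any(w in q for w in ("error", "exception", "log", "trace")):
        return "logs"
    return "unknown"  # explicit escape hatch: hand off to oncall instead of guessing


def render_panel_png(dashboard_uid: str, panel_id: int, time_range: str = "now-24h") -> bytes:
    """Fetch a PNG snapshot of one dashboard panel via Grafana's render endpoint."""
    resp = requests.get(
        f"{GRAFANA_URL}/render/d-solo/{dashboard_uid}/_",
        headers={"Authorization": f"Bearer {GRAFANA_TOKEN}"},
        params={"panelId": panel_id, "from": time_range, "to": "now",
                "width": 1000, "height": 500},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.content


if __name__ == "__main__":
    question = "How's CPU on the billing service over the last 24 hours?"
    if route(question) == "metrics":
        png = render_panel_png("billing-overview", panel_id=2)  # hypothetical UID/panel
        with open("cpu_last_24h.png", "wb") as f:
            f.write(png)
```

In production the snapshot goes back through Dify alongside the AI-generated summary, so the user sees one composed answer instead of a bare image.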

From the developer’s side, the shift is simple: “go check production” went from a thing you schedule to a thing you do in passing.


Outcomes

| Dimension | Before | After |
| --- | --- | --- |
| Time per query | Hours (waiting for someone + switching dashboards) | ~30 seconds |
| Repeat questions to Ops | Dozens per day | Down sharply; bandwidth goes back to higher-leverage work |
| Developer experience | Leave the IDE, chase multiple tools, wait for people | Stay in the IDE, self-serve |

None of this is magical, and Dify isn’t the only tool that could do it. What we actually saved is the organization-wide coordination cost of “small daily questions” — a cost that’s invisible day to day, but once it’s gone you don’t want it back.


What We’ve Learned from Running It

Here’s what feels worth writing down:

  • Start read-only. Get queries right before considering any write operations (restart a Pod, change a config). Permission boundaries are much harder to tighten than to loosen.
  • Enforce permissions at the orchestration layer, not in the prompt. Don't rely on "the model won't call the dangerous tool"; don't expose the dangerous tool to the user surface in the first place (see the sketch after this list).
  • Give “I don’t know” an explicit escape hatch. Better for the assistant to say “I’m not confident on this — please reach out to oncall” than to hallucinate a plausible-looking answer.
  • Don’t chase full dashboard coverage up front. Cover the top 3–5 most frequent question types and make them rock-solid. Extend the long tail incrementally.
  • Meet developers where they already are. Cursor plugin / VS Code extension / Web — put the entry point wherever they’re already working. If you make them learn a new tool, they probably won’t use it.
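
As a sketch of the second bullet (and the escape-hatch one), here is the pattern in its simplest form: an explicit read-only allow-list enforced in code before any tool runs, so a write operation is unreachable from the user surface no matter what the model asks for. The tool names and registry shape are hypothetical.

```python
"""Minimal sketch: enforce the permission boundary in the orchestration
layer, not in the prompt. Tool names and registry shape are hypothetical.
"""

# The only tools exposed to the user surface. Write operations
# (restart_pod, update_config, ...) are simply never registered here,
# so no prompt injection or model mistake can reach them.
READ_ONLY_TOOLS = {
    "grafana_snapshot": lambda args: ...,  # query dashboards (stub)
    "sentry_search": lambda args: ...,     # read error events (stub)
    "k8s_describe": lambda args: ...,      # read-only cluster state (stub)
}


def dispatch(tool_name: str, args: dict):
    """Run a tool call requested by the model, enforcing the allow-list in code."""
    tool = READ_ONLY_TOOLS.get(tool_name)
    if tool is None:
        # Explicit escape hatch rather than a hallucinated answer.
        return {"error": f"'{tool_name}' is not available here; please reach out to oncall."}
    return tool(args)
```

The prompt can still describe the tools politely, but the guarantee lives in `dispatch`, which is the part you can audit.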

Takeaway

The Ops Smart Assistant isn’t fundamentally a new tool. It’s LangGenius stitching three things together with Dify — natural-language entry, tool orchestration, and result synthesis — so that “go check production” stops needing cross-team coordination internally. What it removes is the invisible cost sitting between developers and Ops. Queries first, closed-loop later; developers first, the organization-level wins come after. That’s the line we’d most want to pass on from running this ourselves.