I had a conversation last week about monitoring tools.
The guy I was talking to was defending one of the big platforms. Expensive. Feature-heavy. The kind that sends you a 40-page report when something breaks.
His argument: these tools help you find the root cause faster.
I pushed back.
Monitoring CPU and memory is dead.
I know that sounds extreme. Let me explain.
CPU spikes. Memory leaks. Disk I/O. These are symptoms.
Latency is the truth.
If your p95 response time is 800ms, your users are already leaving. I don’t care what your CPU is doing. I care that your API is slow and I want to know exactly where it’s slow and why.
Everything else is noise.
But here’s where it gets interesting.
His argument assumed something I can’t accept:
That the goal of monitoring is to find bad code faster.
Think about that for a second.
We’ve built an entire industry around tools that help developers locate the bottlenecks they created. And now we’re adding AI on top to fix those bottlenecks automatically.
So the workflow is:
- Write slow code
- Pay $2,000/month for a tool to find it
- Pay again for AI to fix it
That’s not engineering. That’s technical debt with a subscription model.
The root cause isn’t the missing tool.
The root cause is that nobody asked “why is this slow?” before shipping it.
Expensive observability platforms don’t make your code better. They make it more comfortable to ship code that isn’t good.
The best monitoring strategy is writing code that doesn’t need to be monitored.
And when you do monitor, track latency: p50, p95, p99. Those are the only numbers your users actually feel.
Everything else is a dashboard that makes you feel busy while your users quietly leave.
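For the latency tracking above, here is a minimal sketch of how p50, p95 and p99 could be computed from raw response times. It uses the nearest-rank method, which is my choice for illustration; real monitoring systems often use histograms or interpolation instead. The function names (`percentile`, `latency_summary`) and the sample data are hypothetical.

```python
import math
import statistics


def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample such that
    at least pct% of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Rank is 1-based; clamp to 1 so very small percentiles still map to a sample.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Return the three percentiles named in the post."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}


# Hypothetical data: 100 requests, 94 fast (40 ms) and 6 slow (900 ms).
samples = [40] * 94 + [900] * 6
summary = latency_summary(samples)
mean_ms = statistics.mean(samples)

print(summary)
print(f"mean: {mean_ms:.1f} ms")
```

With this made-up data the median is 40 ms and the mean is about 92 ms, yet p95 is 900 ms. An average hides the slow requests; the tail percentiles do not.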
I have seen services running at 100% CPU working perfectly fine. And services with idle CPUs that were completely unusable. Once again: trust latency, not the noise.
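One way an idle CPU can hide a slow service is a request that spends its time waiting rather than computing. The sketch below is an assumption for illustration, not a description of any specific incident: `handler_waiting_on_dependency` and `measure` are hypothetical names, and `time.sleep` stands in for a slow downstream call.

```python
import time


def handler_waiting_on_dependency():
    # Stand-in for a slow database or API call (assumed, not from a real system).
    time.sleep(0.2)


def measure(fn):
    """Return (wall-clock seconds, CPU seconds) spent running fn."""
    wall_start = time.perf_counter()   # what the user experiences
    cpu_start = time.process_time()    # what a CPU graph reflects
    fn()
    wall_s = time.perf_counter() - wall_start
    cpu_s = time.process_time() - cpu_start
    return wall_s, cpu_s


wall_s, cpu_s = measure(handler_waiting_on_dependency)
print(f"latency: {wall_s * 1000:.0f} ms, CPU time: {cpu_s * 1000:.1f} ms")
```

The request takes roughly 200 ms of wall-clock time while using almost no CPU time, so a CPU dashboard would look calm while the user waits.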
If you’re nerdy enough, drop a comment and I’ll send you the full technical breakdown of how I actually diagnose a latency spike in production.
What metrics does your team actually act on when something breaks?
If your infrastructure costs more than it should, let’s talk: https://lnkd.in/dBZ8xjEa