Tiago Silva

Mostly about IT, sometimes about really interesting stuff

Observability vs Monitoring

There’s a lot of talk about observability these days, and it’s easy to confuse it with monitoring. But the difference really matters — especially as systems get more complex.

Monitoring is about tracking known things. You define metrics and thresholds, set up alerts, and wait to be told when something breaks. It’s reactive. You already know the kinds of problems you’re looking for, and you build tools to catch them.
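As a rough sketch of what "metrics and thresholds" means in code, the snippet below checks one sample of readings against a fixed set of rules. The metric names, limits, and the `ThresholdRule` type are made up for illustration and do not come from any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdRule:
    """A known failure mode expressed as a metric name and an upper limit."""
    metric: str
    limit: float


def check_thresholds(sample: dict[str, float], rules: list[ThresholdRule]) -> list[str]:
    """Return one alert message for each rule whose metric exceeds its limit."""
    alerts = []
    for rule in rules:
        value = sample.get(rule.metric)
        # A metric missing from the sample is skipped rather than treated as a breach.
        if value is not None and value > rule.limit:
            alerts.append(f"{rule.metric}={value} exceeds {rule.limit}")
    return alerts


# Hypothetical rules and readings, for illustration only.
RULES = [ThresholdRule("cpu_percent", 90.0), ThresholdRule("error_rate", 0.05)]
SAMPLE = {"cpu_percent": 97.5, "error_rate": 0.01}

if __name__ == "__main__":
    print(check_thresholds(SAMPLE, RULES))
```

The check can only catch what a rule was written for, which is the limitation the rest of this post is about.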

Observability, on the other hand, is about answering questions you didn’t know you’d need to ask. It’s about understanding how your systems behave, diagnosing the unexpected, and making smarter decisions based on real data rather than assumptions. It’s a proactive, exploratory approach to understanding the internal state of a system by examining its external outputs.

In practice, observability is about putting the right structures in place to see and understand what your systems are doing — all the time, not just when things go wrong. You start by identifying which metrics actually matter for your business and your users — like response times, error rates, or system throughput. Then, you instrument your code to capture useful logs, traces, and metrics. Tools like OpenTelemetry can help make that process more consistent.
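To make "instrument your code" a bit more concrete, here is a minimal sketch that uses only the Python standard library as a stand-in for a real telemetry SDK such as OpenTelemetry. The decorator name `instrumented`, the logger name, and the event fields are assumptions chosen for the example, not an established convention.

```python
import functools
import json
import logging
import time
import uuid

logger = logging.getLogger("app.telemetry")  # hypothetical logger name


def instrumented(operation):
    """Wrap a function so every call emits one structured event."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # A random id stands in for a real trace id propagated between services.
            event = {"operation": operation, "trace_id": uuid.uuid4().hex}
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
                event["outcome"] = "ok"
                return result
            except Exception as exc:
                event["outcome"] = "error"
                event["error_type"] = type(exc).__name__
                raise
            finally:
                # Runs on success and on failure, so every call leaves a record.
                event["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
                logger.info(json.dumps(event))
        return wrapper
    return decorator


@instrumented("checkout.total")  # hypothetical operation name
def order_total(prices):
    return sum(prices)
```

Because each event is a JSON object with consistent fields, it can be filtered and aggregated later to answer questions nobody thought of when the code was written.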

From there, you build dashboards that highlight what’s important and set alerts that trigger on real issues, not noise. Popular tools like Prometheus, Grafana, and the ELK stack make this possible, and platforms like Datadog or New Relic can bring everything into one place.
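One way to alert on "real issues, not noise" is to look at the error rate over a window of recent requests instead of reacting to every single failure. The sketch below assumes a hypothetical `ErrorRateAlert` class with an arbitrary window size and threshold; tools like Prometheus express the same idea with their own rule syntax.

```python
from collections import deque


class ErrorRateAlert:
    """Fire only when the error rate over the last `window` requests exceeds `threshold`."""

    def __init__(self, window: int = 100, threshold: float = 0.05):
        self.threshold = threshold
        self.outcomes = deque(maxlen=window)  # True means the request failed

    def record(self, failed: bool) -> bool:
        """Record one request and report whether the alert should fire."""
        self.outcomes.append(failed)
        # Stay quiet until the window is full, so one early failure is not read as 100%.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return sum(self.outcomes) / len(self.outcomes) > self.threshold
```

A single failed request in an otherwise healthy window never fires the alert; a sustained run of failures does.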

But tools alone aren’t enough. Observability has to be part of how the team works. That means using data in retros, reviewing patterns after incidents, and making decisions based on what’s actually happening in your systems — not just what you hope is happening.

When observability is done right, your team detects and solves problems faster, your systems run more reliably, and decisions get made with more confidence. You spend less time guessing and more time improving. And instead of reacting to issues, you start anticipating them — and building better systems because of it.

Observability isn’t about collecting more data or spinning up endless dashboards. It’s about clarity. It’s about helping your team ask better questions, spot issues early, and stay aligned with what really matters — both technically and to the business.


Technical Debt

Technical debt is inevitable but manageable. Left unchecked, it affects not just your codebase but also your people, your delivery, and your business.

Technical debt gradually erodes team productivity and slows down development cycles, making it harder to ship features or iterate quickly. As the codebase grows, the likelihood of bugs and defects increases—undermining product quality and user trust. Delivery timelines stretch, and the time-to-market for new features suffers.

Beyond the technical realm, debt can lead to stakeholder frustration, especially when delays or instability affect customer experience. It raises maintenance costs and diverts resources from innovation, reducing your ability to adopt new technologies or scale effectively. Over time, the gap that opens between business goals and technical reality introduces risk, whether through security vulnerabilities, platform limitations, or strategic inflexibility.

Addressing technical debt proactively is essential to maintain agility, reduce operational drag, and keep the focus on building value.

There are different types of technical debt:

- Dependencies: Outdated or hard-to-maintain tools/libraries.

- Patterns: Poor design choices that cause recurring issues.

- Redundancies: Duplicated logic or fragmented systems.

- Abstract Elements: Unclear goals or shifting requirements.

- Legacy Templates: Inefficient scaffolding holding teams back.

- Concept Debt: Building unused or unnecessary features.

And different strategies to tackle and prevent it:

- Automate and update dependencies.

- Prioritize refactoring and enforce design reviews.

- Audit and consolidate duplicated components.

- Align abstract ideas with concrete business value.

- Modernize outdated templates and practices.

- Validate feature ideas early—only build what matters.

- Use agile workflows to identify issues early.

- Invest in code quality (reviews, pair programming, static analysis).

- Keep teams trained and current.

- Foster cross-team alignment.

- Design for change—anticipate growth.

- Schedule time to refactor and clean up.
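As one small illustration of the "audit and consolidate duplicated components" strategy, the sketch below uses Python's `ast` module to flag functions whose parameters and bodies are structurally identical. The function name `find_duplicate_functions` and the sample source are invented for the example, and the check is deliberately naive: it only catches exact copies, not near-duplicates with renamed variables.

```python
import ast
import hashlib
from collections import defaultdict


def find_duplicate_functions(source: str) -> list[list[str]]:
    """Group function names whose parameters and bodies have identical syntax trees."""
    groups = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # ast.dump leaves out line numbers by default, and comments never reach
            # the tree, so layout and comments do not affect the comparison.
            body = "".join(ast.dump(stmt) for stmt in node.body)
            params = ast.dump(node.args)
            digest = hashlib.sha256((params + body).encode()).hexdigest()
            groups[digest].append(node.name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical module contents, for illustration only.
SAMPLE_SOURCE = '''
def net_price(amount, rate):
    return amount * (1 + rate)

def gross(amount, rate):
    # same logic under a different name
    return amount * (1 + rate)

def discount(amount, rate):
    return amount * (1 - rate)
'''

if __name__ == "__main__":
    print(find_duplicate_functions(SAMPLE_SOURCE))
```

A report like this is only a starting point for a conversation about which copy should survive.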

Technical debt isn’t just a tech issue. It’s a business issue. Managing it well is a competitive advantage.


Lessons from Working with LLMs

I’ve been actively exploring large language models (LLMs) and chatbots, particularly since the release of DeepSeek. Working with them both in the cloud and locally, I’ve applied them to various scenarios—from software development and finance to mentoring, data analysis, and even travel planning.

Recently, I was analyzing a very large dataset and ran into a roadblock: the file was simply too large for the LLM to process effectively. I tried several approaches: cleaning, transforming, compressing, and even splitting the data into smaller chunks. In essence, I was adapting my problem to fit the tool.

Then, in one of my iterations, something unexpected happened. The LLM itself suggested that perhaps I was using the wrong tool for the job. And it was right. I was so focused on making the problem work within the constraints of an LLM that I overlooked more suitable solutions.

It was a classic case of “when all you have is a hammer, everything looks like a nail.” This experience was a great reminder that while LLMs are incredibly powerful, they are not a one-size-fits-all solution. Choosing the right tool for the task is just as important as understanding the problem itself.
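The post does not say which tool turned out to be the better fit, so the following is only a hypothetical illustration of the general idea: for a large tabular file, a few lines of ordinary code can stream through the data row by row and produce a small summary, with no context-window limit to work around. The column names and sample data are invented.

```python
import csv
import io
from collections import defaultdict


def totals_by_key(rows, key, value):
    """Sum `value` per `key`, reading one row at a time so memory use stays flat."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row[value])
    return dict(totals)


# Hypothetical data; in practice `handle` would be a real file opened with newline="".
handle = io.StringIO("region,amount\nnorth,10.5\nsouth,4\nnorth,2.5\n")
summary = totals_by_key(csv.DictReader(handle), key="region", value="amount")
```

The resulting summary is small enough to hand to an LLM for interpretation, if that is still useful at that point.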


Conway’s Law

"Any organization that designs a system will produce a design whose structure is a copy of the organization's communication structure."

This principle, formulated by Melvin Conway in 1968, is more than just an observation; it’s a strategic insight. The way teams are structured and communicate within an organization profoundly impacts the systems, products, and services they deliver.

Team organization is an ongoing process that must evolve with the business. Revisiting and refining team topologies is essential to maintaining alignment and achieving success.