Tech Leaderism

Everything as Code

I recently analyzed Kasava's strategy of managing their entire organization, including backend services, frontend applications, legal documentation and investor materials, within a single repository. This approach extends the traditional monorepo into what I would call a "context monorepo." By co-locating marketing configurations with actual business logic, they enable their AI agents to cross-reference pricing claims on a public website against the enforcement code in the backend, ensuring zero synchronization drift.

From a technical standpoint, the implementation prioritizes pragmatic isolation over standard monorepo tooling. This philosophy extends to their documentation strategy with the inclusion of CLAUDE.md files. This signals a subtle but significant shift where repository architecture is optimized not just for human maintainers but to serve as an effective context window for AI agents.

While this architecture assumes a high-trust environment that may be difficult to maintain at enterprise scale, it exposes a flaw in how we currently separate concerns. We often sequester documentation and specifications in silos like Confluence or Notion, effectively blinding our AI tools. Kasava's model suggests that maximizing AI leverage requires breaking down the barriers between code and content, even if it means rethinking standard permission boundaries.

Read the full engineering deep dive here.

Software Carbon Intensity (SCI)

Measuring the carbon impact of software requires moving from abstract estimates to precise metrics. The Software Carbon Intensity (SCI) specification provides a methodology to calculate this as a rate, which is essential for understanding the environmental cost of specific technical actions, such as an API call or a user session.

To implement this calculation, it is necessary to define the specific units involved in the formula:

SCI = ((E × I) + M) / R

The components are defined as follows:

- E = Energy consumed by the software system (kWh)

- I = Carbon intensity of that energy, based on the location's marginal emissions (gCO2e/kWh)

- M = Embodied carbon of the hardware running the software, amortized over the measurement window (gCO2e)

- R = The functional unit, e.g. an API call, user session or transaction

The resulting SCI Score is expressed as gCO2e per Functional Unit.

By standardizing these units, IT professionals can establish a baseline for their systems. This allows for a granular view of how architectural changes, such as optimizing a database query or migrating to a more efficient instance type, directly reduce the carbon emitted per unit of work.
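
Once the inputs are measured, the calculation itself is straightforward. A minimal sketch in Python, with every input value assumed purely for illustration:

    # Minimal SCI sketch: SCI = ((E × I) + M) / R
    # Every number below is assumed for illustration, not a real measurement.

    def sci(energy_kwh, intensity_gco2e_per_kwh, embodied_gco2e, functional_units):
        # Returns grams of CO2e per functional unit.
        return ((energy_kwh * intensity_gco2e_per_kwh) + embodied_gco2e) / functional_units

    # Example: a service handling 1 million API calls in the measurement window.
    score = sci(energy_kwh=120.0,              # E: energy consumed by the service
                intensity_gco2e_per_kwh=350,   # I: grid carbon intensity at that location
                embodied_gco2e=15_000,         # M: amortized hardware emissions for the window
                functional_units=1_000_000)    # R: API calls served

    print(f"{score:.3f} gCO2e per API call")   # 0.057 gCO2e per API call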

The full technical details are available via the Green Software Foundation: https://sci.greensoftware.foundation/

Books in 2025

Looking back at some of the books I read in 2025, I noticed a distinct theme. While technical skills are the baseline for working in IT, the ability to understand systems, people and strategy is what truly drives a career forward.

Revisiting "The Phoenix Project" alongside "Team Topologies" was a powerful exercise. It reinforced the idea that software architecture is inextricably linked to organizational architecture. You cannot optimize your deployment pipeline if you do not optimize your communication structures. Understanding cognitive load and team flow is just as critical as the code itself.

With the rapid evolution of AI, reading Harari’s "Nexus" and Doctorow’s "Enshittification" felt necessary. These books serve as a reminder that we need to be vigilant about the long-term impacts of the platforms we create and maintain.

"The Lean Startup" and "The First 90 Days" provided a solid grounding in agility. Whether you are launching a new feature or stepping into a new role, the ability to validate assumptions quickly and deliver value early is the core of modern IT strategy.

Perhaps the most useful book for my day-to-day work was "Meditations" by Marcus Aurelius. In an industry often defined by urgent incidents and rapid change, Stoicism offers a practical framework for resilience.

To be effective in IT, we have to look beyond the screen. We need to understand the business, the ethics and the people behind the technology.

Carbon Aware Metrics in Software Development

The software industry is witnessing a transition from performance-at-all-costs to carbon-aware engineering. This is not about purchasing offsets to claim neutrality, it is about architectural decisions that respond to the physical reality of the energy grid.

Core to this shift is the Software Carbon Intensity (SCI) specification. By measuring emissions per functional unit, engineering teams can treat carbon as a constraint similar to latency or memory. The mechanisms for reduction are tangible:

1. Temporal Shifting: Scheduling heavy batch loads or model training when grid intensity is low (e.g. high wind/solar availability).

2. Spatial Shifting: Routing workloads to geographic regions where the current energy mix is cleaner.

3. Demand Shaping: Adjusting application fidelity based on real-time energy availability.

The convergence of FinOps and GreenOps provides the economic lever. While efficiency usually drives down costs, carbon awareness requires sophisticated orchestration to balance grid intensity against spot pricing and performance SLAs. Sustainability is evolving from a policy statement into a compile-time optimization.
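
As an illustration of what temporal shifting can look like in code, here is a minimal sketch. The grid-intensity forecast below is a fake stand-in with invented values; in practice you would query a service such as WattTime or Electricity Maps.

    # Temporal shifting sketch: run a deferrable batch job in the lowest-carbon
    # hour before its deadline. get_intensity_forecast() is a fake stand-in for
    # a real grid-intensity API; the values it returns are invented.
    from datetime import datetime, timedelta

    def get_intensity_forecast(region, hours):
        now = datetime.now().replace(minute=0, second=0, microsecond=0)
        fake_values = [420, 390, 310, 240, 180, 200, 260, 330, 410, 450, 430, 400]
        return [(now + timedelta(hours=h), fake_values[h % len(fake_values)])
                for h in range(hours)]

    def pick_start_time(region, deadline_hours):
        # Choose the forecast hour with the lowest carbon intensity (gCO2e/kWh).
        forecast = get_intensity_forecast(region, deadline_hours)
        best_start, _ = min(forecast, key=lambda pair: pair[1])
        return best_start

    print("Schedule training at:", pick_start_time("eu-west-1", deadline_hours=12))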

Case study: How Zoom Scaled 30x in 90 Days

In December 2019, Zoom had 10 million daily meeting participants. By April 2020? 300 million. That's not growth. That's a controlled explosion.

During the COVID-19 pandemic, Zoom went from a business tool to global infrastructure overnight. Schools, hospitals, governments, families, everyone needed video conferencing simultaneously.

Built on AWS cloud architecture that could scale dynamically, Zoom expanded their security monitoring infrastructure from a handful of servers to over 250 indexers and 200,000 forwarders at peak. They added 15+ data centers in months to handle regional demand and implemented end-to-end encryption across the platform.

Security data logs grew from gigabytes per day to hundreds of terabytes per day. Zoom used Amazon EMR with Apache Hudi to ingest 150 million Kafka messages in under 5 minutes.

When security issues emerged, CEO Eric Yuan made a decision that many executives wouldn't: he paused all feature development for 90 days to focus exclusively on security. He hired new security executives, published a 90-day security plan with weekly updates, embedded security reviews in every phase of development (design, build, test) and took personal accountability.

Most companies prepare for 2x growth. Maybe 5x if they're ambitious. Zoom had to handle 30x while simultaneously fixing security issues under global scrutiny.

They succeeded because their cloud-native architecture allowed elastic scaling, cross-functional teams could make rapid decisions without bureaucratic approval chains, leadership prioritized trust over short-term feature velocity and they were transparent about challenges rather than hiding them.

The next crisis probably won't look like a pandemic, but it will require infrastructure that can scale orders of magnitude, not incrementally. It will require culture where teams can make rapid decisions. It will require leadership willing to pause revenue-generating work to fix fundamental issues. And it will require transparency that builds trust when things break.

99, 99.9, 99.99 and 99.999% SLA

The difference between availability percentages:

- 99% = 3.65 days downtime/year

- 99.9% = 8.77 hours downtime/year

- 99.99% = 52.6 minutes downtime/year

- 99.999% = 5.26 minutes downtime/year
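
These figures follow directly from the number of minutes in a year; a quick sketch of the arithmetic:

    # Allowed downtime per year for a given availability target.
    MINUTES_PER_YEAR = 365.25 * 24 * 60

    def downtime_minutes_per_year(availability_percent):
        return MINUTES_PER_YEAR * (1 - availability_percent / 100)

    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% -> {minutes:.1f} minutes ({minutes / 60:.2f} hours) per year")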

Each additional "9" demands exponentially more investment. Redundant systems, automated failover, distributed architectures, continuous monitoring and substantially higher infrastructure costs.

The question isn't "how many nines should we target?" It's "what does downtime actually cost us?"

A payment processing system going down for even five minutes can mean millions in lost transactions and immediate regulatory scrutiny. An e-commerce platform during Black Friday can't afford 52 minutes of downtime. Healthcare systems managing patient records need near-constant availability because lives may depend on access to critical information.

But an internal HR portal experiencing 8 hours of downtime spread across a year is inconvenient, not catastrophic. A corporate blog being down for half a day won't damage the business. Development and staging environments can tolerate even lower availability without meaningful impact.

The right SLA aligns availability with actual business impact. A critical platform at 99.9% is underserving users. An internal tool at 99.999% is probably burning money on unnecessary infrastructure.

Simplicity as a Goal

In 1945, Picasso created a series of lithographs where a fully detailed bull gradually became a single continuous line. What looks like simplification is actually clarity. Removing everything that doesn't define the idea.

Leonardo da Vinci anticipated this mindset with his belief that "simplicity is the ultimate sophistication".

Steve Jobs turned that philosophy into Apple's product strategy. For him, simplicity wasn't minimalism but intention. Refining again and again until the purpose becomes obvious and the noise disappears.

These three examples point to the same lesson: simplicity is not the starting point, it is the outcome of disciplined reduction.

In technology we tend to move in the opposite direction. We add features, integrations, layers and processes. Complexity accumulates and suddenly it becomes the default. The desirable goal is to pull in the other direction, to make the problem smaller, the architecture cleaner, the product sharper and the path forward unmistakable.

Reduce, refine and reveal the essential, until what remains is both simple and undeniably true.

AI and Critical Infrastructure

A Cloudflare outage this week following AWS disruptions just weeks ago. Two backbone providers and two significant incidents in a short time.

It raises questions about the evolving role of AI in critical infrastructure teams. As these tools become more capable, companies face pressure to optimize costs by reducing headcount, including senior engineering roles.

There's a meaningful distinction between using AI to augment technical teams versus replacing experienced staff entirely. Consumer applications can tolerate different risk profiles than infrastructure services where cascading failures affect millions of businesses simultaneously.

Senior engineers bring capabilities that extend beyond code: pattern recognition, institutional knowledge about system design decisions and the ability to navigate ambiguous emergencies under pressure. These skills develop over years.

The recent outages may be completely unrelated to staffing decisions, or they might be early signals. Either way, the conversation about balancing innovation with reliability in critical infrastructure deserves attention.

Web Summit 2025 – AI beyond chatbots

This year's Web Summit was all about AI evolving from a tool for individuals to a force reshaping teams, industries and even devices.

Some of the highlights that stood out to me:

Atlassian's CEO shared how Rovo is helping teams work smarter together, a glimpse into how AI is transforming collaboration, not just productivity.

Siemens' CTO discussed how IoT and AI are making industrial systems more secure, precise, and innovative, with digital twins playing a key role in the next industrial leap.

Qualcomm's CEO painted a future where AI in wearables becomes ubiquitous and where smartphones may lose their central role in our daily lives.

Boston Dynamics' CEO showcased how AI-driven robotics is already boosting efficiency in industrial environments and hinted that service industries will be the next frontier.

It's clear that we're entering an era where AI isn't just augmenting humans but reshaping how entire systems and organizations operate.

Pioneers are the first to get shot

The Gartner Hype Cycle and the saying "pioneers are the first to get shot" tell the same story from different angles.

At the peak of inflated expectations, early adopters rush in. The technology is exciting but immature, tools are unstable, costs are high and success stories are rare. These are the pioneers. They take the arrows: technical dead ends, regulatory uncertainty or cultural resistance.

A few survive and pave the way, but most fail quietly. Then come the settlers, those who enter during the slope of enlightenment, when standards are clearer and real value begins to emerge. They build sustainably and scale efficiently.

Innovation rewards courage but timing multiplies its impact. The goal isn't to be first, it's to be ready when it matters.

Gartner Hype Cycle

The Gartner Hype Cycle is a model that helps visualize how emerging technologies progress through different stages of public perception and maturity. It illustrates how enthusiasm often precedes understanding, and how true value only emerges after initial disillusionment.

It follows five key phases:

1. Innovation Trigger - A breakthrough or proof of concept captures attention. There are few usable products, but excitement begins.

2. Peak of Inflated Expectations - Media coverage and marketing drive exaggerated hopes. Early successes are overhyped, while failures are ignored.

3. Trough of Disillusionment - Reality sets in. Implementations fail, investors pull back, and public interest fades.

4. Slope of Enlightenment - Lessons from earlier failures lead to realistic improvements and clearer business cases.

5. Plateau of Productivity - Mainstream adoption starts to take off.

Take generative AI as an example. In 2023, tools like ChatGPT triggered massive hype; promises of automation, creativity and transformation reached the peak of inflated expectations. Many organizations rushed in without clear use cases, leading to disappointment when results fell short. Now, the industry is entering the slope of enlightenment, where more focused applications such as code assistants, customer support bots and document summarization are delivering tangible productivity gains.

The Gartner Hype Cycle is less about predicting winners and more about managing expectations and timing adoption wisely. Smart leaders use it to decide when to experiment and when to scale.

More about Gartner.

From Contributor to Builder

The transition from hands-on contributor to team builder is one of the hardest shifts in a technical career. What once depended on personal skill and output now depends on enabling others to perform at their best.

Effective leadership is not about directing, it's about creating the conditions for excellence. That means combining technical wisdom with empathy and the ability to set both challenge and safety. Building psychological safety doesn't mean lowering standards, it means understanding that innovation requires risk and that risk occasionally leads to failure.

On the other hand, trust is not declared, it's accumulated and built through consistent recognition, transparency and shared experiences that go beyond surface-level interaction. Structured alignment mechanisms like OKRs reinforce that trust by turning intent into clarity and measurable outcomes, reducing ambiguity about the definition of success.

Leadership in this context is a balance: technical enough to guide, human enough to inspire and structured enough to sustain. The real work is not just building systems, it's building people who can build systems together.

When the Cloud Fails

Today's AWS outage reminds us of an uncomfortable truth: our digital systems rest on invisible dependencies. A single misconfiguration, a network disruption or a cascading failure in a major cloud provider can send shockwaves across the globe, halting startups and giants alike.

Critics will point to these incidents as proof that we've become too dependent on centralized infrastructure. Yet, paradoxically, the cloud remains the most reliable option we've ever had. Outages are spectacular because they are rare and amplified by the scale of what they power. Before the cloud, similar failures happened silently in server rooms. Hardware would fail, backups would break and disaster recovery plans often existed only on paper.

Cloud computing didn't eliminate failure, it industrialized resilience. It brought redundancy, automated failover and economies of scale that few individual companies could ever achieve on their own. But it also brought systemic risk.

The lesson isn't to abandon the cloud, but to account for failure. Diversify across regions or even providers. Test your backups and assumptions, because reliability isn't a feature of the cloud, it's a discipline.

Enshittification

Cory Doctorow, the writer who coined the term "enshittification", argues that Amazon has finally reached the terminal stage of the process he first described years ago: the point where a platform stops serving users or partners and instead optimizes entirely for itself.

In its early years, Amazon built loyalty through low prices, fast delivery, and generous customer policies. Later, it courted sellers with visibility and scale, creating a powerful cycle of growth that reinforced itself with every new customer and merchant. But over time, that cycle turned inward. Search results became pay-to-play. Merchants faced rising fees and were nudged into costly fulfilment programs just to remain visible. Consumers were left with degraded search quality, counterfeit products, and creeping prices hidden behind the convenience of Prime.

Doctorow describes this as the final phase of enshittification: the slow transformation of a useful system into one that extracts more than it gives. It is no longer innovation or competition that drives Amazon's behaviour, but the maintenance of dominance through lock-in and algorithmic opacity.

His warning goes beyond Amazon itself, it's about what happens when digital ecosystems stop competing for trust and start competing for control. When efficiency, scale, and data concentration become ends in themselves. As Doctorow puts it, enshittification is not inevitable, but reversing it will require regulation, transparency and the courage to reimagine what a fair digital marketplace looks like.

Cory's article here.

The Cost of Always-On Leadership

In modern organizations, availability is often mistaken for commitment. Leaders feel pressure to be constantly reachable, replying instantly, joining every meeting, staying online late. It signals dedication but quietly erodes effectiveness.

When every moment is spent reacting, there's no space left for reflection or strategy. Over time, leaders become exceptional at responding but poor at thinking.

Decades ago, "The One Minute Manager" by Ken Blanchard and Spencer Johnson offered a timeless idea: leadership isn't about constant presence, but focused presence. Short, intentional moments that bring clarity and direction. The lesson holds even truer today, when the noise never stops.

A leader who is always online is rarely fully present. True leadership requires deliberate absence, time to think, to learn, to rest and to return with focus.

From Data Collection to Data Understanding

Over the last decade, companies have invested heavily in data pipelines, data warehouses and dashboards. Data collection became an obsession, but collecting data isn't the same as understanding it. In many cases, more data simply created more noise and more confusion about what's actually true.

The real challenge is not technical, it's interpretive. Turning data into understanding requires clarity of purpose: knowing which questions matter, which signals are meaningful and which metrics reflect progress rather than activity. Without that clarity, teams end up optimizing for what's easy to measure instead of what's important.

As an example, dashboards might show increasing engagement metrics, suggesting success. But without context, it's unclear whether users are genuinely finding value or just getting stuck in loops that inflate numbers. The data looks good, but the story it tells is misleading.

The shift from collection to understanding starts with alignment. Data teams need to work closely with business and product leaders to define intent. Data quality matters, but so does narrative.

Data-driven organizations are not the ones that collect the most information. They are the ones that ask better questions, interpret data with judgment and act with discipline.

The Half-Life of Technical Skills

Technical knowledge has a half-life. Over time, its value decays, and what once felt essential can quickly become obsolete. The pace of that decay is accelerating. What used to take a decade now happens in just a few years - or faster in fields like AI.

Not long ago, Hadoop was the centerpiece of every big data strategy. Entire teams were built around it, certifications were in high demand, and knowing how to manage clusters was a prized skill. Today, very few organizations are investing in Hadoop. The ecosystem shifted toward cloud-native, serverless, and real-time approaches, leaving those who never moved beyond Hadoop struggling to stay relevant.

This is the reality of working in tech: the real skill is not mastering a tool once, but mastering the ability to learn, unlearn and relearn continuously. Past expertise has value but relying only on it is dangerous. What matters is how quickly we adapt when that expertise no longer applies.

Leaders have a responsibility here too. Staying curious themselves is only half the job. They also need to create environments where learning is not an afterthought but part of the daily rhythm. Teams that treat learning as a continuous process don't just keep up with change, they thrive on it.

Data Dispersion in Growing Organizations

As organizations scale, one of the recurring challenges is the phenomenon of data dispersion. What begins as a centralized system of record, such as an ERP, CRM or core database, gradually fragments into departmental silos.

Teams, under pressure to deliver quickly, often bypass central governance by creating local datasets, exporting information into spreadsheets, or adopting specialized SaaS tools. While these decisions may resolve immediate needs, they create long-term complexity. The outcome is duplication and inconsistency, with multiple versions of the truth spread across sales, marketing, and operations. Shadow data sets, maintained outside of IT oversight, accumulate over time and gradually undermine trust in the organization's information landscape.

The consequences are predictable: reconciliation processes become slow and expensive, data quality declines, and operational as well as compliance risks increase. The drivers are equally familiar: the tension between speed and control, the proliferation of tools, and the absence of effective governance.

Addressing data dispersion requires deliberate action. Organizations must define governance frameworks with clear ownership, establish master data management practices to ensure a single source of truth, and connect silos through integration approaches such as data fabrics or meshes. Just as importantly, they should provide self-service capabilities with the right guardrails so that teams can move quickly without creating new fragmentation.

Ultimately, data dispersion is not only a technical issue but also a governance, cultural, and strategic challenge.

Leadership lessons from The Sun Also Rises

Hemingway's The Sun Also Rises is not a book on leadership. It is a story of disillusionment, of a generation wounded by war, drifting between Paris and Spain in search of meaning. But, when read through the lens of leadership, the novel offers profound lessons.

The protagonist, Jake Barnes, is not a leader in any formal sense. He does not command authority, nor does he impose his will. Yet his quiet integrity, his ability to remain composed amidst chaos, and his authenticity make him a point of reference for others. In contrast, the rest of the group often becomes lost in ego, conflict, and the endless pursuit of distraction.

Hemingway reminds us that leadership is not always about titles or formal structures. It is about presence, about giving direction when others are adrift, and about managing the complex human dynamics that arise in any team. The absence of purpose in the so-called "lost generation" underlines the necessity of vision: without meaning, people drift apart.

Perhaps most striking is the novel's use of bullfighting as metaphor. In the ring, courage, discipline, and dignity under pressure stand in stark contrast to the confusion of the expatriates. The message is clear: leadership is less about control and more about facing uncertainty with honesty, resilience, and grace.

Lean and AI

Implementing AI effectively goes beyond deploying sophisticated models, it's about learning swiftly and delivering measurable value. Lean experimentation provides a structured approach to achieve this.

A notable example is Johnson & Johnson, which recently recalibrated its generative AI strategy. Initially, the company supported nearly 900 AI use cases across various functions. However, upon evaluation, they discovered that just 10-15% of these initiatives accounted for 80% of the value. This insight led to a strategic shift towards high-impact areas such as drug discovery, supply chain optimization, and internal support tools like the "Rep Copilot", an AI-driven assistant aiding sales representatives in engaging healthcare professionals.

The company has shifted its generative AI strategy from broad experimentation to a focused approach, prioritizing only the highest-value use cases while cutting projects that are redundant, ineffective, or better served by other technologies.

This shift highlights a critical lesson: experimentation alone is not enough. Success comes from focusing on initiatives that generate real impact, learning rapidly from each iteration, and strategically applying AI where it can truly transform outcomes. It's a pragmatic approach that balances innovation with measurable value.

Source: Johnson & Johnson Pivots Its AI Strategy

AI Code Agents: Divide and Conquer

Code agents such as Cursor or Claude Code tend to produce more reliable results when working on small, well-defined tasks rather than large, open-ended projects. A request like “build a function that uploads an image to S3” is specific, measurable, and can be executed within the model's reasoning window. In contrast, a request such as “build an Instagram clone” is too broad, underspecified, and quickly exceeds the agent's ability to maintain coherence.

The difference lies in scope and ambiguity. Large requests bundle together dozens of design decisions, architectural choices, and interdependent features. They create opportunities for error to propagate and for the model to contradict earlier assumptions. Smaller tasks reduce complexity, minimize dependencies, and offer a clear standard for success.

This mirrors established software development practices. Complex systems are not built in a single step but decomposed into smaller units of work, each with its own acceptance criteria. Code agents follow the same principle: they thrive on precision and iteration.

The practical takeaway is to approach code agents the way you would structure a development backlog. Break down big ideas into discrete, testable tasks. The narrower the scope, the more effective and dependable the output.
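
To make the contrast concrete, here is roughly the shape of output a well-scoped request like the S3 upload produces: small enough to review and test in one sitting. A sketch using boto3, where the bucket, key prefix and content-type handling are assumptions rather than part of the original prompt:

    # Sketch of the "upload an image to S3" task: small, testable, single-purpose.
    # Bucket name, key prefix and content-type handling are illustrative choices.
    import mimetypes
    from pathlib import Path

    import boto3

    def upload_image(path, bucket, key_prefix="images/"):
        # Uploads a local image file to S3 and returns its object key.
        file_path = Path(path)
        key = f"{key_prefix}{file_path.name}"
        content_type = mimetypes.guess_type(file_path.name)[0] or "application/octet-stream"

        s3 = boto3.client("s3")
        s3.upload_file(str(file_path), bucket, key, ExtraArgs={"ContentType": content_type})
        return key

    # upload_image("photo.jpg", bucket="my-example-bucket")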

Data Gravity

Dave McCrory first introduced the concept of Data Gravity in 2010. He used an intuitive and convincing analogy: data is like a planet, gaining mass as it grows and drawing applications, services, and additional data into its orbit.

Over the years, this metaphor has become much more than a technical observation. It has turned into a strategic principle that shapes how organizations think about technology. As data accumulates, it begins to determine where applications are built, how systems interact, and which ecosystems companies inevitably become tied to.

The implications for IT strategy are significant. The physical and regulatory location of data influences decisions about cloud adoption, on-premise investments, and hybrid or multi-cloud approaches. The cost and complexity of moving large datasets often lead to platform and vendor lock-in. Latency requirements drive applications to reside closer to the data they consume. And increasingly, laws around privacy and data residency add another dimension to this gravitational pull.

In practice, data is no longer just a resource to be managed, it acts as the anchor point around which the rest of the technology landscape must orbit. Forward-looking organizations recognize this and design architectures that balance agility with the realities of immovable data, whether through distributed models, edge strategies, or careful governance frameworks.

Understanding and planning for data gravity not only helps avoid technical bottlenecks, but also positions companies to turn data into a competitive advantage.

Are you sure that "pattern" you see is real?

Human brains are superb pattern-recognition machines. That's why we invent, problem-solve, and recognize threats. Occasionally, however, this asset becomes a liability.

Apophenia is the tendency to perceive meaningful patterns in randomness. It underlies common biases, including:

- Gambler's fallacy: believing a roulette wheel is “due” to change.

- Clustering illusion: mistaking random clusters for true trends.

- Filtering and confirmation bias: perceiving only the data that supports our views.

In business and technology, this matters. Mistaking noise for signal leads to wasted resources, poor strategy, and misplaced confidence.

Poor decisions often arise when random data is mistaken for meaningful patterns, such as reading financial market noise as predictable cycles, treating spurious correlations as actionable insights, or assuming isolated events reveal systemic truths.
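
A quick simulation shows how easily pure chance produces convincing-looking patterns:

    # Pure chance routinely produces long "streaks" that feel meaningful.
    import random

    random.seed(42)  # fixed seed only so the example is reproducible
    flips = [random.choice("HT") for _ in range(200)]

    longest, current = 1, 1
    for prev, cur in zip(flips, flips[1:]):
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)

    print(f"Longest run in 200 fair coin flips: {longest}")
    # Runs of six or more identical outcomes are entirely expected,
    # yet they look like a "trend" if you go hunting for one.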

Pattern recognition is powerful, but only when we challenge it with discipline and skepticism.

From APIs to Agent-to-Agent (A2A)

APIs have long been the backbone of digital finance, enabling structured communication between systems. But as financial interactions grow more complex, static calls are no longer sufficient. This is where A2A (agent-to-agent) comes in, a concept that has gained traction recently in discussions at industry forums.

A2A describes autonomous communication between intelligent agents that represent institutions, companies or even individual contracts. These agents can negotiate conditions, validate compliance and execute decisions in real time, while maintaining a fully auditable record.

Take a cross-border payment as an example. A company in Portugal needs to transfer €50,000 to a supplier in Brazil. Its agent queries multiple providers: the bank, a fintech payment service, and an alternative network. Each responds with exchange rates, fees and settlement times. The company's agent evaluates the options and executes through the most efficient route, all without manual intervention.
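
A deliberately simplified sketch of that selection step is shown below; the providers, rates, fees and scoring rule are all invented, and a real agent would also validate compliance and keep an audit trail:

    # Toy sketch of an agent choosing a payment route. Providers, rates and fees
    # are invented; a real agent would also check compliance and log every step.
    from dataclasses import dataclass

    @dataclass
    class Quote:
        provider: str
        rate_eur_brl: float    # exchange rate offered
        fee_eur: float         # flat fee in EUR
        settlement_hours: int

    def effective_brl(quote, amount_eur):
        # BRL received after the flat fee is deducted.
        return (amount_eur - quote.fee_eur) * quote.rate_eur_brl

    quotes = [
        Quote("bank",        rate_eur_brl=5.95, fee_eur=120, settlement_hours=48),
        Quote("fintech",     rate_eur_brl=6.02, fee_eur=60,  settlement_hours=24),
        Quote("alt-network", rate_eur_brl=6.05, fee_eur=90,  settlement_hours=6),
    ]

    amount = 50_000
    best = max(quotes, key=lambda q: effective_brl(q, amount))
    print(f"Route via {best.provider}: {effective_brl(best, amount):,.0f} BRL, "
          f"settles in {best.settlement_hours}h")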

Rather than replacing APIs, A2A builds on them, pointing toward a future where financial systems interact dynamically, adapting continuously to context, cost and regulation.

As the concept evolves, its practical applications will likely define the next stage of automation in financial services.

AI Vanity Metrics

AI adoption in software development is accelerating, but so is skepticism among developers. Management often celebrates impressive numbers, yet many engineers see through these vanity metrics: figures that look good in presentations but don't reflect the reality of building software.

Counting lines of code generated by AI is a common example. Management may report a large percentage of code written by AI, but more code is not necessarily better. AI can produce verbose, redundant or buggy solutions, increasing maintenance costs.

Ticket closure rates and sprint velocity are often cited. Closing more tickets does not guarantee that the right features are delivered or customer problems are solved.

Claims about time saved per developer can also be misleading. The time saved is often spent debugging or rewriting AI-generated code, reducing actual benefits. Similarly, adoption rates and flashy demos can look impressive without proving real value or scalability.

Better indicators of AI's value include bug rates, avoided defects, code review burden, and overall cycle time from idea to production. Security issues, maintainability, scalability and technical debt are also critical, as is developer satisfaction. If engineers are productive, creative and supported, AI adoption is genuinely adding value; if not, the tools are failing.

Developers resist when AI adoption is justified by meaningless numbers and when hidden costs like review time, debugging and technical debt are ignored. As one developer wrote on Reddit: "AI sort of becomes a management tool, not a developer tool". Vanity metrics create friction by making AI a selling proposition rather than a productivity boost.

AI can significantly augment development, but only if companies measure meaningful outcomes. Leaders should ask whether AI improves reliability, reduces time-to-market, addresses customer needs and expectations and genuinely boosts developer productivity.

Shifting from vanity metrics to actionable metrics is essential. Only then can AI move from hype to genuine impact.

Small Batches

In the book "The Lean Startup", Eric Ries presents a compelling argument for working in small batches, an idea that seems simple on the surface but has far-reaching implications for how we approach technology and operations.

The principle is straightforward: instead of building large, complex systems or features in one go, break the work into small, testable, and releasable chunks. Ship early. Learn fast. Repeat.

To illustrate this, Ries uses an example from a traditional office setting. Imagine you need to send out 100 newsletters. One approach is to fold all 100, then stuff all 100 into envelopes, then add stamps to all 100. The other approach is to fold, stuff, and stamp one newsletter at a time. The second method, despite seeming slower, is almost always faster overall. Why? Because problems are identified sooner (a misfit envelope, a missing component), and the process becomes more efficient with real-time learning and iteration.
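
The intuition is easy to check with a toy model: in one-piece flow, a misfit envelope is discovered after the first newsletter, while the big-batch approach only reveals it after all 100 have been folded. A sketch, with step durations invented for illustration:

    # Toy model of the newsletter example: when is a problem with the envelopes
    # discovered? The step durations are invented for illustration.
    FOLD, STUFF = 2, 3   # seconds per newsletter, per step
    N = 100

    # Big batches: fold all 100 first, then hit the problem on the first stuff.
    big_batch_discovery = N * FOLD + STUFF

    # Small batches (one-piece flow): the first newsletter goes through
    # fold + stuff immediately, so the misfit envelope is found right away.
    small_batch_discovery = FOLD + STUFF

    print(f"Defect found after {big_batch_discovery}s with big batches, "
          f"after {small_batch_discovery}s with one-piece flow")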

This logic applies directly to IT. Working in small batches allows you to:

- Deliver software incrementally through Agile methods

- Test and deploy frequently via CI/CD

- Make controlled, reversible infrastructure changes

- Detect and resolve issues quickly due to a smaller change surface

Small batches create faster feedback loops, reduce risk, and encourage continuous improvement. They help teams stay aligned, deliver value sooner, and adapt to uncertainty with more confidence.

By contrast, big batch approaches delay learning, compound complexity, and increase the likelihood of failure.

Whether you're writing code, managing infrastructure, or launching new products, adopting a small batch mindset can lead to better outcomes across the board.

Why a Great Tech Lead Doesn't Have All the Answers

There's a widespread belief that a tech lead needs to have all the answers, but in reality, the best tech leads know that their strength lies in asking the right questions, not in being the sole problem-solver.

A tech lead's role goes beyond technical decisions, it's about guiding the team, aligning with the business, and fostering collaboration across departments. Trying to provide all the answers often stifles team creativity and slows growth. When the lead always jumps in with a solution, it creates dependency and discourages others from thinking critically. But when a tech lead says, "I don't know - what do you think?", they empower the team to take ownership and grow their problem-solving skills.

It also prevents burnout. Carrying the weight of every decision is unsustainable and pulls a lead away from strategic thinking. Sharing responsibility and trusting the team builds a more resilient, innovative culture.

Ultimately, a great tech lead isn't the one who knows everything. It's the one who listens, guides, and creates space for others to thrive. Leadership isn't about having all the answers, it's about helping the team find them together.

The Backlog Is Not a Dumping Ground

In Agile teams, the backlog is supposed to be a clear, focused list of work that drives real value. But more often than not, it ends up as something else entirely: a dumping ground for every idea, request or feature anyone has ever mentioned.

It usually starts like this: someone says "let's just throw it in the backlog", and over time this becomes the norm. Every suggestion, every edge case, every feature that might be useful someday gets logged and forgotten. No one knows what's in there anymore, no one is sure what matters.

This isn't just a mess, it's toxic to productivity. When the backlog becomes unmanageable, the team loses focus, decision-making slows down, planning gets harder, developers pick up stories that lack clarity or purpose and stakeholders feel ignored because their input disappears into a black hole. And the worst part: it becomes impossible to tell what is truly important.

Agile is built on the ability to respond to change - but ironically, a cluttered backlog makes that harder. Teams become less confident in adjusting direction because the backlog offers no guidance. It's just noise.

Backlog grooming should be a regular and collaborative process, not a solo admin task. It's where clarity is created and bad ideas die, so that the good ones can move forward.

If your team starts treating the backlog as a tool for focus, not storage, everything changes. Planning gets easier, prioritization becomes clearer and Agile starts to feel like it should.

Agile And Mini Waterfalls

We say we're doing Agile. We run sprints, estimate story points, hold stand-ups… but are we actually being Agile?

Too often, Agile is reduced to a checklist of rituals. What we end up with is a bunch of mini waterfalls: design > dev > test, all squeezed into a sprint. Locked scope. Delayed feedback. Progress measured by ticket completion, not value delivered.

Agile isn't about speed. It's about learning fast, adapting often, and delivering what matters.

If your team can't clearly explain why they're building something, or if change feels like a disruption instead of an opportunity, you're not being Agile, that's just process without purpose.

Five Whys

The Five Whys method is a simple but powerful tool used to identify the root cause of a problem by asking "why?" five times in succession. It was originally developed by Sakichi Toyoda and became a foundational part of the Toyota Production System and lean manufacturing practices.

The process starts with a clear statement of the problem. From there, you ask "why did this happen?" and then continue asking "why?" for each answer you get. The idea is that each answer brings you closer to the underlying cause, not just the surface symptom. You usually reach a root cause by the fifth "why" but it could take more or fewer steps depending on the situation.

Example: The car won't start.

1. Why? - The battery is dead.

2. Why? - The alternator isn't working.

3. Why? - The alternator belt is broken.

4. Why? - The belt was worn out and not replaced.

5. Why? - The car wasn't maintained on schedule.

Root cause: Poor maintenance practices.

The Five Whys is best for straightforward problems. For more complex ones, it's often combined with other tools, but on its own, it's a great way to get past quick fixes and understand what really needs to change.

Backup

"A backup isn't a backup unless you've tested it."

Startup Productivity

"Startup productivity is not about cranking out more widgets or features. It is about aligning our efforts with a business and product that are working to create value and drive growth.” Eric Ries - The Lean Startup

In traditional companies, productivity is often measured by output, how many features are built, lines of code written, or products shipped. But Ries is arguing that for startups, this kind of raw output doesn't necessarily mean progress.

Startups operate under extreme uncertainty, so building a feature that nobody uses, or improving a product that doesn't solve a real problem, is a waste of time and money, even if the team was "productive" in a traditional sense.

Before building something, ask:

- Does this help us validate a hypothesis?

- Will it improve the customer experience in a meaningful way?

- Is it aligned with what's actually driving traction?

If the answer is no, it might be productive activity, but not productive progress.

Delegation and the Eisenhower Matrix

The desire to stay on top of everything often leads to a common trap: doing too much, too personally. True leadership isn't about doing it all, it's about doing what matters most and maximizing impact. This is where delegation becomes not just a skill, but a strategic necessity.

The Eisenhower Matrix offers a simple framework to help leaders decide what to act on, what to plan, what to delegate, and what to eliminate. It categorizes tasks along two axes: urgency and importance. Tasks that are both urgent and important should be addressed directly. Important but not urgent tasks like strategic thinking and team development should be scheduled and protected. But the most overlooked quadrant, especially by leaders, is the one filled with tasks that are urgent but not important. These are the perfect candidates for delegation.
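
The mechanics are simple enough to sketch in a few lines of code; the example tasks below are invented for illustration:

    # Eisenhower Matrix sketch; the example tasks are invented for illustration.
    def quadrant(urgent, important):
        if urgent and important:
            return "Do now"
        if important:
            return "Schedule"
        if urgent:
            return "Delegate"
        return "Eliminate"

    tasks = [
        ("Production incident",                  True,  True),
        ("Quarterly team development plan",      False, True),
        ("Status report someone else can own",   True,  False),
        ("Recurring meeting with no clear goal", False, False),
    ]

    for name, urgent, important in tasks:
        print(f"{quadrant(urgent, important):<9} <- {name}")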

Effective delegation isn't about offloading work, it's about multiplying impact. When leaders hold onto everything, they become bottlenecks. Delegating frees up time for higher-level thinking while creating growth opportunities for team members. It builds trust, accountability, and resilience within a team.

Of course, delegation requires more than simply handing something off. It means providing clarity on expectations, outcomes, and authority. It also requires letting go of perfectionism and trusting others to deliver, sometimes differently, but often just as well, or even better.

The 4 Types of Organizational Culture

Walk into any organization, and you'll start to feel it immediately: the way people talk to each other, how decisions are made, what gets celebrated, what's considered normal. That's culture. It's not written on walls or in policy documents, it's in the rhythm of how things actually get done.

While every workplace is unique, decades of research suggest that most cultures fall into one of four generic types. Each type has its strengths, its tensions, and a distinct way of shaping how people work.

Clan Culture

In a clan culture, the workplace feels more like a community - or even a family. Relationships matter. Teamwork and loyalty are at the core. Leaders act as mentors or coaches rather than commanders. Feedback tends to be open and informal, and decisions are often made through collaboration rather than hierarchy.

These organizations prioritize internal cohesion and long-term development. You're likely to hear conversations about trust, shared values, and growing people from within. This type of culture thrives in settings where stability comes from connection rather than control - like early-stage startups, nonprofit organizations, or companies with a strong people-first ethos.

Clan culture isn't just about being "nice". It's a deliberate strategy: when people feel seen and supported, they often perform at their best. The challenge comes when a business starts to scale or compete aggressively - without structure or sharp decision-making, things can stall.

Adhocracy Culture

If you're working in a place where every week brings a new experiment or bold idea, you're likely in an adhocracy culture. These environments thrive on innovation, agility, and a healthy dose of risk-taking. Speed matters. So does originality.

You won't find much bureaucracy here - processes are lightweight, and roles may be fluid. The emphasis is on creating new things, whether that means breakthrough products, disruptive business models, or experimental ways of working. Leadership is visionary and entrepreneurial, and failure is seen not as a threat, but as part of the creative process.

Adhocracy cultures are often found in fast-moving tech firms, design agencies, or forward-thinking R&D teams. They're magnetic for creatives and problem-solvers, but they can burn people out or collapse under their own chaos if not balanced with some structure and clarity.

Market Culture

In a market culture, success is defined by performance. This is the world of targets, KPIs, competition, and relentless execution. It's outward-facing - customers, competitors, and market share are top of mind. The language is about outcomes, goals, and winning.

Leadership here is strong and decisive. People are rewarded for what they achieve, not just how they get there. There's often a sense of urgency and accountability. These organizations tend to scale well and dominate markets - but the tradeoff can be high pressure and less focus on internal cohesion.

Think of sales organizations, consulting firms, and large enterprises competing at a global level. Market cultures can drive incredible results - but they need to ensure that people don't feel like cogs in a machine.

Hierarchy Culture

Then there's hierarchy culture: structured, formal, and process-driven. Here, the organization is built for consistency, efficiency, and risk management. Rules, roles, and procedures are clearly defined. Success is about doing things right - on time, within scope, and according to plan.

Hierarchy cultures are often found in government institutions, hospitals, large manufacturers, or any organization where predictability and control are crucial. Leadership is managerial - focused on coordination, performance monitoring, and smooth operations.

The advantage of this culture is reliability: things work, and risks are minimized. The downside is that it can become resistant to change or innovation unless there's conscious effort to create space for new thinking.

These four culture types don't just describe companies in theory. They explain real tensions we see every day: people vs. profit, innovation vs. stability, collaboration vs. competition. No culture type is "best". Each one can thrive or fail depending on the context and leadership.

These types stem from a model called the Competing Values Framework (CVF) by Robert Quinn and John Rohrbaugh. It maps organizations across two dimensions: one that contrasts flexibility vs. stability, and another that contrasts internal focus vs. external focus. The intersection of those tensions gives rise to the four culture types.

But culture is rarely static. Many organizations shift over time - or blend elements from more than one quadrant. A fast-growing startup may begin as a clan but gradually adopt market-driven traits. A hospital may primarily function as a hierarchy but foster a clan dynamic within care teams.

Communication breakdown

When major systems go down, we often point to a misconfigured script, a faulty deployment, or a routing error. But if we look more closely, the real culprit is often much more fundamental: a breakdown in communication.

In the past few years, we've seen this pattern play out in high-profile outages at Atlassian, Facebook, and Slack - companies known for engineering excellence. Despite having top-tier infrastructure, each of these organizations faced cascading failures made worse by internal misunderstandings, unclear responsibilities, or missing escalation paths.

In April 2022, Atlassian experienced a severe outage that affected over 400 customer sites - some for nearly two weeks. The root cause wasn't a novel technical failure, but a routine deletion script that was misunderstood. The parameters were wrongly set, and different teams had differing assumptions about the scope and safety of the operation. Without a clear handover or validation process, live customer environments were accidentally taken down. The recovery effort was slow, not due to lack of expertise, but because the communication channels and documentation weren't aligned to handle such a situation swiftly.

In October 2021, Facebook went dark globally for six hours. The trigger was a change to network routing configurations, but the real issue was the lack of internal alignment on failure scenarios. The update removed Facebook's services - including internal tools - from the internet. With critical systems offline, teams couldn't communicate, access internal dashboards, or even enter the data centers. A single misjudged assumption about rollback procedures and internal tool independence turned a manageable change into a complete operational paralysis.

Slack's January 2021 outage followed a similar theme. A misconfiguration in internal traffic routing triggered widespread degradation just as the world was returning from the holidays. During the incident, different engineering teams held conflicting mental models of what was failing. This misalignment led to duplicated effort, delayed diagnosis, and inconsistent messaging to customers. The systems were complex - but the real challenge was creating a shared understanding fast enough to respond effectively.

These incidents show us that technical excellence alone isn't enough. In complex, fast-moving environments, the quality of internal communication - before, during, and after an incident - is what determines resilience. Systems fail. What matters is how we talk to each other when they do.

Communication is infrastructure.

Observability vs Monitoring

There's a lot of talk about observability these days, and it's easy to confuse it with monitoring. But the difference really matters - especially as systems get more complex.

Monitoring is about tracking known things. You define metrics and thresholds, set up alerts, and wait to be told when something breaks. It's reactive. You already know the kinds of problems you're looking for, and you build tools to catch them.

Observability, on the other hand, is about answering questions you didn't know you'd need to ask. It's about understanding how your systems behave, diagnosing the unexpected, and making smarter decisions based on real data - not assumptions. It's a proactive and exploratory approach to understand the internal state of a system by examining its external outputs.

In practice, observability is about putting the right structures in place to see and understand what your systems are doing - all the time, not just when things go wrong. You start by identifying which metrics actually matter for your business and your users - like response times, error rates, or system throughput. Then, you instrument your code to capture useful logs, traces, and metrics. Tools like OpenTelemetry can help make that process more consistent.
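
As a minimal sketch of what that instrumentation can look like with the OpenTelemetry Python SDK (the span and attribute names are illustrative, and the console exporter stands in for whatever backend you actually export to):

    # Minimal tracing sketch with the OpenTelemetry Python SDK (opentelemetry-sdk).
    # Span and attribute names are illustrative; the console exporter stands in
    # for whatever observability backend you actually use.
    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

    provider = TracerProvider()
    provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)

    tracer = trace.get_tracer(__name__)

    def handle_checkout(order_id, items):
        # Each request becomes a span carrying the context we care about.
        with tracer.start_as_current_span("checkout") as span:
            span.set_attribute("order.id", order_id)
            span.set_attribute("order.items", items)
            # ... business logic goes here ...

    handle_checkout("A-1001", 3)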

From there, you build dashboards that highlight what's important and set alerts that trigger on real issues, not noise. Popular tools like Prometheus, Grafana, and the ELK stack make this possible, and platforms like Datadog or New Relic can bring everything into one place.

But tools alone aren't enough. Observability has to be part of how the team works. That means using data in retros, reviewing patterns after incidents, and making decisions based on what's actually happening in your systems - not just what you hope is happening.

When observability is done right, your team detects and solves problems faster, your systems run more reliably, and decisions get made with more confidence. You spend less time guessing and more time improving. And instead of reacting to issues, you start anticipating them - and building better systems because of it.

Observability isn't about collecting more data or spinning up endless dashboards. It's about clarity. It's about helping your team ask better questions, spot issues early, and stay aligned with what really matters - both technically and to the business.

Technical Debt

Technical debt is inevitable, but manageable. When left unchecked, it doesn't just affect your codebase, it affects your people, delivery, and business.

Technical debt gradually erodes team productivity and slows down development cycles, making it harder to ship features or iterate quickly. As the codebase grows, the likelihood of bugs and defects increases - undermining product quality and user trust. Delivery timelines stretch, and the time-to-market for new features suffers.

Beyond the technical realm, debt can lead to stakeholder frustration, especially when delays or instability affect customer experience. It raises maintenance costs and diverts resources from innovation, reducing your ability to adopt new technologies or scale effectively. Over time, this misalignment between business goals and technical reality introduces risk - whether through security vulnerabilities, platform limitations, or strategic inflexibility.

Addressing technical debt proactively is essential to maintain agility, reduce operational drag, and keep the focus on building value.

There are different types of Technical Debt:

- Dependencies: Outdated or hard-to-maintain tools/libraries.

- Patterns: Poor design choices that cause recurring issues.

- Redundancies: Duplicated logic or fragmented systems.

- Abstract Elements: Unclear goals or shifting requirements.

- Legacy Templates: Inefficient scaffolding holding teams back.

- Concept Debt: Building unused or unnecessary features.

And different strategies to tackle and prevent it:

- Automate and update dependencies.

- Prioritize refactoring and enforce design reviews.

- Audit and consolidate duplicated components.

- Align abstract ideas with concrete business value.

- Modernize outdated templates and practices.

- Validate feature ideas early - only build what matters.

- Use agile workflows to identify issues early.

- Invest in code quality (reviews, pair programming, static analysis).

- Keep teams trained and current.

- Foster cross-team alignment.

- Design for change - anticipate growth.

- Schedule time to refactor and clean up.

Technical debt isn't just a tech issue. It's a business issue. Managing it well is a competitive advantage.

Lessons from Working with LLMs

I've been actively exploring large language models (LLMs) and chatbots, particularly since the release of DeepSeek. Working with them both in the cloud and locally, I've applied them to various scenarios - from software development and finance to mentoring, data analysis, and even travel planning.

Recently, I was analyzing a very large dataset and ran into a roadblock: the file size was simply too large for the LLM to process effectively. I tried several approaches - cleaning, transforming, compressing, and even splitting the data into smaller chunks. In essence, I was adapting my problem to fit the tool.

Then, in one of my iterations, something unexpected happened. The LLM itself suggested that perhaps I was using the wrong tool for the job. And it was right. I was so focused on making the problem work within the constraints of an LLM that I overlooked more suitable solutions.

It was a classic case of "when all you have is a hammer, everything looks like a nail." This experience was a great reminder that while LLMs are incredibly powerful, they are not a one-size-fits-all solution. Choosing the right tool for the task is just as important as understanding the problem itself.

Conway's Law

"Any organization that designs a system will produce a design whose structure is a copy of the organization's communication structure."

This principle is more than just an observation; it's a strategic insight. The way teams are structured and communicate within an organization profoundly impacts the systems, products, and services they deliver.

Team organization is an ongoing process that must evolve with the business. Revisiting and refining team topologies is essential to maintaining alignment and achieving success.