When Assumptions Fail: Architecture and the Reality of Resilience
Executive Summary:
Most resilience failures are not caused by poor design, but by assumptions that persist long after the conditions that made them reasonable have changed. When disruption occurs, systems behave predictably according to their architecture, dependencies, and constraints, often exposing gaps between documented intent and operational reality. This essay examines why resilience must be demonstrated through observable behavior under stress, not asserted through maturity or compliance.
Most failures that appear “unexpected” in the boardroom are not surprises in the system itself. They are surprises in the assumptions embedded within it. Dependencies assumed to be isolated prove tightly coupled. Manual steps assumed to function under pressure fail to scale. Recovery paths presumed to exist reveal themselves to be theoretical rather than executable.
In complex environments, assumptions do not erode gradually. They fail abruptly, and often simultaneously. What matters, then, is not whether a program appears mature, but whether the architecture of the environment, and the operating model governing it, can absorb stress without cascading into avoidable harm.
I. Assumptions Are Invisible Until They Break
Organizations do not operate on policies alone. They operate on patterns. Under normal conditions, those patterns often appear adequate because the environment is forgiving. Exceptions are granted, tickets are worked, and individuals compensate for friction in informal ways. A control that is merely sufficient during periods of calm can appear indistinguishable from a well-designed one.
Stress removes that forgiveness. Time compresses, priorities collide, and tradeoffs become unavoidable. In high-hazard domains, this dynamic is well understood. High-reliability organizations are not defined by confidence or documentation, but by their ability to detect drift, respond to weak signals, and learn before disruption forces the lesson again. Research from aviation and healthcare has long emphasized that reliability emerges from behavior under pressure, not from stated intent alone.
The implication for financial services, and for any large, digitally mediated enterprise, is that resilience cannot be asserted. It must be demonstrated in the places where assumptions tend to hide: in handoffs, dependencies, privilege boundaries, runbooks, and the operational reality of how work is actually performed.
This distinction clarifies the limits of maturity assessments. Such assessments have value, particularly when programs are immature. They provide shared language and help establish baseline capability. Yet they can also become a proxy for comfort. In sufficiently complex environments, it is possible to score well while remaining fragile in precisely the conditions that matter most.
The difference becomes clear when comparing the existence of a recovery plan with the ability to recover under real constraints. That difference is not documentation. It is architecture, and the incentives and governance mechanisms that sustain it over time.
II. Systems Behave Predictably Under Stress
Resilience discussions often gravitate toward certainty. Leaders seek binary answers: whether an organization is safe or exposed, resilient or fragile. Complex systems resist such framing. They behave in gradients and patterns, responding to stress according to their structure, coupling, sequencing, and constraints.
Under pressure, systems do not adapt to intent or effort. They behave according to how they were built. This is not pessimism. It is structure. It also explains why incidents that appear novel so often resemble one another. Cascading failures, tight coupling, and hidden interactions are not anomalies. They are properties of complex systems operating under stress.
Decades of scholarship reinforce this view. Charles Perrow’s Normal Accidents demonstrated that in tightly coupled, complex systems, certain failures are not aberrations but expected outcomes. While the terminology predates modern digital infrastructure, the underlying dynamics remain relevant across technology-enabled enterprises.
This reality shifts the resilience conversation beyond prevention alone. Prevention remains necessary, and mature organizations invest heavily in it. Yet as environments become more digitized, integrated, and interdependent, risk posture increasingly depends on behavior when preventive controls do not hold. That orientation reflects scale, not resignation.
Regulatory thinking in financial services has moved steadily in this direction. Operational resilience regimes emphasize outcomes under stress rather than control completeness alone. The focus on important business services, impact tolerances, dependency mapping, and scenario testing — requirements codified in the UK FCA’s PS21/3 policy statement — reflects a supervisory recognition that resilience is ultimately revealed through performance, not posture.
III. Dependency Blindness Is the Root Cause
Dependency blindness does not stem from a lack of inventories. It stems from the assumption that inventories equal understanding. Many organizations can list major vendors, critical applications, and primary cloud providers. Far fewer can articulate second-order dependencies, shared subcomponents, and the nested service chains that bind seemingly independent services together.
This dynamic explains why large-scale outages often appear to take disproportionate portions of the digital economy offline. The cause is rarely a uniquely fragile provider. More often, it is concentration and common-mode dependency quietly embedded into modern enterprise architecture.
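The concentration described above can be made concrete with a small graph exercise. The sketch below, using entirely hypothetical service names rather than any real inventory, walks each business service's full transitive dependency chain and flags components that more than one service ultimately relies on, the common-mode dependencies that inventories of direct vendors tend to miss.

```python
from collections import defaultdict

# Hypothetical dependency map: service -> direct dependencies.
# All names are illustrative; a real exercise would draw from CMDB,
# contract, and architecture data.
DEPENDS_ON = {
    "payments": ["auth", "ledger"],
    "trading": ["auth", "market-data"],
    "ledger": ["primary-db"],
    "market-data": ["vendor-feed"],
    "auth": ["primary-db"],
    "primary-db": [],
    "vendor-feed": [],
}

def transitive_deps(service, graph):
    """Return every dependency reachable from `service`, not just direct ones."""
    seen, stack = set(), list(graph.get(service, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, []))
    return seen

def common_mode(services, graph):
    """Map each dependency to the business services that ultimately rely on it."""
    reliant = defaultdict(set)
    for svc in services:
        for dep in transitive_deps(svc, graph):
            reliant[dep].add(svc)
    return reliant

if __name__ == "__main__":
    shared = common_mode(["payments", "trading"], DEPENDS_ON)
    for dep, svcs in sorted(shared.items()):
        if len(svcs) > 1:
            print(f"{dep} is a common-mode dependency of {sorted(svcs)}")
```

In this toy topology, `payments` and `trading` appear independent at the first hop, yet both resolve to `primary-db` two levels down. That second-order convergence is precisely the nested service chain the essay describes.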
From a financial stability perspective, this is not a narrow technology concern. It reflects a broader pattern of systemic risk arising from interconnectedness. International bodies have repeatedly warned that operational disruptions can propagate across firms and markets through shared dependencies, even when individual institutions appear well managed in isolation — a pattern the FSB has documented directly.
The appropriate response is not withdrawal from third-party ecosystems or cloud services. It is disciplined dependency mapping treated as a living practice rather than a static artifact. When disruption occurs, the relevant question is not whether monitoring exists, but whether the organization understands what fails first, what fails next, and what remains operable when assumptions about availability prove incorrect.
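Answering "what fails first, what fails next" can be rehearsed cheaply before an incident forces the question. The sketch below, again with invented service names, reverses a dependency graph and walks outward from a failed component, returning each affected service with the hop at which the cascade reaches it, a minimal model assuming hard dependencies with no fallbacks.

```python
from collections import defaultdict, deque

# Hypothetical map: service -> dependencies it cannot run without.
DEPENDS_ON = {
    "payments": ["auth", "ledger"],
    "reporting": ["ledger"],
    "ledger": ["primary-db"],
    "auth": ["primary-db"],
}

def cascade(failed_component, graph):
    """Breadth-first walk of the reverse graph: which services fail, and at
    which hop, once `failed_component` goes down."""
    dependents = defaultdict(list)  # reverse edges: dependency -> its consumers
    for svc, deps in graph.items():
        for dep in deps:
            dependents[dep].append(svc)
    order, seen = [], {failed_component}
    queue = deque([(failed_component, 0)])
    while queue:
        node, hop = queue.popleft()
        for svc in dependents[node]:
            if svc not in seen:
                seen.add(svc)
                order.append((svc, hop + 1))
                queue.append((svc, hop + 1))
    return order
```

Here, losing `primary-db` takes out `ledger` and `auth` at hop one and `payments` and `reporting` at hop two. Even this crude model makes the sequencing of a disruption discussable, which is the point of treating dependency mapping as a living practice.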
Governance failures often compound the problem. When no single function owns dependency chains end-to-end, incentives favor local optimization. Product teams ship features, procurement negotiates cost, security defines requirements, risk authors policy, and operations absorbs the consequences. In steady state, this fragmentation can persist. Under stress, it becomes visible and costly.
IV. Architecture Determines What Organizations Can Rely On
Resilience analysis must ultimately engage with architectural reality. What an organization can rely on during disruption is determined less by stated intent than by structural design. Recovery paths, authority, and containment mechanisms reflect how systems are coupled, where dependencies converge, and which controls remain functional under constraint.
Architecture is the point at which aspiration becomes outcome. Structural decisions, often made incrementally and locally, shape whether adaptation is possible or brittle when conditions deteriorate. This is why resilience cannot be treated as a finite project. It must be managed as a discipline that evolves alongside the environment it governs.
Insights from resilience engineering reinforce this view. The same adaptive behaviors that produce success in normal operations can also generate failure under stress. The objective is not to eliminate adaptation, but to ensure it is bounded, observable, and improvable — a discipline resilience engineering addresses directly.
This framing allows resilience discussions to become concrete. Questions shift from abstract assurance to architectural capability. Where do single points of failure persist? Which recovery paths depend on privileged access available only in the primary environment? Which manual controls implicitly require human performance that degrades under pressure?
Frameworks can support this work when used judiciously. Shared taxonomies such as the NIST Cybersecurity Framework facilitate cross-functional dialogue and capability assessment. Yet the framework itself is not the destination. The destination is observable performance under constraint.
The Resilience Operating Model reflects this orientation. Rather than adding additional layers of compliance language, it emphasizes alignment among governance structures, operating cadence, and technical realities. The aim is to make system behavior visible, testable, and improvable under stress.
V. What Leaders Should Expect Instead
Once it is accepted that assumptions fail and architecture constrains behavior, leadership expectations necessarily change. The relevant question is no longer how mature a program appears, but what can be demonstrated under stress, and what evidence would warrant confidence or concern.
For boards and executive committees, this shift sharpens oversight. Effective governance does not demand exhaustive dashboards. It demands clarity around a small set of questions. Which services must remain within tolerance? Which dependencies could push them outside tolerance? What has been tested under realistic constraints? What has been learned recently that altered architecture, process, or operating cadence?
This posture also benefits technology and security leaders. It reframes conversations away from defending risk registers toward demonstrating capability. It creates productive tension between aspiration and execution, and it prevents resilience from collapsing into reassuring language divorced from operational reality.
The role of cyber expertise at the board level follows the same logic. The objective is not to transform directors into operators. It is to equip them to ask for evidence, to understand tradeoffs, and to recognize when confidence is being substituted for proof.
Reliability, not maturity, is the ultimate measure of trustworthiness. It is a property of how enterprises are built, governed, and adapted when conditions deteriorate. Frameworks and assessments remain useful tools. But resilience emerges only when architectural reality aligns with the behaviors leaders expect to see under stress.
Oritse J. Uku is a CISO and resilience architect with two decades across military intelligence, global financial services, and enterprise cybersecurity. He authored the Resilience Operating Model (ROM), a framework for building institutions that perform under stress, govern risk at scale, and demonstrate regulatory readiness.
For interviews, briefings, or commentary on cybersecurity, architecture, or operational resilience, please visit the Media & Speaking page.