In the early years of nuclear power, the danger was immediate and unmistakable: reactivity.

We were still learning how neutrons behave in real machines, not just in equations. Control was coarse, feedbacks were not always trusted, and margins were sometimes thinner than we understood. Accidents followed that path. Sudden insertions, unexpected couplings, systems that moved faster than the operators—or the physics models—could follow.

The same pattern repeated with fast reactors. They offered elegance and efficiency, but also sharper kinetics and weaker inherent damping. The physics gave less warning. Small misjudgments did not stay small for long.

Then came the Three Mile Island accident. Not a reactivity accident, but something more insidious: loss of cooling, misinterpreted signals, systems working—but not as expected. It did not explode into violence. It unraveled. And that changed us.

For decades after, the industry’s center of gravity shifted.

Thermal-hydraulics, decay heat removal, operator interfaces, human factors, severe accident management—these became the dominant concerns. Reactivity did not disappear, but it became something we believed we understood, something bounded by design and procedure.

We trained ourselves to think in that direction. We built tools, organizations, and instincts around it.

Now, we are entering another period of invention.

New reactor types. New materials. New operating modes. Old ideas revisited with modern tools. The landscape is opening again, and with it comes a familiar risk: not that we will repeat the past exactly, but that we will forget why it unfolded the way it did.

Reactivity accidents did not happen because people were careless. They happened because the systems were new, the feedbacks subtle, and the confidence just slightly ahead of understanding.

That condition is not unique to the 1950s or 1960s. It is the natural state of any new field at the edge of its knowledge.

So the question is not whether we are smarter now. In many ways, we are.

The question is whether we are humble enough.

There are still people who remember what it felt like when the plant did something no one expected. When the instruments told a story that did not quite make sense. When the margin you thought you had turned out to belong to a different assumption.

They are not always the loudest voices. Their lessons are not always written in current standards or models. But they carry something we cannot easily recreate:
an intuition for how systems fail before we have named the failure.

As we build again, that may be the most valuable input we have.

***

A nuclear plant should be simple enough for a single person to understand.

Not because one person will ever run it alone, but because understanding does not scale well beyond that.

Once a system grows past what a single mind can hold, it begins to rely on fragments. One team understands the reactor physics. Another the turbine. Another the electrical systems. Each part is sound in isolation. The failure comes in the spaces between them—where assumptions meet, but do not quite align.

Complexity does not just add difficulty. It removes ownership.

If no one can hold the full picture, no one can truly see how a disturbance travels. A valve position here shifts a flow there, changes a temperature somewhere else, nudges reactivity, alters power, feeds back into the grid interface, and returns again through control logic. Each step is small. The chain is not.

On paper, this is managed with procedures, interfaces, and layers of verification. In reality, it becomes a system where everyone is right locally, and the plant is wrong globally.

A design that one person can understand has a different character.

The connections are visible. The feedback paths are short. Cause and effect remain close enough to be recognized without translation. When something moves, you know where to look next—not because a document tells you, but because the system itself is legible.

This does not mean small, or simplistic in function. It means disciplined.
Few principles. Clear flows. Minimal hidden couplings.

You still have redundancy. You still have defense in depth. But they are arranged so that their interactions can be followed without abstraction.

Because in the end, safety does not live in the number of systems.

It lives in whether someone can look at the plant—really look at it—and understand what it will do next.  

***

There are over 70 years of experience in nuclear operations, and very little in this field is genuinely new. 

Reactor physics has remained unchanged. Thermal hydraulics operates under the same principles as before. Materials behave as they always have. Engineers haven’t necessarily become smarter than in the past. The industry has explored most of the design possibilities—sometimes successfully, and sometimes through hard lessons.

The only significant advancement is in analysis software. 

Startups often talk about disruption, but nuclear technology is not software.

You are not the first to consider removing boron.
You are not the first to simplify systems.
You are not the first to rely on passive safety.
Fast reactors were in operation 50 years ago.
Pebble bed reactors have been reimagined every 20 years or so, from the AVR to the PBMR to the HTR-PM.

All of these concepts have been studied, built, and operated, and in some cases, they have failed in ways that only become apparent after many years.

This experience exists and is documented in reports, event analyses, and operating histories, as well as in the minds of those who have been involved. 

Ignoring this experience does not render it irrelevant; it simply means you may repeat the same mistakes.

Startups should approach the nuclear field with humility. This isn’t to suggest that new ideas are unwelcome, but rather that they are often not as original as they appear. If experience is not considered, making meaningful progress can be difficult.

It’s essential to study what has already been attempted: understand why certain approaches succeeded and others failed, and learn from the edge cases as well as the successes.

In the nuclear industry, the distinction between “novel” and “known problem” often lies in how far back you are willing to look.

***

There is a tendency in reactor design to address every edge case by adding another layer of software.

On paper, this seems reassuring: more automation, more interlocks, more logical pathways. However, in practice, this leads to a safety case that relies on systems so complex that no one can fully verify them. You can test various scenarios, review code, and run simulations, but you cannot exhaustively demonstrate that every path behaves correctly under all conditions.

A reactor is no different from any other engineering system. If you cannot convince yourself that it behaves correctly, you do not truly understand it. And if you do not understand it, you have no business declaring it safe.

One way to avoid this complexity is to keep the reactor within a manageable size and complexity range, allowing for a safety case that is understandable from end to end. This should not be achieved by relying on increasingly complex automation but rather by designing a system in which the required protection logic remains simple enough to be reasoned through, rather than just tested.

Smaller units can help. They result in fewer coupled effects, more predictable transients, and protection functions that can be made transparent instead of hidden beneath layers of code.

There is also a practical aspect that is often overlooked. Staying at or below the ~1000 MW class keeps reactor pressure vessel procurement within existing industrial capability. Forgings, transport, and supply chains remain manageable. You are designing something that can be built repeatedly, not something that requires exceptional one-off solutions.

Ultimately, it’s the same principle as in any calculation or piece of code: if the system is so complex that you cannot be sure there isn’t an error lurking somewhere, adding more layers will not solve the issue. Reducing complexity will.

Physics first. Software second.

***

Small does not automatically imply safe. Remember that only ~2% of the core Cs-137 inventory was released from the damaged Fukushima Daiichi reactors. Even a small reactor contains enough radioactivity to produce a comparable release if retention arrangements are inadequate.

Small size does not justify relaxed defense-in-depth criteria. But it may make them easier to meet.
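A back-of-envelope inventory estimate shows why. The numbers below are illustrative assumptions (a 300 MWt core, three years of irradiation, standard nuclear data), not a reference to any specific design:

```python
import math

# Illustrative assumptions, not data for any specific design:
P_thermal = 300e6             # W, a small reactor core
E_fission = 200 * 1.602e-13   # J per fission (~200 MeV)
yield_cs137 = 0.062           # cumulative Cs-137 fission yield (~6.2 %)
t_irrad = 3 * 3.156e7         # s, ~3 years at power
half_life = 30.1 * 3.156e7    # s, Cs-137 half-life (~30.1 years)

fission_rate = P_thermal / E_fission          # fissions per second
atoms = fission_rate * yield_cs137 * t_irrad  # decay over 3 y is a few % - ignored
activity_Bq = math.log(2) / half_life * atoms
print(f"Cs-137 inventory: {activity_Bq / 1e15:.0f} PBq")
```

This comes out at roughly 40 PBq. Estimated atmospheric Cs-137 releases at Fukushima Daiichi were on the order of 10 PBq, so even a few percent escaping from a small core is a serious release.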

***

I never keep both of my breathing machines on the same supply.

Not because I expect the supply to fail, but because when it does, it can take everything connected to it down with it.

That instinct carries over.

The grid looks solid when it works. It is large, maintained, and usually stable. It invites you to connect more than you should. Once you do, it ties things together.

The problem is not loss of power. It is shared disturbance.

When the grid is under stress, it might not fail cleanly. Voltage shifts, frequency drifts, protections act. And those conditions are seen by every connected system at the same time.

If multiple emergency trains are connected, they all experience the same input. Similar equipment tends to respond the same way. Relays trip together. Drives disconnect together. Control systems lose stability in the same moment.

Nothing has to break. Independence is lost anyway.

So the question becomes how to use the grid without letting it kill your redundancy.

One simple way is to limit exposure: as with my respirators, allow only one emergency train on the grid at a time.

Now the behavior changes. A disturbed grid can still affect that one train—it may trip or become unavailable—but it cannot do that to all trains simultaneously. The others remain on their own sources, isolated from the same disturbance.

Failure, if it comes, is contained.
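The containment argument can be sketched as a toy model. Everything here, the train names, the sources, the disturbance rule, is illustrative only, not an actual plant interlock:

```python
# Toy model: N emergency trains, each fed from one source.
# A grid disturbance takes out every train connected to the grid
# and nothing else. Names and sources are illustrative only.

def surviving_trains(sources):
    """Return the trains still available after a grid disturbance."""
    return {train for train, source in sources.items() if source != "grid"}

# All trains tied to the grid: one disturbance defeats all redundancy.
all_on_grid = {"A": "grid", "B": "grid", "C": "grid"}
print(sorted(surviving_trains(all_on_grid)))   # []

# At most one train on the grid at a time: the disturbance is contained.
one_on_grid = {"A": "grid", "B": "diesel", "C": "battery"}
print(sorted(surviving_trains(one_on_grid)))   # ['B', 'C']
```

The rule does not make the connected train any safer. It guarantees that whatever the grid does, it can only do it to one train.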

This also keeps the role of the grid honest.

Connecting a train becomes a deliberate choice: you accept the benefit and the risk for that train, and only that train. The rest of the system stays independent. There is no quiet spread of reliance across all trains.

None of this replaces autonomy. Every safety function still has to work without the grid. You must be able to disconnect and continue without hesitation.

The grid is allowed to help.
It is not allowed to be needed.

Nuclear emergency systems deserve the same level of protection as my breathing machines.

Redundancy only protects you if the lines of support are not shared.

***

House load operation is often presented as a sign of resilience: the plant disconnects from the grid, keeps its own auxiliaries alive, and continues running in an islanded state. On paper, it looks like independence.

In reality, it is one of the most efficient ways to create common cause failure.

Because everything that normally sits comfortably separated becomes tightly coupled in time.

The turbine no longer follows the grid—it defines it.
The generator is no longer a passive supplier—it becomes the only source.
Frequency control, voltage control, load balance, and reactor power all collapse into a single control problem, with no external inertia to absorb mistakes.
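The loss of external inertia can be quantified with the classic swing equation, df/dt = f0 · ΔP / (2H). The inertia constants below are illustrative assumptions (H ≈ 5 s for a single turbogenerator, a much larger effective H for a big interconnection), not data for any specific plant or grid:

```python
# Back-of-envelope swing equation: df/dt = f0 * dP / (2H)
# f0: nominal frequency, dP: power mismatch (per unit), H: inertia constant.
# H values are illustrative assumptions only.

f0 = 50.0   # Hz
dP = 0.01   # 1 % mismatch between turbine output and house load

for label, H in [("islanded, single machine", 5.0),
                 ("grid-connected, shared inertia", 500.0)]:
    rate = f0 * dP / (2 * H)   # initial frequency drift, Hz per second
    print(f"{label}: {rate:.4f} Hz/s -> 1 Hz drift in {1 / rate:.0f} s")
```

Islanded, a mere 1 % mismatch drifts the frequency by 1 Hz in about 20 seconds; connected to a large grid, the same mismatch is absorbed by inertia orders of magnitude greater. In island mode the governor must catch every such error itself, in real time.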

And that is where the commonality creeps in.

A small disturbance is no longer local.
A control instability in the turbine governor propagates directly into generator frequency.
That frequency drift feeds into pump speeds, protection thresholds, and control logic.
Voltage excursions affect motor torque, valve actuators, and instrumentation.   

What used to be buffered by the grid now feeds back instantly into every electrically dependent system:

You have not lost redundancy by design.
You have synchronized it.

All trains still exist.
All cables are still separate.
All breakers are still in place. 

But they now depend on the same fragile reference: a single, self-generated electrical island. If that island wobbles, everything wobbles together.

The failure mode is no longer “one train trips.”

It is “everything degrades just enough, at the same time”:

Protection systems see borderline conditions everywhere.
Motors slow slightly.
Flows drift.
Margins erode in parallel.
Nothing fails cleanly. And that is what makes it dangerous.

Common cause failure is rarely about identical hardware failing identically. It is about shared dependencies failing in ways that look independent—until they are not.

House load creates exactly that condition. It replaces a strong, external stabilizing system with an internal one that must control itself while being affected by its own imperfections.

A grid fault may initiate the event. But from that point on, the plant becomes both the victim and the source of its own disturbances.

That is why house load should not be seen as a safe steady state.

In fact, it should not even be attempted.

Because once you are there, you have already traded independence for synchronization—and with it, created the perfect environment for common cause failure.

***

In a PWR, the main coolant pump sits in an awkward place in the safety story.

It is essential during operation. It is assumed to disappear during accidents. And in between, it carries one of the more fragile boundaries in the entire primary system: the shaft seal.

A reactor coolant pump is a pressure boundary with a rotating hole through it. At ~150 bar and a few hundred degrees, that is not a forgiving interface. The seal is not a single barrier but a staged system—multiple seal faces, controlled leakoff, injection flows, cooling water, and pressure control. Under normal conditions it is stable, almost invisible.

During a loss of power, that stability is what disappears first.

The pump coasts down. Seal injection may be lost or degraded. Cooling flows falter. Pressure differentials shift in directions the seal was never meant to see for long. What was a carefully balanced hydraulic system becomes a passive leakage path.

Not a rupture. Something more insidious.

A controlled leak turns into an uncontrolled one. Tens of liters per minute is enough. It does not look dramatic, but it does not stop either. It bypasses the integrity of the primary boundary without ever “failing” it in the traditional sense.

And it arrives exactly when the plant is least prepared to manage it.

In blackout conditions, you are relying on what remains: accumulators, gravity, stored energy, and whatever independence you have managed to preserve. Seal leakage quietly converts a closed system into an open one. Inventory is lost. Pressurizer level drifts. Long-term cooling becomes a race against depletion rather than a question of heat removal.
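The pace of that race is easy to estimate. All numbers here are illustrative assumptions (a four-loop plant, ~300 m³ of primary coolant, a constant degraded-seal leak of ~60 L/min per pump), not figures for any specific design:

```python
# Rough inventory-depletion estimate. Illustrative assumptions only:
# four loops, ~300 m3 primary coolant, ~60 L/min leak per degraded seal.

n_pumps = 4
leak_per_pump = 60        # L/min through one degraded seal
inventory = 300_000       # L of primary coolant

total_leak = n_pumps * leak_per_pump   # L/min across all seals

def hours_to_lose(fraction):
    """Hours until the given fraction of inventory has leaked out."""
    return inventory * fraction / total_leak / 60

print(f"Total leak: {total_leak} L/min")
print(f"10 % of inventory gone in {hours_to_lose(0.10):.1f} h")
print(f"25 % of inventory gone in {hours_to_lose(0.25):.1f} h")
```

Roughly 2 hours to lose 10 % and 5 hours to lose 25 %: undramatic minute by minute, but exactly the timescale on which blackout recovery and passive cooling margins are counted.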

The uncomfortable part is that this is not a random failure.

All reactor coolant pumps share the same seal concept. The same dependencies. The same vulnerabilities to loss of injection and cooling. When the initiating event is a station blackout, the conditions that degrade one seal degrade all of them.

This is how a local weakness becomes a common cause.

You still have multiple pumps. Multiple loops. Multiple trains. On paper, nothing has been lost.

In reality, they are all moving in the same direction.

Designs have responded—seal improvements, passive seal concepts, dedicated seal injection from independent sources, even the assumption that seals will fail and the plant must cope with it. But the underlying lesson remains uncomfortable: 

The main coolant pump shaft seal is the weak link.