Entry 0056 · May 4, 2026 · Reliability

Before You Build Another Line, Define a Stop

Plants don't see their throughput ceiling because three measurement defaults prop it up: misclassified availability, assumed quality, and overengineered specs
Truth · modeled scenario

A 20-minute micro-stop is not a micro-stop

A Midwest meat processor was running a hand-trim line short on rate. Eighteen operators on the floor. The ask going up to the capex committee was a second line. We put proxy sensors on the line for two days before the workshop.

The line wasn't slow. It was stopping for 20-plus minutes at a time, and the existing system was logging those stops as micro-stops. The sustained best run was around 20 parts per minute, which the line could hold cleanly. Availability was the killer, not the operators, not the conveyor speed, not the trimmer skill. Stop classification.

Once we redefined the thresholds (stopped at near-zero ppm, micro-stop 10 seconds to 5 minutes, unplanned downtime above 5 minutes), the picture flipped. The line had headroom. The crew didn't need to be 18. A field test pulling three positions ran clean. The savings didn't come from a second line. They came from a definition.
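
The thresholds above can be sketched in a few lines of Python. This is a minimal illustration, not the system we used on the floor: the function names are hypothetical, the 0.5 ppm cutoff for "near-zero" is an assumption (the engagement only defined it as near-zero), and runs shorter than ten seconds are labeled noise.

```python
from typing import Iterable, List

NEAR_ZERO_PPM = 0.5       # assumption: the text only says "near-zero ppm"
MICRO_STOP_MIN_S = 10     # from the text: micro-stops start at 10 seconds
DOWNTIME_MIN_S = 5 * 60   # from the text: above 5 minutes is unplanned downtime


def stop_durations(ppm_samples: Iterable[float], sample_period_s: float) -> List[float]:
    """Return the length in seconds of each run of consecutive near-zero samples."""
    durations: List[float] = []
    run = 0
    for ppm in ppm_samples:
        if ppm <= NEAR_ZERO_PPM:
            run += 1
        else:
            if run:
                durations.append(run * sample_period_s)
            run = 0
    if run:  # the series ended while the line was still stopped
        durations.append(run * sample_period_s)
    return durations


def classify_stop(duration_s: float) -> str:
    """Bucket a stop by its measured duration, ignoring any label the system gave it."""
    if duration_s < MICRO_STOP_MIN_S:
        return "noise"
    if duration_s <= DOWNTIME_MIN_S:
        return "micro-stop"
    return "unplanned downtime"


if __name__ == "__main__":
    # Hypothetical one-sample-per-second trace: a blip, a short stop, a 22-minute jam.
    trace = [20.0] * 5 + [0.0] * 3 + [20.0] * 2 + [0.0] * 30 + [20.0] + [0.0] * 1320
    for seconds in stop_durations(trace, sample_period_s=1.0):
        print(f"{seconds:.0f} s -> {classify_stop(seconds)}")
```

The point of classifying on measured duration is that a 22-minute jam cannot land in the micro-stop bucket, whatever the logging system called it.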

The throughput ceiling is built from three lies

Plants don't see their throughput ceiling. They see a number on the dashboard and a crew working hard. The ceiling is propped up by three measurement defaults that nobody updates because nobody pays the cost in a single budget cycle.

The first lie is misclassified availability. A line is built with stop categories that made sense at install. Maintenance changes, the product mix changes, planned breaks shift, and the categories don't move with them. A 22-minute jam gets averaged into a micro-stop bucket. The OEE dashboard says 78 percent availability and the floor knows it's closer to 50.
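
A small sketch shows how the gap opens up. It assumes the dashboard counts only stops labeled as downtime against availability, which is common but not universal. The event log, labels, and function names are hypothetical, and the numbers are illustrative rather than measured.

```python
from typing import Callable, List, Tuple

Stop = Tuple[str, float]  # (label assigned by the logging system, duration in seconds)

DOWNTIME_MIN_S = 5 * 60   # from the text: above 5 minutes is unplanned downtime


def availability(planned_s: float, stops: List[Stop],
                 is_downtime: Callable[[str, float], bool]) -> float:
    """Share of planned time left after subtracting the stops that count as downtime."""
    if planned_s <= 0:
        raise ValueError("planned_s must be positive")
    down_s = sum(duration for label, duration in stops if is_downtime(label, duration))
    return (planned_s - down_s) / planned_s


def by_label(label: str, duration_s: float) -> bool:
    """Trust the category the system was shipped with."""
    return label == "downtime"


def by_duration(label: str, duration_s: float) -> bool:
    """Ignore the label and use the measured duration instead."""
    return duration_s > DOWNTIME_MIN_S


# Hypothetical 8-hour shift: one logged downtime event plus four "micro-stops",
# three of which actually ran longer than 20 minutes.
PLANNED_S = 8 * 3600
STOPS: List[Stop] = [
    ("downtime", 3600),
    ("micro-stop", 1320),
    ("micro-stop", 1500),
    ("micro-stop", 45),
    ("micro-stop", 1260),
]

if __name__ == "__main__":
    print(f"as logged:   {availability(PLANNED_S, STOPS, by_label):.1%}")
    print(f"by duration: {availability(PLANNED_S, STOPS, by_duration):.1%}")
```

Same shift, same events. The only thing that changes between the two numbers is which stops are allowed to count.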

The second lie is assumed quality. Most OEE calculations carry a 90 to 95 percent placeholder for quality until somebody measures it on-site. Most plants never measure it on-site. So the throughput model gives back the number that was used to size it in the first place. A frozen meat producer I worked with in March was running consultant simulations modeling 26 people, then 16, then 10 on a trim line, all against an 85 percent utilization target while the floor was holding 70. The throughput improvement the simulation produced was the gap between two assumptions.

The third lie is the overengineered spec. When the line jams once every 100 cases, the engineering instinct is to thicken the corrugated, add a wall, change the format. A natural-foods packaging brand we work with hit a 1-in-100 jam rate on a master-case transition. The proposal on the table was a corrugated resize. The right move was to run a larger-volume reproduction first, capture the failure mode, and root-cause it. A spec change masks the variable. It also locks in cost forever while the underlying flaw stays in the line.

What to check before the capex memo

Three diagnostics, in order, before any line-count or headcount conversation.

Define a stop. Pull the last 60 days of stop data and bucket it by actual duration, not by the labels the system was shipped with. Any unplanned stop longer than five minutes is unplanned downtime, full stop. Anything below ten seconds is operational noise. The middle band is where you find the diagnostic. A line whose biggest micro-stops are averaging 20 minutes is telling you something the dashboard buried.

Pressure-test the quality number. If the OEE in your business case uses a placeholder for first-pass yield, the case is fiction. Hold the model open until you measure quality on the actual line, on the actual product mix, for at least one full shift. The frozen meat plant had the consultants running scenarios at 85 percent utilization while the floor was holding 70. The right model isn't what if we hit 85. The right model is why is the gap 15 points wide.
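
One way to pressure-test the quality number is to see how far the business case moves when the placeholder moves. The sketch below uses the standard OEE product (availability times performance times quality). The function names and the input values are hypothetical, chosen only to show the shape of the check.

```python
from typing import Dict, Iterable


def oee(availability: float, performance: float, quality: float) -> float:
    """Standard OEE: the product of three ratios, each between 0 and 1."""
    for name, value in (("availability", availability),
                        ("performance", performance),
                        ("quality", quality)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return availability * performance * quality


def quality_sensitivity(availability: float, performance: float,
                        quality_values: Iterable[float]) -> Dict[float, float]:
    """OEE for each candidate quality value, holding the other two factors fixed."""
    return {q: oee(availability, performance, q) for q in quality_values}


if __name__ == "__main__":
    # Hypothetical inputs: 0.95 stands in for the placeholder, the rest for
    # what an on-line measurement might plausibly come back with.
    table = quality_sensitivity(0.75, 0.90, [0.95, 0.90, 0.85, 0.80])
    for quality, value in table.items():
        print(f"quality={quality:.2f} -> OEE={value:.3f}")
```

If the spread across plausible quality values is wider than the improvement the capex case promises, the case is resting on the placeholder, not on the line.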

Run the volume before you respec. Intermittent failures need volume to debug. One jam in 100 cases means you need a cleanly instrumented run of at least 500 cases to see what's really going on. The temptation to redesign the box is strong because a redesign feels decisive. It's also expensive, slow, and often a permanent fix for a problem you haven't characterized.
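
The volume argument can be checked with a binomial model. This is a rough sketch that assumes jams are independent and occur at a constant 1-in-100 rate, which a real line may not satisfy. The function name is hypothetical.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing k or more jams in n cases."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if k < 0 or n < 0:
        raise ValueError("k and n must be non-negative")
    below_k = sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(min(k, n + 1)))
    return 1.0 - below_k


if __name__ == "__main__":
    jam_rate = 1 / 100  # from the text: one jam per 100 cases
    for cases in (100, 500, 1000):
        print(f"{cases} cases: expected jams={cases * jam_rate:.0f}, "
              f"P(>=1)={prob_at_least(1, cases, jam_rate):.3f}, "
              f"P(>=3)={prob_at_least(3, cases, jam_rate):.3f}")
```

Under these assumptions a 100-case run has only about a 63 percent chance of showing even one jam, while a 500-case run expects around five. One captured failure is an anecdote. A handful starts to look like a failure mode you can root-cause.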

These three checks usually take a sensor pass and a workshop. Two days of plant time. The output is a defensible proxy OEE baseline, a corrected list of where availability actually goes, and a quantified gap between the throughput the line can hit and the throughput it does hit. With that in hand, the conversation about a second line gets cheap. Without it, the conversation costs whatever the line costs.

The committee asked the wrong question

When a plant hits its throughput ceiling, leadership reaches for the capex committee. The question lands as do we need another line. The honest answer, almost every time, is we need another two days of measurement first.

A capex case that turns out to be a stop-classification problem pays back inside a quarter. A capex case that turns out to be a quality assumption costs nothing to fix and lifts the line by a single-digit percentage. A capex case that turns out to be a 1-in-100 corrugated jam needs a vendor call and a longer test, not a new SKU on the procurement master.

Build the second line if the first one is genuinely full. Define a stop before you decide.
