Fish n' bits

Aquaculture's Burden: Trying to Compare Complex Biological Data

Written by Manolin | Mar 18, 2026 10:52:28 PM

Modern salmon farms generate more data than ever. Feed records, lice counts, mortality events, environmental readings, treatment logs. The collection problem has largely been solved. The challenge now is making that data answer the questions that actually drive decisions.

The Data Exists. The Answers Don't.

At Manolin, we see a consistent pattern across farm and supplier organizations: teams that are data-rich but struggling to produce reliable comparisons. End-of-generation reviews, feed contract renewals, treatment assessments: these are the moments when data should give clear answers, and it often doesn't.

The reason is structural. It's a phenomenon statisticians call Simpson's Paradox, and it's more common in aquaculture than the industry recognizes.


What Simpson's Paradox Means for Aquaculture

Simpson's Paradox occurs when aggregate data points in a different direction than the underlying subgroup data. A classic example comes from a 1986 study in the British Medical Journal comparing two kidney stone treatments. Overall, the less invasive procedure appeared more effective. But when researchers separated patients by stone size, the open surgical procedure outperformed it in every subgroup. The aggregate conclusion was technically accurate and practically misleading.

Aquaculture faces the same structural problem. Consider two companies operating in the same region with similar biomass under management. Company A finishes the generation at 10% mortality. Company B finishes at 15%. On the surface, Company A appears to be running a better operation.

But if Company A's sites sit in lower disease-pressure zones with favorable water exchange, and Company B operates in more challenging environmental conditions, the comparison tells you almost nothing about operational quality. The aggregate is real. As a basis for decisions, it's unreliable.
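The arithmetic behind this is easy to verify. The sketch below uses hypothetical numbers (zone names, stocking counts, and mortality figures are all illustrative) in which Company B has lower mortality in every disease-pressure zone, yet higher mortality overall, simply because most of its biomass sits in the high-pressure zone:

```python
# Hypothetical figures illustrating Simpson's Paradox in site comparisons.
# (company, zone) -> (mortalities, fish stocked); all numbers illustrative.
deaths = {
    ("A", "low_pressure"):  (72_000, 900_000),
    ("A", "high_pressure"): (28_000, 100_000),
    ("B", "low_pressure"):  (7_000, 100_000),
    ("B", "high_pressure"): (143_000, 900_000),
}

def mortality(company, zone=None):
    """Mortality rate for a company, overall or within a single zone."""
    rows = [v for (c, z), v in deaths.items()
            if c == company and (zone is None or z == zone)]
    dead = sum(d for d, _ in rows)
    stocked = sum(n for _, n in rows)
    return dead / stocked

# Company B outperforms A in every zone...
for zone in ("low_pressure", "high_pressure"):
    assert mortality("B", zone) < mortality("A", zone)

# ...yet loses the aggregate comparison.
assert mortality("A") < mortality("B")

print(f"A overall: {mortality('A'):.1%}, B overall: {mortality('B'):.1%}")
```

The aggregate numbers match the example in the text (10% vs. 15%), while every subgroup comparison points the other way.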

This is the easy version of the problem. It gets considerably more complex inside a single farm.


Where Comparison Breaks Down in Practice

A farm manager wants to know whether switching feeds paid off. In their previous generation, V23, they ran one feed. This generation, V25, they made the switch to a new product. Feed conversion ratio is the natural metric — lower FCR means better efficiency, and if FCR improved generation over generation, the new feed looks like the right call.

But V25 also had a disease event mid-cycle. For several weeks, some cages were moved to a functional feed before returning to the main diet. Sea temperatures ran warmer than V23. Routine grading redistributed fish between cages across the generation.

The FCR in the end-of-generation report is a blended output of all of it. The new feed's actual contribution is somewhere inside that number, but it isn't separable from the other variables that shaped the result. "Did the new feed actually perform better in V25?" has direct operational ROI implications. The aggregate FCR can't answer it cleanly.
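To make the blending concrete: FCR is feed delivered divided by biomass gained, and the end-of-generation figure is a single ratio over the whole cycle. A hypothetical period-by-period breakdown (all figures illustrative, not real farm data) shows how the functional-feed weeks get folded into the new feed's apparent performance:

```python
# Hypothetical V25 feeding periods; FCR = feed (kg) / biomass gain (kg),
# lower is better. All figures illustrative.
periods = [
    # (label, feed_kg, gain_kg)
    ("new feed, pre-outbreak",    420_000, 380_000),
    ("functional feed, outbreak",  90_000,  60_000),  # disease suppresses growth
    ("new feed, warm water",      510_000, 430_000),
]

def fcr(feed_kg, gain_kg):
    return feed_kg / gain_kg

for label, feed, gain in periods:
    print(f"{label:26s} FCR = {fcr(feed, gain):.2f}")

# The end-of-generation report collapses everything into one ratio.
total_feed = sum(f for _, f, _ in periods)
total_gain = sum(g for _, _, g in periods)
print(f"{'blended end-of-generation':26s} FCR = {fcr(total_feed, total_gain):.2f}")
```

The blended number sits between the outbreak period and the new-feed periods, so reading it as "the new feed's FCR" overstates or understates the feed's real contribution depending on how severe the outbreak was.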

The same problem appears in treatment assessments.

A farm manager reviewing mechanical delousing performance has access to treatment logs, pre- and post-treatment lice counts, and mortality data. But those records are attached to cages that, by mid-generation, rarely contain the same population they did at stocking.

Grading has redistributed fish. The population that received a treatment and the population whose outcomes you're measuring have quietly drifted apart. The cage-level record doesn't flag that drift — it shows what happened at a location, not what happened to a coherent group of fish.
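A minimal sketch makes the drift visible. Assume (hypothetically) that a treatment is logged against cage 3, and two later grading moves swap fish between cages 3 and 5 based on size alone. The cage-level record still says "cage 3 was treated," but the population at that location is now mixed:

```python
# Hypothetical sketch of population drift after grading. A treatment is
# logged against cage 3; grading then moves fish by size, not by history.
cages = {3: {"treated": 40_000}, 5: {"untreated": 40_000}}

def grade(cages, src, dst, fraction):
    """Move `fraction` of every subpopulation in `src` into `dst`."""
    for label, count in list(cages[src].items()):
        moved = int(count * fraction)
        cages[src][label] = count - moved
        cages[dst][label] = cages[dst].get(label, 0) + moved

grade(cages, 3, 5, 0.5)  # size-based grading ignores treatment history
grade(cages, 5, 3, 0.5)

treated = cages[3].get("treated", 0)
total = sum(cages[3].values())
print(f"cage 3 now holds: {cages[3]}")
print(f"treated share of cage 3: {treated / total:.0%}")
```

Any post-treatment mortality measured at cage 3 is now measuring a mixed population, while a share of the actually treated fish sit in cage 5, outside the record entirely.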


Why the Cage Can't Hold the History

The cage is the natural unit of measurement in salmon farming: where feed is dispensed, fish are counted, treatments are applied, and mortality is logged. It's also the unit almost every reporting system organizes around.

The problem is that the cage is a location, not a biological identity.

Many farms already recognize this. A common approach is to track fish groups — fish sourced from the same smolt facility, stocked together into a set of cages, managed as a coherent unit. Cages 1 through 4 hold one group; cages 5 and 6 another. This works well early in a generation.

But operational reality intervenes:

  • Grading redistributes fish across cages based on size rather than origin
  • Disease events require moving or splitting fish in ways that don't respect group boundaries
  • Biomass targets push consolidations that mix what was once separate

Even a split within the same fish group creates a record-keeping problem. The same biological history becomes attached to two locations, and subsequent events at those locations get attributed to whatever fish happen to be there — not to the original group.

At that point, the cage record loses the ability to tell you what happened to a specific population of fish: which environmental conditions they were exposed to, what interventions occurred and when, how disease pressure evolved across their particular history. Without that context traveling with the fish, comparison breaks down. You're comparing places, not populations.
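What "context traveling with the fish" means in data terms can be sketched in a few lines. This is an illustrative toy model, not Manolin's implementation: each population carries its own event history, and a split produces a child that inherits that shared history instead of leaving it behind at a cage:

```python
# Illustrative sketch of population-level tracing (not Manolin's actual
# implementation): history belongs to the population, not the location.
from dataclasses import dataclass, field

@dataclass
class Population:
    name: str
    count: int
    history: list = field(default_factory=list)

    def record(self, event):
        self.history.append(event)

    def split(self, moved, new_name):
        """Split off `moved` fish; the child inherits the shared history."""
        child = Population(new_name, moved, list(self.history))
        self.count -= moved
        return child

group = Population("smolt-batch-7", 80_000)           # hypothetical group
group.record("stocked 2025-04-12, cages 1-4")
group.record("mechanical delousing 2025-07-03")

graded = group.split(30_000, "smolt-batch-7/graded-large")
graded.record("moved to cage 6 after grading")

# Both populations still know they were treated on 2025-07-03,
# regardless of which cage they occupy now.
assert "mechanical delousing 2025-07-03" in graded.history
```

With this structure, a later outcome measured on either population can be attributed to the interventions that population actually experienced, which is exactly what the cage record loses after a split.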


What a Population-Level Foundation Changes

Manolin's CTO, John Costantino, demonstrated this directly when testing how data structure affects predictive accuracy. He ran the same model architecture against three ways of organizing the same farm data: cage-level records, rough fish groupings, and fully traced population histories. Cage-level data produced 84.3% accuracy on growth prediction; population-level data produced 98.5%. That isn't a marginal improvement. It reflects the difference between a foundation that preserves biological history and one that loses it every time fish move.

We previously wrote about how structured data improves predictive capabilities; you can read more on that here.


The Structural Answer for Data

The industry is constantly discussing benchmarking, standardization, and apples-to-apples comparisons. That conversation is well worn. What tends to stay out of reach is the structural answer underneath it: the comparison problem isn't primarily a data collection problem or a standards problem. It's a unit-definition problem.

Until performance data is organized around traced populations (groups of fish whose full biological context travels with them through the generation) the most important end-of-generation questions will keep producing unreliable answers.

Population-level tracing is Manolin's algorithmic response to that problem. If you want to learn more, our whitepaper goes considerably deeper into how it works and what it makes possible across farm and supplier intelligence.

[Read the Population Tracing whitepaper →]