
How Data Labeling Shapes Aquaculture's AI Models


The headlines tend to celebrate the Ferraris of AI: ChatGPT, Gemini, Claude, Copilot. Sleek, fast, and loud. But even the nicest Ferrari doesn’t go anywhere without fuel. In AI, that fuel is well-labeled, context-rich data.

Behind the scenes, a different set of companies has been doing the heavy lifting: data labeling and training operations that help models learn what they’re actually looking at.

They’ve been quiet.

Until now.


Why Labeling Matters Now More Than Ever

Meta’s $14B investment in Scale AI is a statement. It’s a bet that model quality depends on feeding systems the right data, not just more data. Scale’s rise mirrors a broader shift: the industry is recognizing that progress is gated by training data and the labels around it, not simply by compute.

They’re not alone. Surge AI has been reported to generate over $1B in annual revenue without raising venture capital. Different approaches, same point: as AI adoption accelerates, the bottleneck isn’t how fast a model runs, it’s whether the model understands the world with the right context.


Tagging vs. Labeling

Let’s level-set on two terms that often get blurred together:

  • Tagging adds metadata like where a photo was taken, when it was captured, or which camera produced it.

  • Labeling tells the system what the data is: “this is a salmon,” “this is a sea louse,” “this is sarcasm.”

Models only learn patterns if those patterns are anchored to correct, consistent labels. That sounds simple. At scale, it’s one of the hardest—and most expensive—parts of building AI.
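To make the distinction concrete, here’s a minimal sketch of how a tag and a label might sit side by side in a single record. The schema and field names are ours for illustration, not from any particular labeling tool:

```python
# Hypothetical record for one underwater photo. Tags describe the capture;
# labels describe the content a model should learn to recognize.
photo_record = {
    "file": "pen_04_frame_01823.jpg",
    "tags": {  # metadata: where, when, which camera
        "captured_at": "2024-03-12T09:41:00Z",
        "camera_id": "subsea-cam-07",
        "site": "pen-04",
    },
    "labels": [  # content: what the data actually is
        {"class": "salmon", "source": "human"},
        {"class": "sea_louse", "source": "human"},
    ],
}
```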


Context Changes Meaning

Pardon our language for the examples in this section.

Consider a language example straight from the trenches of moderation:

“Holy sh*t. This album is f***ing amazing!”

Flagged: Under Review.

You then try:

“Holy sh*t! This album is amaaazing.”

Still flagged.

One more time:

“F*** yes. The OG bad b*tch is BACK.”

This time, there’s a suspension warning. The intent is positive fandom. The model doesn’t know that unless it has seen thousands (sometimes millions) of labeled examples that teach it how tone and context shift meaning. Without that human judgment in the loop, systems over-censor. With it, they learn nuance.
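Under the hood, those training examples pair raw text with a human judgment about intent, so the model learns that profanity alone doesn’t decide the outcome. A hedged sketch of what one might look like; the schema is hypothetical, not any platform’s real format:

```python
# Hypothetical moderation training examples. The "intent" field carries the
# human judgment that keyword filters miss.
labeled_examples = [
    {
        "text": "Holy sh*t. This album is f***ing amazing!",
        "contains_profanity": True,
        "intent": "positive_fandom",  # enthusiastic, not hostile
        "action": "allow",
    },
    {
        # Illustrative counterexample: same profanity, different intent.
        "text": "You're full of sh*t and everyone knows it.",
        "contains_profanity": True,
        "intent": "personal_attack",
        "action": "review",
    },
]
```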


From Cameras to Cages

Translate that to aquaculture.

Training a vision model to spot sea lice on salmon requires more than video. It needs frame-level labels: which images contain lice, where they are on the fish, and what stage they’re in.

In the early days, companies hired offshore teams to draw boxes around objects: fish here, lice there. Scale AI’s initial model was simple: send us your data; we’ll handle the labeling. Today, AI makes first-pass guesses, humans correct them, and those fixes feed the next training cycle. Faster pipeline, same principle: models learn what you teach them.
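A frame-level annotation in that kind of pipeline might look something like the sketch below, loosely patterned on common bounding-box formats such as COCO. Exact fields vary by tool; these names are illustrative:

```python
# One annotated video frame: each box records what is in the image and where.
frame_annotation = {
    "image": "pen_04_frame_01823.jpg",
    "annotations": [
        {
            "class": "salmon",
            "bbox": [412, 160, 830, 410],  # x, y, width, height in pixels
        },
        {
            "class": "sea_louse",
            "bbox": [655, 240, 18, 12],
            "stage": "adult_female",  # lifecycle stage matters downstream
        },
    ],
}
```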


When Labels Drift, Predictions Drift

There’s a reason “garbage in, garbage out” persists. If female lice get labeled as mobile lice, downstream recommendations skew. “Wrong” isn’t always a dictionary error; it’s often a context error.

Good labeling doesn’t just avoid mistakes. It captures nuance. That nuance is what turns a generic detector into a reliable tool.
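One cheap guardrail is auditing how label distributions shift between annotation batches. A minimal sketch, assuming each annotation carries a "class" field as in the earlier example:

```python
from collections import Counter

def label_shares(annotations):
    """Fraction of each class within a batch of annotations."""
    counts = Counter(a["class"] for a in annotations)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()} if total else {}

def flag_label_drift(old_batch, new_batch, threshold=0.10):
    """Return classes whose share of labels moved more than `threshold`."""
    old, new = label_shares(old_batch), label_shares(new_batch)
    return {
        cls: (old.get(cls, 0.0), new.get(cls, 0.0))
        for cls in set(old) | set(new)
        if abs(new.get(cls, 0.0) - old.get(cls, 0.0)) > threshold
    }
```

If "adult_female" suddenly shrinks while "mobile" balloons from one batch to the next, that’s often annotator drift rather than biology, and worth a spot-check before the next retraining run.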


Stop Waiting for “Perfect” Data

A common refrain: “We’ll use AI once our data is perfect,” meaning new sensors, clean standards, and tight integrations. The problem is time. If you wait for perfect, you lose years of learning. And when you finally get there, the stack has changed again.

Most farms already have enough data to start. What’s missing is consistent labeling and context. That can look like:

  • Tagging treatments with exact water conditions.

  • Marking feed data with pigment levels.

  • Labeling mortality spikes with suspected causes.

At Manolin, we work with historical farm data. It’s messy. With the right labels and context, it becomes a goldmine for pattern detection and forecasting.
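In code, that enrichment can be as simple as attaching conditions to events a farm already records. A hedged sketch with hypothetical field names:

```python
# A treatment record as it might come out of existing farm software.
treatment_event = {
    "site": "pen-04",
    "date": "2024-03-12",
    "treatment": "hydrogen_peroxide_bath",
}

# The context that turns a bare event into a learnable example.
treatment_event["context"] = {
    "water_temp_c": 8.4,
    "salinity_ppt": 33.1,
    "lice_per_fish_before": 0.62,
    "lice_per_fish_after": 0.11,
}
```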


Make Learning Continuous

AI development isn’t a one-and-done pass. Every new example is a chance to refine understanding. The same feedback loop that improves a chatbot improves a lice detector:

  • Start labeling what you have now.

  • Keep adding context as new data arrives.

  • Don’t discard old data; re-label it as your understanding improves.

Over time, models get better, faster, and more tuned to your reality.
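Structurally, that loop fits in a few lines. Everything below is a placeholder sketch rather than a real API; the shape of the cycle is the point:

```python
def human_review(frames_with_guesses):
    """Placeholder: a reviewer corrects the model's first-pass guesses."""
    return [(frame, guess) for frame, guess in frames_with_guesses]

def retrain(model, labeled_set):
    """Placeholder: retrain or fine-tune on the updated labeled set."""
    return model

def training_cycle(model, new_frames, labeled_set):
    # 1. The model makes first-pass guesses on incoming data.
    guesses = [model.predict(frame) for frame in new_frames]

    # 2. Humans correct those guesses.
    corrected = human_review(zip(new_frames, guesses))

    # 3. Corrections join the training set; old examples stay, and can be
    #    re-labeled later as definitions sharpen.
    labeled_set.extend(corrected)

    # 4. Retrain on the larger, better-labeled dataset.
    return retrain(model, labeled_set)
```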


The Bottom Line

The flashy part of AI is the model. The useful part is the data it’s trained on and the quality of its labels. That’s why companies like Scale AI and Surge AI have become central to the ecosystem.

In aquaculture, the farms that invest in context-rich, well-labeled data will lead.

Whether you’re building a system to detect lice or to forecast the right harvest window, the lesson is the same: models learn what you teach them, and the quality of your teaching depends entirely on the quality of your labels.