Modern AI applications thrive on data, but what happens when the places most in need of technological innovation are also the hardest to reach?
At Gemmo AI, we faced this challenge head-on as part of the FEROX project, a Horizon Europe initiative to improve the working conditions of wild food harvesters. In remote forests across Finland, far from reliable networks and stable infrastructure, we built and validated a robust, drone-based system for dataset collection. Here’s how we approached the problem of collecting high-quality data in rural and wild environments.
Data collection in rural areas isn’t just a technical exercise: it’s a necessity. From wild berry harvesting to sustainable forestry, many critical sectors still rely on manual labor and intuition. These environments often lack digital visibility, making it difficult to optimize operations, ensure worker safety, or assess resource availability.
In the case of FEROX, our mission was clear: collect data that could inform AI models capable of estimating berry yields, supporting safety monitoring, and coordinating autonomous operations in real time.
Many of the aerial data acquisitions in FEROX required manual piloting, especially in dense or obstacle-rich environments. The drones used were provided by project partners, each selected for specific operational needs and terrains.
We worked with a diverse set of drones, including DJI models such as the Mavic 2 Pro, Mini 2, and Mavic 3 Multispectral, which are well known for their reliability, image quality, and suitability for forest navigation. These were frequently used in field campaigns to capture high-resolution RGB imagery.
In parallel, project partner Ingeniarius deployed its own custom-built drones—such as the ING Scout v2 and v3—which offered more flexibility for integrating custom sensors and supported ROS-based data flows for field experimentation.
However, drones weren't the only source of data. To increase diversity and realism in the dataset, we also collected thousands of images using GoPro Hero 11 action cameras, smartphones (including Samsung S22 Ultra and Google Pixel 7 Pro), and professional DSLR equipment. This allowed us to capture scenes from the perspective of human pickers and under varied lighting and positioning conditions.
This multi-source strategy ensured that the resulting dataset reflected not only aerial views but also ground-level and first-person perspectives, making it more robust for training AI models to operate in real-world harvesting conditions.
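One way to keep such a multi-source dataset auditable is to record the capture device and perspective alongside every image. The sketch below is illustrative only: the `ImageRecord` fields, the perspective labels, and the file paths are hypothetical and do not describe the actual FEROX dataset layout.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    """One image in a hypothetical multi-source manifest."""
    path: str          # placeholder path, not a real dataset location
    device: str        # capture device named in the article
    perspective: str   # assumed labels: "aerial" or "ground"


def count_by_perspective(records):
    """Count images per perspective to check how balanced the dataset is."""
    return Counter(record.perspective for record in records)


records = [
    ImageRecord("aerial/0001.jpg", "DJI Mavic 3 Multispectral", "aerial"),
    ImageRecord("ground/0001.jpg", "GoPro Hero 11", "ground"),
    ImageRecord("ground/0002.jpg", "Google Pixel 7 Pro", "ground"),
]

for perspective, count in sorted(count_by_perspective(records).items()):
    print(f"{perspective}: {count} image(s)")
```

A per-perspective count like this makes it easy to notice when one viewpoint dominates the training data.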
Working in dense forests presents a set of physical constraints that go far beyond the usual considerations of AI or robotics.
In the FEROX project, drones had to fly at low altitudes to capture images detailed enough for object detection and annotation. However, this brought them dangerously close to tree canopies, shrubs, and undergrowth, all of which posed a serious risk of collision. Autonomous navigation had to be limited or disabled in these conditions, with manual piloting becoming essential in complex zones.
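The link between altitude and image detail can be made concrete with the standard ground sampling distance (GSD) formula for a downward-facing camera. The following is a minimal sketch: the sensor width, focal length, and image width are placeholder values chosen for illustration, not the specifications of any drone used in the project.

```python
def ground_sampling_distance_cm(altitude_m, sensor_width_mm, focal_length_mm, image_width_px):
    """Ground footprint of one pixel, in centimetres, for a nadir-pointing camera."""
    return (altitude_m * 100.0 * sensor_width_mm) / (focal_length_mm * image_width_px)


def max_altitude_m(target_gsd_cm, sensor_width_mm, focal_length_mm, image_width_px):
    """Highest altitude that still achieves the target GSD."""
    return (target_gsd_cm * focal_length_mm * image_width_px) / (100.0 * sensor_width_mm)


# Placeholder camera parameters (assumptions, not real drone specifications).
SENSOR_WIDTH_MM = 13.2
FOCAL_LENGTH_MM = 10.0
IMAGE_WIDTH_PX = 5472

for altitude in (5, 10, 30):
    gsd = ground_sampling_distance_cm(altitude, SENSOR_WIDTH_MM, FOCAL_LENGTH_MM, IMAGE_WIDTH_PX)
    print(f"{altitude:>2} m altitude: {gsd:.2f} cm per pixel")
```

Because GSD grows linearly with altitude, halving the flight height doubles the number of pixels across each berry, which is why detailed annotation pushes flights closer to the canopy.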
Another significant challenge was the turbulence generated by drone propellers. Even relatively small drones like the DJI Mini 2 or ING Scout v2 created enough downward airflow to move leaves, berries, or mushrooms, distorting their appearance or temporarily hiding them from view. As a result, some objects were missing from the captured frames (false negatives), which made reliable annotation more difficult.
Moreover, motion blur became a persistent problem. To map large areas efficiently, drones needed to move quickly—but this often introduced blurred frames, especially in low-light conditions under dense canopy cover. Slower speeds or hovering modes were adopted in critical areas, but this reduced operational efficiency and increased battery consumption.
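The trade-off between speed and sharpness can be estimated with a simple relation: the blur in pixels is roughly the distance the drone travels during one exposure divided by the ground footprint of a pixel. The sketch below assumes a downward-facing camera and straight-line flight; the exposure times and the 2.5 mm pixel footprint are illustrative values, not measurements from the field campaigns.

```python
def motion_blur_px(speed_m_s: float, exposure_s: float, gsd_m: float) -> float:
    """Approximate number of pixels the scene shifts during one exposure."""
    return speed_m_s * exposure_s / gsd_m


def max_speed_m_s(max_blur_px: float, exposure_s: float, gsd_m: float) -> float:
    """Fastest ground speed that keeps blur within max_blur_px."""
    return max_blur_px * gsd_m / exposure_s


GSD_M = 0.0025  # assumed ground footprint of one pixel: 2.5 mm

# Longer exposures (typical in low light) tighten the speed limit.
for label, exposure_s in (("bright, 1/1000 s", 1 / 1000), ("low light, 1/100 s", 1 / 100)):
    limit = max_speed_m_s(1.0, exposure_s, GSD_M)
    print(f"{label}: stay below {limit:.2f} m/s for at most 1 px of blur")
```

Under these assumptions, a tenfold increase in exposure time cuts the usable flight speed by a factor of ten, which is consistent with slowing down or hovering under dense canopy.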
These environmental challenges required a shift in our operational strategy:
Low-altitude, semi-manual flight paths were defined to avoid obstacles.
Static hovering and staged capture were used in particularly dense environments.
These adaptations were crucial to building a usable dataset in an environment where traditional data acquisition workflows would have failed.
Fieldwork in FEROX revealed that collecting quality data in forests requires more than advanced drones—it demands adaptability.
Flying low is risky. To get detailed images, drones had to fly close to trees and undergrowth, increasing collision risks and forcing semi-manual piloting in dense areas.
Propeller turbulence matters. Downdrafts often disturbed leaves and berries, distorting the scene and complicating annotations.
Speed sacrifices clarity. Fast flights caused motion blur, especially under canopies. Slower, staged captures improved image quality but reduced efficiency.
Multiple perspectives help. Supplementing drone imagery with GoPro, smartphone, and picker-level shots added depth and resilience to the dataset.
Rural dataset collection is not a solved problem, but projects like FEROX demonstrate what’s possible when technical rigor meets field insight.
At Gemmo AI, we believe that AI must serve not only high-tech industries but also remote communities and traditional sectors. That starts with meeting data where it lives, even if it’s under a forest canopy.