Machine learning training data

Large and specific groups of consumers for Voice and Image

AI for your organization?

Are you already exploiting the possibilities that artificial intelligence offers your organization? To set up your machine learning system properly, you need training data. By practicing with training data, the system can begin to recognize patterns and learn. But how do you get those data sets?

Our clients struggle with the following issues

  • General data sets are not specific enough for your application. You may have 500 hours of speech, but not on the topic you need.

  • You have the right customer calls from your call center, but they contain privacy-sensitive elements that make them impossible to use as training data.

  • Collecting real-life training data is expensive. For example, installers would have to take photos of 1000 meter boxes.

  • You have training data in English and German, but not in other important languages.


138 participants instruct their car's on-board computer

138 people from 5 dialect areas give instructions to their car's on-board computer in a sound studio.

1500 hours of call center conversations

1500 hours of simulated call center conversations. More than 1000 participants completed 20 scenarios.

750 Apple users record sentences

750 Apple users record 200 sentences each with an app on their phone.

What is CG Research’s role in the project?

Our panel consists of 25,000 Dutch people who are happy to participate in market research. They are also willing to do other types of “assignments” for a small fee. In 2019, these panel members participated in several large projects to collect training data for machine learning.

Our customers often use the Netherlands as a pilot country and then roll out the data collection to other countries. CG plays a coordinating role and shares its best practices with our international partners. We have already collected training data in Brazil, Mexico, Spain and the UK. We can do the same in other European countries with our partners for qualitative research, and we are also active in China and India.

We collect training data to optimize:

  • Voice technology, where users interact with your device or software using their voice.

  • Image recognition, which uses artificial intelligence to automatically identify objects, people, places and actions in images.

  • Sentiment analysis, the automated process of understanding an opinion on a certain topic from written or spoken language.

Data collection in the following countries


  • Belgium
  • Denmark
  • Germany
  • Estonia
  • Finland
  • France
  • Italy
  • Latvia
  • Lithuania
  • Netherlands
  • Norway
  • Austria
  • Poland
  • Portugal

  • Spain
  • United Kingdom
  • Sweden
  • Switzerland


  • Australia
  • Brazil
  • Canada

  • China
  • Indonesia
  • Japan

  • Malaysia
  • Mexico
  • Russia
  • Singapore
  • Thailand
  • United States

Let’s get in touch

CG Research is the ideal partner. You decide which parts of the project to outsource and which to do yourself.

Let us call you
Receive a quote in your inbox
Merik te Grotenhuis
Managing Director