Prepare training data

NLP training data consists of a large number of sample sentences that represent customer statements collected from many different sources. Before building a training dataset, determine the goals and the problems the business needs to solve, so that the data you collect fits the business's model.

There are many different options for building a training dataset:

  • Actual business data: Sample data that is already available in the business's information systems or collected from the channels through which customers interact with the business (for example, hybrid chat, chat segments, email, social networks, forums). This data is realistic and accurately reflects customers' wants and needs.

  • Industry experts: To keep the bot practical and applicable, creating and training it requires the participation of people who have professional expertise or work experience in the relevant industry or field.

  • Pre-built dataset: Datasets built by EM&AI experts across many different fields to help customers speed up the training process and reduce data preparation time.

Concept

Some notes when preparing training data

Sample sentence

  • Each sample sentence represents a specific customer intent.

  • Do not use sample sentences that have unclear meanings or are nearly identical to one another, because they interfere with the recognition system.

  • Sample sentences need to be preprocessed before they are fed into the system for NLP training (for example, removing special characters, emoji, and foreign words).

  • At least 10 sample sentences are needed to train the VA to recognize an intent. Keep the number of sample sentences roughly the same across intents to reduce noise in the training data.
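The preprocessing note above can be sketched in Python. This is a minimal illustration, not the platform's own pipeline: the function names are hypothetical, the set of punctuation that is kept is an assumption, and removing foreign words is not covered. The duplicate check only catches sentences that differ in case, punctuation, or spacing, so sentences that are nearly identical in wording still need a manual review.

```python
import re
import unicodedata


def clean_sentence(text: str) -> str:
    """Remove emoji and special characters, then collapse whitespace."""
    # NFC keeps accented letters as single code points so they are not
    # mistaken for separate combining marks and dropped.
    text = unicodedata.normalize("NFC", text)
    kept = []
    for ch in text:
        major = unicodedata.category(ch)[0]
        # Keep letters (L*), digits (N*), whitespace, and a small set of
        # punctuation (assumed here to be worth keeping).
        if major in ("L", "N") or ch.isspace() or ch in "?'-":
            kept.append(ch)
        else:
            kept.append(" ")
    return re.sub(r"\s+", " ", "".join(kept)).strip()


def dedupe_key(sentence: str) -> str:
    """Key used for comparison: ignores case, punctuation, and spacing."""
    no_punct = re.sub(r"[^\w\s]", "", sentence.casefold())
    return " ".join(no_punct.split())


def prepare_samples(raw_sentences):
    """Clean each sentence and drop empty results and duplicates."""
    seen = set()
    cleaned = []
    for raw in raw_sentences:
        sentence = clean_sentence(raw)
        key = dedupe_key(sentence)
        if not key or key in seen:
            continue
        seen.add(key)
        cleaned.append(sentence)
    return cleaned


raw = [
    "Does the company work on Saturdays? 😊!!!",
    "does the company work on saturdays",
    "  Is it open on Saturdays?  ",
    "###",
]
print(prepare_samples(raw))
# ['Does the company work on Saturdays?', 'Is it open on Saturdays?']
```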

Intent

  • Name the intent with the most accurate meaning for the context of the sample sentence. The name is usually a combination of a verb and a noun.

    For example:

    • Sample sentence: “Does the company work on Saturdays?”

    • Intent: Ask for business hours

  • An intent can be expressed in many different ways, so prepare many different sample sentences for each intent.

    For example:

    • Intent: Ask for business hours

    • Sample sentences:

      • Does the company work on Saturdays?

      • Is the company working this Saturday?

      • Is it open on Saturdays?

      • Is the office open this Saturday?

      • Is the company open this Saturday?
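One way to keep track of intents and their sample sentences is a simple mapping from intent name to a list of sentences. The sketch below assumes such a mapping and checks it against the notes under "Sample sentence": the minimum of 10 sentences per intent and a roughly even number of sentences across intents. The identifier `ask_business_hours`, the function names, and the idea of a max/min ratio as a balance measure are all assumptions made for illustration.

```python
MIN_SAMPLES_PER_INTENT = 10  # minimum stated in the notes on sample sentences

# Hypothetical in-memory layout: intent name -> sample sentences.
training_data = {
    "ask_business_hours": [
        "Does the company work on Saturdays?",
        "Is the company working this Saturday?",
        "Is it open on Saturdays?",
        "Is the office open this Saturday?",
    ],
}


def sample_counts(data):
    """Number of sample sentences per intent."""
    return {intent: len(sentences) for intent, sentences in data.items()}


def intents_below_minimum(data, minimum=MIN_SAMPLES_PER_INTENT):
    """Intents that still need more sample sentences."""
    return sorted(i for i, s in data.items() if len(s) < minimum)


def imbalance_ratio(data):
    """Largest intent size divided by the smallest (1.0 = perfectly even).

    Intents with no sentences are ignored here; they are already reported
    by intents_below_minimum.
    """
    counts = [len(s) for s in data.values() if s]
    if not counts:
        return 0.0
    return max(counts) / min(counts)


print(sample_counts(training_data))          # {'ask_business_hours': 4}
print(intents_below_minimum(training_data))  # ['ask_business_hours']
```

A ratio well above 1.0 suggests that some intents have far more sample sentences than others; what threshold counts as "too much" is a judgment call that this sketch does not set.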

Entity

  • An entity is a noun in the sentence: the object or the context of the action that the intent expresses.

    For example: date, time, location, brand name, personal name, city.

    1. Sample sentence: “Is the company working this Saturday?”

      • Entity: Saturday

      • Entity type: date

    2. Sample sentence: “Is the Dien Bien Phu branch open on Saturday?”

      • Entity: Dien Bien Phu branch

      • Entity type: branch name
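An entity annotation is often stored together with its position in the sentence, so that the labeled text can be located again later. The following is a minimal sketch under that assumption; the `EntityAnnotation` class and the `annotate` helper are hypothetical and do not describe the platform's storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityAnnotation:
    value: str        # the text of the entity, e.g. "Saturday"
    entity_type: str  # the entity type, e.g. "date"
    start: int        # index of the first character in the sentence
    end: int          # index just past the last character


def annotate(sentence: str, value: str, entity_type: str) -> EntityAnnotation:
    """Locate the first occurrence of `value` and record its span."""
    start = sentence.find(value)
    if start == -1:
        raise ValueError(f"{value!r} does not occur in the sentence")
    return EntityAnnotation(value, entity_type, start, start + len(value))


sentence = "Is the company working this Saturday?"
annotation = annotate(sentence, "Saturday", "date")

# The stored span points back at the entity text.
print(sentence[annotation.start:annotation.end])  # Saturday
```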
