A. Singh et al., Building an Enterprise Chatbot, https://doi.org/10.1007/978-1-4842-5034-1_2

2. Identifying the Sources of Data

Abhishek Singh¹, Karthik Ramasubramanian¹ and Shrey Shivam²

(1) New Delhi, Delhi, India

(2) Donegal, Donegal, Ireland

Chatbots are one more channel for providing conversational flows to customers. In the previous chapter, we discussed how the banking and insurance industries are structured and what kinds of interactions happen with the customers in those industries. A bank or insurer provides many types of touchpoints to customers in its day-to-day operations, from selling a new policy to settling escalations of claims. All these touchpoints are sources of data for building an AI assistant, i.e., a chatbot. In this chapter, we will start by introducing chatbot types and sources of data for training chatbots, and then we will introduce the General Data Protection Regulation (GDPR) in the context of chatbots that handle personal data.

Chatbot Conversations

The chatbot tries to mimic the conversation of a real human. In the context of interacting with a business, the conversations can be of a broad, generic subject or particular to the product or service. Based on the scope of conversation, we can divide the conversation into two types: general conversations and specific conversations. The type of conversation decides the scope of questions and knowledge the chatbot or human assistant needs in order to interact with a customer.

General Conversations

A general conversation is a typical conversation that happens when a customer and assistant are not confined to a specific topic or concern. The conversation can start from any point and can traverse in any direction based on the knowledge level of the assistant.

An example of such a conversation is

A user walks into the bank and wants to talk to the manager. Before the start of the conversation, we don’t know who this person is and why he is visiting the manager. The conversation can be about a sponsorship event, a loan account or utility payment, or something else.

To deal with such conversations, the chatbot needs to be built with many types of contexts and appropriate replies. The replies for such general conversations are also not heuristic in nature; they involve human natural intelligence and information/experience that is not available for the chatbot to build a conversation.

Many advances are happening in AI; we try to mimic complete human behavior by training with massive datasets and scenarios. However, we are still far away in terms of having complete general-conversation-based chatbots for industrial use cases.

Specific Conversations

Specific conversations are limited to some predesigned outcomes. These types of conversations have a clearer scope and clear instructions to fall back on or cross-references to other sources. Any deviations from the set conversations are generally directed to predefined outcomes. All other cases are redirected to appropriate channels, or the conversation ends.

An example of such a conversation is

A customer walks into a store and goes to the refund desk. In this case, the refund desk has some specific conditions to process a refund and maybe some other particular functions. The customer cannot expect any other query than a refund to be offered at the refund desk. If he asks a question regarding discounts, he is directed by the refund desk to another counter.

Specific conversations are more predictable and can be handled with higher accuracy. Chatbots designed for specific tasks can respond with precise, relevant information. The conversations are outcome-oriented and end once the outcome is achieved.
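The refund-desk behavior described above can be sketched as a small keyword router. This is only an illustration: the keywords, replies, and the `reply` function are assumptions made for this example, not part of any particular chatbot framework.

```python
import re

# Hypothetical scope for a refund-desk chatbot; keywords and replies are illustrative.
IN_SCOPE = {
    "refund": "Please share your order number to start the refund.",
}
REDIRECTS = {
    "discount": "The refund desk cannot help with discounts; please ask at the offers counter.",
}
FALLBACK = "Sorry, I can only help with refunds. Ending this conversation."


def reply(message: str) -> str:
    """Answer in-scope queries, redirect known out-of-scope ones, otherwise end."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    for table in (IN_SCOPE, REDIRECTS):
        for keyword, answer in table.items():
            if keyword in words:
                return answer
    return FALLBACK


print(reply("I would like a refund for my order"))
print(reply("Is there any discount on this?"))
```

A production chatbot would use a trained intent classifier instead of keyword matching, but the three outcomes (handle, redirect, end) stay the same.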

Training Chatbots for Conversations

Chatbots need to be taught how to have a conversation. The training of chatbots involves exposing chatbots to both rules and natural conversations. For general conversation chatbots, the amount of training data required is enormous, and so far we have not succeeded in creating an accurate general conversation chatbot. Alexa, Siri, and Google Home are a few examples in this direction.

Creating chatbots also requires a set of rules documented or tacit to proceed with conversations. For example, if a chatbot asks for a customer’s name, it must expect a first name and a last name. If the last name is not captured, it must go back and confirm the name. This is important to make sure the conversation is specific to the correct customer.
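The name rule above can be written down as a small check. This is a minimal sketch that assumes names are separated by whitespace; the function name `capture_name` and the prompt text are hypothetical.

```python
def capture_name(answer: str) -> dict:
    """Apply the rule: expect a first and a last name, otherwise ask again."""
    parts = answer.strip().split()
    if len(parts) >= 2:
        # Assumption: the first token is the first name, the last token the last name.
        return {
            "first_name": parts[0],
            "last_name": parts[-1],
            "confirmed": True,
            "prompt": None,
        }
    return {
        "first_name": parts[0] if parts else None,
        "last_name": None,
        "confirmed": False,
        "prompt": "Could you also share your last name?",
    }


print(capture_name("Asha Verma"))
print(capture_name("Asha"))
```

When `confirmed` is false, the conversation would loop back with the returned prompt before moving on to any customer-specific step.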

To train a chatbot for conversations, we need to have a corpus of training data. The training data can be accessed from multiple sources based on the use case. In the following sections, we discuss some datasets for use in training chatbots.

Self-Generated Data

Chatbot developers need to start with some data to make the chatbot come alive. Usually, the developers generate that data themselves for the necessary flows. This gives them a high-level flow of their own definition, so that they can keep developing the chatbot on working assumptions.

In many cases, developers create multiple inputs and annotate them themselves to train basic flows. Because these inputs are generated by developers to test the flows, they are not a complete training set. They help developers get the chatbot ready for a beta release and collect data from real users. Self-generated data is only a way to start development; it’s not for general public use.

The data generated by developers is used to establish the data pipelines and system integration testing. Once the beta is deployed, the internal users can be exposed to the chatbot and more data is collected to keep training the natural language module.

Note

Do not confuse self-generated data with natural language generative (NLG) models. You will learn more about natural language generation from the small dataset in Chapter 5.

Customer Interactions

Customer interactions are the best source for training chatbots. These conversations are the best to mimic for two main reasons:

Typical queries can be captured, and chatbot training can be prioritized for specific conversations.
The conversations capture real solutions provided in the past by experienced customer representatives.

Customer interactions happen through multiple channels, and these channels produce data for training chatbots as a new channel for customer interactions. Figure 2-1 shows the six main types of customer interaction channels for any modern business, applicable to our case of an insurer and a bank as well.

Figure 2-1. Customer interaction/service channels

Phone

Phone calls are attended by experienced call center representatives and are mostly used when the customer requires an immediate resolution to their queries. Today, this mode is recommended as the last step since it is costly for companies to maintain.

From the call center, we can get call transcripts, call recordings, core issues, and their resolutions. Core issues identified during calls and their resolution can help our chatbot learn to identify issues and provide solutions.

Emails

Email conversations are usually detailed and have a chronology of events explained and a clear statement of what the customer wants. These emails can be a good source to capture issues that need more than one-dimensional data to solve them.

Customer email records can be accessed in plain text format, with the original emails and the trail of responses that developed into conversations.

Chat

Many financial institutions use online web chat with customer service representatives to make sure they can serve multiple clients at the same time and reduce dropout of incoming queries at the call center.

This data set is very close to what a chatbot needs to mimic a conversation. Past chatlogs can be accessed as plain text files.
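As a rough illustration of why chat logs are so close to what a chatbot needs, the sketch below turns a plain-text log into (customer message, agent reply) pairs. The `Speaker: text` line format and the speaker labels are assumptions; real exports vary by chat platform.

```python
from typing import List, Tuple


def to_training_pairs(log_text: str) -> List[Tuple[str, str]]:
    """Pair each customer line with the agent line that directly follows it."""
    turns = []
    for line in log_text.splitlines():
        speaker, sep, text = line.partition(":")
        if sep and text.strip():
            turns.append((speaker.strip().lower(), text.strip()))
    pairs = []
    for (who, said), (next_who, next_said) in zip(turns, turns[1:]):
        if who == "customer" and next_who == "agent":
            pairs.append((said, next_said))
    return pairs


# Hypothetical log in the assumed format.
sample_log = """Customer: How do I reset my PIN?
Agent: You can reset it from the Cards menu in net banking.
Customer: Thanks
Agent: Happy to help."""

for question, answer in to_training_pairs(sample_log):
    print(question, "->", answer)
```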

Social Media

Social media became popular as a service channel when social media companies allowed business accounts to be created on their platforms. Interactions on social media tend to be generic, and it is difficult to tie them to an actual customer rather than a member of the general population.

Some platforms allow business accounts to download their data while some allow extracting data from API endpoints.

Customer Self-Service

Some necessary troubleshooting processes are created as self-service portals for customers. They may be as trivial as changing the PIN or offering FAQs for more information. Successful self-service cases are good for creating processes to train the chatbot to help people who ignore or can’t use self-service.

This data is usually structured as a tree of conversations leading to the solution of specific problems.
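A tree of this kind maps naturally onto nested dictionaries. The sketch below is illustrative only: the PIN questions, the answers, and the `walk` helper are invented for this example.

```python
# Hypothetical self-service tree: a node is either a question with options or a final answer.
PIN_TREE = {
    "question": "Do you remember your current PIN?",
    "options": {
        "yes": {"answer": "Use 'Change PIN' in the app and enter the current PIN."},
        "no": {
            "question": "Do you have your debit card with you?",
            "options": {
                "yes": {"answer": "Use 'Forgot PIN' at any ATM with your card."},
                "no": {"answer": "Please visit a branch with photo ID."},
            },
        },
    },
}


def walk(tree: dict, choices: list) -> str:
    """Follow the user's choices down the tree; return the answer or the next question."""
    node = tree
    for choice in choices:
        if "answer" in node:
            break  # already at a leaf; ignore extra choices
        node = node["options"].get(choice.lower())
        if node is None:
            return "Sorry, I did not understand that choice."
    return node.get("answer", node.get("question"))


print(walk(PIN_TREE, []))
print(walk(PIN_TREE, ["no", "yes"]))
```

Each root-to-leaf path is one successful self-service case, which is why this data is convenient for defining specific conversation flows.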

Mobile

Mobile here means the interactions that happen via mobile apps, along with customers’ mobile browsing history. Data from these mobile applications is captured as customer activity logs.

Customer Service Experts

Customer service experts play a significant role in identifying typical customer queries and showing how they are handled in real situations. Their inputs are also helpful in creating default replies and designing fallback options. Their years of experience dealing with customers can be used to train as well as test the initial chatbot release.

Experts need to be part of the process of developing chatbots for quality assessment of the chatbots’ experience and accuracy.

Open Source Data

Open source data is instrumental when you want to create general conversation chatbots and want to include some general flavor for specific talk. There are plenty of data sources available for training chatbots in natural language conversations.

A few of the open data sources are listed below; you can have more datasets as per your need.

Yahoo Language Data, created from Yahoo Answers ( www.cs.cmu.edu/~ark/QA-data/ )
WikiQA corpus, created from Bing queries that redirect to Wikipedia pages with a solution ( http://research.microsoft.com/apps/mobile/download.aspx?p=4495da01-db8c-4041-a7f6-7984a4f6a905 )
Ubuntu Dialogue corpus, created from Ubuntu technical support ( www.kaggle.com/rtatman/ubuntu-dialogue-corpus )
Twitter data on Kaggle, created from customer support conversations on Twitter ( www.kaggle.com/thoughtvector/customer-support-on-twitter )

Crowdsourcing

The most critical training data comes from real people interacting with your chatbot in real time. This not only helps in building the corpus for training but also helps developers see the dark zones where the chatbot fails.

In best practice cases, all chatbots released at beta version are exposed to real conversations with selected customers and internal employees. The data is collected, and NLP models are retrained for each real instance. Another outcome of crowdsourcing is laying down the guidelines and scope of the chatbot.

Customer service experts also use the crowdsourcing inputs to build response languages and intensity for different conversations.

If you are building a chatbot in a regional language, you need to rely on crowdsourcing of training data. Some companies can provide you access to people who will interact with your chatbot to build the training corpus.

Personal Data in Chatbots

When we try to emulate human-like conversations with chatbots, we allow humans to reveal information about themselves to the chatbot. This information is then at risk of unauthorized access, and holding it may violate privacy laws and terms. This concern is of the utmost importance when you deal with customer queries that connect to internal databases and require customer-specific information to process requests.

The customer can reveal the personal data both intentionally and unintentionally:

Intentionally: To get an account balance, you need to provide an account number and PIN.
Unintentionally: To know the claim process, you may end up revealing your policy number.

In both cases, the data is being captured by the chatbot, and the chatbot engine tries to process that data. Even if the chatbot can’t process the data, it still creates a copy of a conversation that contains private and personal data of customers.

Another area where we expose personal data to our chatbots is at the time of training the chatbot. Internal data of customers might have personal, financial, and demographic information without the developer’s full knowledge. For example, an email conversation regarding a claim settlement will contain a lot more details than just the customer-agnostic settlement process.
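One possible mitigation, shown here only as a sketch, is to mask obvious identifiers before such text is stored or used for training. The two patterns below are assumptions (email addresses, and digit runs of six or more standing in for account or policy numbers); they will miss many kinds of personal data, such as names, addresses, and short PINs.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# Assumption: account and policy numbers are runs of six or more digits.
LONG_NUMBER = re.compile(r"\b\d{6,}\b")


def redact(text: str) -> str:
    """Replace email addresses and long digit runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return LONG_NUMBER.sub("[NUMBER]", text)


print(redact("My policy 12345678 is linked to a.b@example.com"))
```

Pattern-based masking like this is a first filter, not a substitute for a proper review of what the training corpus contains.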

In deployment and training, personal data is captured and is exposed to legal infringement and hacking, but this data is important for developing customer-centric chatbots. If we do not capture the data, we will not be able to design a chatbot that can take actions and provide information from internal databases.

We require more information than a normal conversation to be able to develop a chatbot that can access customer data and provide real-time information, securely and privately. The personal information helps in developing

Authentication and access
Compliance to company policies
A customer information retrieval system
A third-party API retrieval system

There are other related services and databases that require personal information to allow access to customer information in the private data zone.

As we just explained, we need personal data and other private data from customers to make the 24x7 AI assistant function with relevant data. This requires us to be very sure of both the customer agreements and local/international data regulations. Complying with regulations becomes of the utmost importance for banks and insurance companies to build specific conversation chatbots.

This is a challenge for companies because it limits them from using well-developed, third-party chatbot services like Alexa, Dialogflow, and Watson. These services require the data to be sent to their servers and stored for chatbot conversations. These limitations have created a vacuum to be filled by frameworks that can develop state-of-the-art chatbots internal to the companies.

It is essential to be aware of what data privacy regulations require of companies when dealing with customer data. The General Data Protection Regulation (GDPR) is the leading regulation from the EU region, and it’s also relevant to other parts of the world. In the next section, we give a high-level overview of its requirements, which are essential to consider when developing a chatbot.

Introduction to the General Data Protection Regulation (GDPR)

The GDPR is the successor to the 1995 Data Protection Directive, which was a directive, not a regulation. While the directive was left to member states to transpose into national law through legislation, the GDPR, being a regulation, is directly enforceable as law in all member states simultaneously. It is a regulation on data protection for European Union citizens. It also applies to the transfer of personal data outside the EU. The GDPR gives users control over their personal information and over whether they share their data or keep it private.

The regulation was adopted in 2016 and became applicable in all EU member states on May 25, 2018. It provides for hefty fines against non-compliant organizations (up to 4% of annual worldwide turnover or €20 million, whichever is greater).
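The upper limit on fines described above is simple arithmetic: take the larger of 4% of annual revenue and €20 million. The function name and the example revenue figures below are illustrative, and the result is a ceiling, not the fine a regulator would actually impose.

```python
def max_gdpr_fine(annual_revenue_eur: float) -> float:
    """Return the larger of 4% of annual revenue and EUR 20 million."""
    four_percent = annual_revenue_eur * 4 / 100
    return max(four_percent, 20_000_000)


# Hypothetical companies: EUR 100 million and EUR 1 billion in annual revenue.
print(max_gdpr_fine(100_000_000))
print(max_gdpr_fine(1_000_000_000))
```

For the smaller company, 4% of revenue is below the fixed amount, so the €20 million figure applies; for the larger one, the 4% figure is the greater of the two.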

Data Protected Under the GDPR

The GDPR in its definition of data is very broad and covers a multiverse of data generated and captured by companies. As per the GDPR, the protected data includes

Basic identity information (name and surname, date of birth, phone number, home address, email address, ID card number, Social Security number, etc.); web data (location, IP address, cookie data); health and genetic data; biometric data (data that identifies a person); racial and ethnic origin; religious beliefs; political opinions.

This includes data that chatbots deal with in the course of conversations.

Data Protection Stakeholders

As per the regulation, any company that collects and processes EU citizens’ personal information or that stores personal data of EU residents must comply with the GDPR, regardless of whether the company is present in EU territory or not. This scope means that most global businesses need to be GDPR-compliant.

The regulation defines three stakeholders to the GDPR:

Data subject: A person whose data is being processed by a controller or processor.
Data controller: An individual or company that determines the purpose and conditions of collecting and processing personal data from users.
Data processor: An individual or company that processes personal data for data controllers.

The definition of stakeholders directly impacts how we design our chatbots and ensure the rights of the customers who interact with them. For example, a customer interacting with a chatbot is a data subject, and the bank, insurer, or other company operating the chatbot becomes the data controller. The authorized personnel of the CRM or database system also act as the data controller. If your chatbot uses Dialogflow to process the data, then the Dialogflow service becomes the data processor.

The details of the law can be read from the source here: https://eur-lex.europa.eu/eli/reg/2016/679/oj .

Customer Rights Under the GDPR

It is essential for the chatbot developer team and leadership to understand what rights are enshrined in the GDPR for the customers. The chatbot functionality must abide by them.

The rights under GDPR are stated below for your reference:

#	Right	Data Controller Responsibilities
1	Right to be Informed	Be transparent in how much you collect and process personal information and the purpose you intend to use it for. Inform your customer of their rights and how to carry them out.
2	Right of Access	Your customers have the right to access their data. You need to enable this either through the business process or technical process.
3	Right to Rectification	Your customer has the right to correct information that they believe is inaccurate.
4	Right to Erasure	You must provide your customer with the right to be forgotten, provided that your legitimate interest to hold such information does not override theirs.
5	Right to Restriction of Processing	Your customer has the right to request that you stop processing their data.
6	Right to Data Portability	You need to enable the machine and human readable export of your customers’ personal information.
7	Right to Object	Your customer has the right to object to you using their data.
8	Right Regarding Automated Decision Making	Your customer has the right not to be subject to a decision based solely on automated processing, including profiling.

Chatbot Compliance to GDPR

In the above sections, we discussed that chatbots are no longer just a channel for business communication; chatbot makers must also treat them as systems that control and process data. This brings chatbots under the strict scrutiny of the GDPR.

Some of the generic and minimum steps that chatbot makers need to take to be ready for GDPR compliance are listed below. The list is not comprehensive; it is just an indicative list for internal assessment. Please consider a full audit of a chatbot before making it public for general use.

Before starting a conversation, the chatbot must clearly state what data will be collected, and the user must be able to see what data is being collected.
The chatbot user must be allowed to access, review, download, and erase the data collected by the chatbot.
The chatbot logs must be securely stored and made accessible to users. Also, you must have the explicit permission of the user before processing the log to train your chatbots.
The chatbot must provide a clearly stated privacy policy and contact information for a Data Officer for any concerns.
The chatbot must offer the option of talking to a real operator rather than a machine chatbot.

These items are an indicative list of the steps that the chatbot owner needs to take. A full audit may reflect more areas to make sure the chatbot is fully compliant.
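Several of the steps listed above (an upfront notice, user access and erasure, and explicit permission before logs are used for training) can be sketched in code. The `ConversationStore` class below is a hypothetical in-memory stand-in written for this illustration; it is not a secure store and does not by itself make a chatbot compliant.

```python
import json


class ConversationStore:
    """In-memory stand-in for a secure log store (hypothetical, illustration only)."""

    NOTICE = (
        "This chat records your messages to answer your request. "
        "Type 'agent' at any time to reach a human operator."
    )

    def __init__(self):
        self._logs = {}             # user_id -> list of messages
        self._training_ok = set()   # users who explicitly allowed training use

    def start(self, user_id: str) -> str:
        """Open a conversation and return the notice shown before any data is collected."""
        self._logs.setdefault(user_id, [])
        return self.NOTICE

    def record(self, user_id: str, message: str) -> None:
        self._logs.setdefault(user_id, []).append(message)

    def allow_training(self, user_id: str) -> None:
        """Record the user's explicit permission to use their logs for training."""
        self._training_ok.add(user_id)

    def export(self, user_id: str) -> str:
        """Return the user's data in a machine-readable form (access and download)."""
        return json.dumps(self._logs.get(user_id, []))

    def erase(self, user_id: str) -> None:
        """Delete the user's logs and their training permission."""
        self._logs.pop(user_id, None)
        self._training_ok.discard(user_id)

    def training_corpus(self) -> list:
        """Only messages from users who gave explicit permission."""
        return [m for u in sorted(self._training_ok) for m in self._logs.get(u, [])]


store = ConversationStore()
print(store.start("u1"))
store.record("u1", "What is my claim status?")
print(store.training_corpus())  # empty until the user gives permission
```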

Summary

In this chapter, we classified conversations into general and specific types. While developing a chatbot for general conversations requires a vast variety of data, a specific conversation chatbot requires only the corpus of data needed to have those conversations. We introduced different data sources: data generated from the developers’ understanding of the functionality, data generated from customer interactions across all channels, and open data, whose significance we also discussed. Crowdsourcing of data for generic chatbots was also discussed. The significance and challenges of personal data were discussed with examples, and their impact on the design of chatbots was also explained. The most crucial part of a chatbot’s implementation is the impact of regulations when chatbots deal with personal data. We introduced the General Data Protection Regulation (GDPR), which protects the data of EU citizens, not only within the EU but outside as well. A short checklist of customers’ rights was provided along with some standard steps to be taken for chatbots to be GDPR compliant. In the next chapter, we will discuss how to design the chatbot and create conversation flows for a 24x7 insurance assistant.
