Smarter Conversations. Part 2 - Open Dialogs

This post continues the smarter conversations series and today I would like to explore ways of keeping your dialogs open. Previously, in part 1, I showed how to add sentiment detection to your bot.

Waterfall

Prior to 3.5.3, the dialog routing system in the Bot Framework was not very flexible.

Imagine the following dialog:

User >> I’m looking for screws used for printer assembly
Bot >> Sure, I’m happy to help you. 
Bot >> Is the base material metal or plastic?
User >>I don't know. Does it matter?

The question that the bot asks about the material will likely be handled as a waterfall:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
const builder = require('botbuilder');

bot.dialog('productLookup', [
function (session, args, next) {
// ...
builder.Prompts.choice(session, 'Is the base material metal or plastic?',
['metal', 'plastic'],
{ listStyle: builder.ListStyle.button });
},
function (session, args, next) {
const material = args.response.entity;
// ...
}
]);

The user’s response is neither metal nor plastic and the bot would simply reprompt:

Reprompt

The builder.Prompts.choice opens up a new dialog that gets pushed onto the stack and that’s what receives the next message. We will take a closer look in a minute.

Trigger Actions

The routing system was reworked in 3.5.3 and it came with a few important enhancements.

First, you no longer need the IntentDialog to recognize your users’ intents. The UniversalBot now inherits from Library and has its own set of global recognizers:

1
2
3
4
5
6
7
8
9
10
const bot = new builder.UniversalBot(connector);

// custom recognizers
const smiles = require('./app/recognizer/smiles');
const sentiment = require('./app/recognizer/sentiment');

// set up global recognizers
bot.recognizer(smiles);
bot.recognizer(sentiment);
bot.recognizer(new builder.LuisRecognizer(process.env.LUIS_ENDPOINT));

Second, the dialogs can now define trigger actions and be picked up even while another dialog’s prompt is being processed.

1
2
3
4
5
6
7
bot.dialog('affirmation', [
function (session, args, next) {
// ...
}
]).triggerAction({ // <-- this right here
matches: 'Affirmation'
});

If our bot had an intent recognizer that could understand that the user asked a question instead of answering the metal vs. plastic question, and if we had a way to handle it, we could break out of the waterfall using the triggerAction technique. In part 3 I will show you how a simple history engine can help you attach a sentiment like that to what was happening previously in the conversation and how your bot can intelligently handle such a diversion.

Routing and Callstack

Bot Framework maintains a callstack of the triggered dialog actions. When user utterance triggered the productLookup dialog, the stack only had one item coming into the first function of the waterfall:

1
*:productLookup

The builder.Prompts.choice adds another one:

1
*:productLookup
BotBuilder:Prompts <--

In the routing system that came before 3.5.3, the next message would land onto BotBuilder:Prompts, would upset the choice validation logic, and would trigger a reprompt. The newer version does a much better job.

First, the UniversalBot runs the incoming message through the set of global recognizers. Then the default routing mechanism runs three parallel searches - global actions, stack actions, and active dialogs. In doing so it collects all matching route results and scores them. The best route will then be selected and executed.

Handling Interruptions

The default behavior of launching a new dialog via its triggerAction is to clean up the callstack and start fresh. You can do two things to handle the interruption.

First, you can override the default behavior with onSelectAction. Instead of resetting the callstack you can add the newly triggered dialog on top of it. This would return the conversation back to where it was interrupted at when the newly triggered dialog finishes:

1
2
3
4
5
6
7
8
9
10
11
12
bot.dialog('affirmation', [
function (session, args, next) {
// ...
}
]).triggerAction({
matches: 'Affirmation',

// <-- overwrite how the dialog is launched
onSelectAction: function (session, args, next) {
session.beginDialog(args.action, args);
}
});

And you can also attach the onInterrupted handler to the dialog that could be interrupted and message the user about what’s happening.

Open All The Way

And if that was not flexible enough, you can define your own dialog’s behaviors by overwriting begin, replyReceived, and even recognize on your dialogs:

1
2
3
4
5
6
7
8
bot.dialog('custom', Object.assign(new builder.Dialog(), {
begin: (session) => {
session.send('I am built custom');
},
replyReceived: (session) => {
session.endDialog();
}
}));

I will sure come back to this technique when I show you how to drive your dialogs from metadata and not code. Comes very handy when building product recommendation bots. Stay tuned!

Smarter Conversations. Part 1 - Sentiment

This post starts a series of short articles on building smarter conversations with Microsoft Bot Framework. I will explore detecting sentiment (part 1), keeping the dialog open-ended (part 2), using a simple history engine to help the bot be context-aware (part 3), and recording a full transcript of a conversation to intelligently hand it off to a human operator.

Affirmation

Imagine the following dialog:

User >> I’m looking for screws used for printer assembly
Bot >> Sure, I’m happy to help you. 
Bot >> Is the base material metal or plastic?
User >> metal
Bot >> [lists a few recommendations]
Bot >> [mentions screws that can form their own threads]
User >> Great! I think that's what I need
Bot >> [recommends more information and an installation video]

It’s not hard to train an NLU service like LUIS to see a product lookup intent in the first sentence. A screw would be an extracted entity. Following a database lookup, the bot then clarifies an important attribute to narrow the search down to either plastic or metal screws and presents the results.

The highlighted sentence that follows is a positive affirmation. It is not an intent that needs to be fulfilled, not an answer to the question asked by the bot. And yet it presents an opportunity for a smarter bot to be more helpful, act as an advisor.

Sentiment

Text Analytics API is part of the Microsoft’s Cognitive Services offering. The /text/analytics/v2.0/sentiment endpoint makes it a single HTTP request to score a text fragment or a sentence on a scale from 0 (negative) to 1 (positive).

I decided to make the expressed sentiment look like an intent for my bot and so I built a custom recognizer:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
const request = require('request-promise-native');

const url = process.env.SENTIMENT_ENDPOINT;
const apiKey = process.env.SENTIMENT_API_KEY;

module.exports = {
recognize: function (context, callback) {
request({
method: 'POST',
url: `${url}`,
headers: {
'Content-Type': 'application/json',
'Ocp-Apim-Subscription-Key': `${apiKey}`
},
body: {
"documents": [
{
"language": "en",
"id": "-",
"text": context.message.text
}
]
},
json: true
}).then((result) => {
if (result && result.documents) {
const positive = result.documents[0].score >= 0.5;

callback(null, {
intent: positive ? 'Affirmation' : 'Discouragement',
score: 0.11 // <-- just above the threshold
});
} else {
callback();
}
}).catch((reason) => {
console.log('Error detecting sentiment: %s', reason);
callback();
});
}
};

Context

Now I can attach a dialog that would be triggered when the bot detects an affirmation and no other intent scores higher. The default intent threshold is 0.1 and that’s why a detected sentiment is given 0.11.

1
2
3
4
5
6
7
8
9
10
11
const sentiment = require('./app/recognizer/sentiment');

bot.recognizer(sentiment);

bot.dialog('affirmation', [
function (session, args, next) {
// ...
}
]).triggerAction({
matches: 'Affirmation'
});

Unlike other intents, however, a detected sentiment is not enough to properly react to on its own.

The bot needs to understand the context to properly react to an affirmation or discouragement expressed by a user. The bot also needs to be able to handle an interrupted dialog if an affirmation (or an expression of frustration) came in the middle of a waterfall, for example.

I will get to it in part 2. Stay tuned.

Be a Human

I am not a Muslim. I am not a national of the seven countries banned from entering the United States. But I know the feeling of suddenly not being able to get back home.

It was 2009. By that time, I had lived in the US for almost two years on my L1 work visa. And so did my wife and my at-the-time six years old son. My daughter was born a US citizen just five months ago. A year before that we were among a few lucky winners of the diversity lottery and decided to do the consular processing. Instead of filing for adjustment of status while in the states, you basically go back to your home country and visit the embassy to get the immigration visa. You then reenter the US in the new status. The closest embassy that handled immigration cases for Belarus nationals was in Warsaw, Poland. We figured we would first go back to Belarus and do all the required paperwork there, then stop for a few days in Warsaw to get the new visas, and then fly back to the states right from there.

It was March. I took a week off from work, my son took a week off from school, and off we went. Happy to see our parents and our families. Happy to be on the road together. Looking forward to a new chapter in our lives.

At the embassy, we handed our documents to the clerk collecting everybody’s paperwork before the appointment with the consul. The lady carefully looked over everything and said we had two problems.

First, there was a small problem. My wife was married before and so she had two previous last names. We only had proof of no criminal records in the home country in her last two names, not her maiden name.

Then, there was a real problem. Since we were both in IT, the lady said, we would need to wait for a special processing that could take about two months.

It took a moment to sink in. If we didn’t get our immigration visas that day, we couldn’t continue our journey to the states. Our L1/L2 were only one year long and had long expired. It was perfectly legal to remain in the states for as long as my L1 petition was valid, but once we crossed the border, we all needed a valid visa to reenter. I asked if I could get my L1 visa renewed instead, but was advised against it. Not to hurt my immigration case, I was told

I remember how I felt. Helpless. Empty. Like my life froze. My home was across the ocean. A little townhouse we were renting. Our cars. My job. My son’s first grade. One flight away and yet completely unreachable.

In all fairness, we wouldn’t need to go back to a war- or terror-torn country. I could even continue to work remotely. Our parents were alive and well and would be happy to accommodate us while we would look for our own temporary place. We traveled together as a family with our five months old daughter so we wouldn’t be separated either. The worst thing that could happen was my son’s school but even that we would have probably figured out. And yet I felt empty, helpless, upset, and very, very, very sad.

We waited for more than two hours before it was our turn to talk to the consul. I could not predict what would happen next.

The consul started the interview. I remember our small talk. She was smiling and was very polite and so were we. She said she was happy to see green card applicants who had their lives together and knew what and why they were doing. We smiled back and said “well, yes, US is our home now. We live our lives there.”

“You guys have two problems”, she said. We nodded.

“How old were you when you married for the first time?”, the consul asked my wife. “21”, Maryna replied.

“Alright”, she said. “You probably were too young to have any encounters with the law at the time, right?”. We smiled and confirmed the consul’s very valid assumption. “Not a problem then”, the consul smiled.

“The next problem, however, is more serious”, she said. “Yes, we know, we were told”, we replied.

At this time, I thought I knew what would follow. A very polite statement that she was very sorry but that we would need to wait for about two months to get the required clearance.

“You guys don’t build software for nuclear plants, do you? Don’t work on military systems?”. I don’t think I knew where she was going with this. “No, of course not. We build web apps. You know, hotel room bookings, health insurance quotes, stuff like that”.

“Alright”, the consul smiled. “I can’t give you your green card today, though”.

I probably said “I know”… “But I will be happy to give it to you tomorrow. Come back in the afternoon. Congratulations!”.

And just like that we were cleared. By a human who had the authority and was not afraid to use it. There are rules, and regulations, and policies, and executive orders. And then there are humans.

Be a human.

Understanding Date Ranges in Your Chatbot

When your chatbot performs tasks of a personal assistant like scheduling meetings or generating reports, you need to make sure it can understand dates and date ranges.

Step 1. Resolve

LUIS has a set of pre-built entities to recognize date and time (builtin.datetime). It will understand when your users say tomorrow, October 1st or next week, for example, and will convert that to a date or a duration. Couple examples:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
// tomorrow
"resolution": { "date": "2016-11-20" }

// last quarter
"resolution": { "date": "XXXX-Q4" }

// last year
"resolution": { "date": "2015" }

// last two years
"resolution": { "duration": "P2Y" }

// last week
"resolution": { "date": "2016-W45" }

// past three weeks
"resolution": { "duration": "P3W" }

// this month
"resolution": { "date": "2016-11" }

// last ten months
"resolution": { "duration": "P10M" }

Unfortunately, the only quarter-based duration LUIS understands right now is last quarter. It doesn’t recognize this quarter, next quarter, or plurals like last three quarters.

As you can see, the resolutions are indicative, use different formats, and need to be parsed to get converted to dates and date ranges.

Step 2. Parse

When LUIS detects a datetime entity (e.g. tomorrow) it will send back the resolution along with the extracted entity itself (the word tomorrow in this case).

First, I try to understand what time span the user asked about:

1
2
const span = 
['day', 'week', 'month', 'quarter', 'year'].find(s => entity.match(s));

Then I parse the dates and durations with moment:

1
2
3
4
5
6
7
8
9
10
11
12
13
const moment = require('moment');

// date
const resolved = resolution.date.replace('XXXX', moment().year());
const date = moment(resolved, ['YYYY-MM-DD', 'YYYY-Q', 'YYYY-W', 'YYYY']);

// duration
const duration = moment.duration(resolution.duration);
const sign = ['last', 'past', 'previous'].some(p => entity.match(p)) ? -1 : +1;
const date = moment().add(sign * duration.as('hours'), 'hours');

// normalized result
return date.startOf(span || 'day');

Step 3. Understand

Now we have the date representing the beginning of the period the user asked about. If today was Friday 11/18, for example, and you asked for last three weeks, the date would be Sun, Oct 23 (weeks start on Sunday in US unless you use isoweek with moment).

One date is not enough though for utterances like:

1
please generate a service cost report for the last two weeks

Your report generation service/API is likely to require a date range.

LUIS can also understand numbers spelled as digits like 2 or 5 or spelled as words like two or five. A phrase like last two weeks will produce two entities:

1
2
3
4
5
6
7
8
9
10
11
12
13
"entities": [
{
"entity": "two",
"type": "builtin.number"
},
{
"entity": "last two weeks",
"type": "builtin.datetime.duration",
"resolution": {
"duration": "P2W"
}
}
]

Last thing I need to do to understand the range, is to extract the number and do the date math:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
const moment = require('moment');
const builder = require('botbuilder');

const numbers = {
'one': 1,
'two': 2,
'three': 3,
// you got the idea
};

// the entity here is the 'builtin.number'
const range = builder.EntityRecognizer.parseNumber(entity)
|| numbers[entity]
|| 1;

const end = moment(date)
.add(range, span)
.subtract(1, 'day')
.endOf('day');

And that’s it. Now last three weeks is understood as 10/23 - 11/12. And last quarter will be 10/1 0:00 - 12/31 23:59.

Intent Recognizers For Your Chatbot

Two weeks ago I attended API Strat in Boston where I gave a talk on cognitive APIs and conversational interfaces and showed and explained an e-commerce chatbot that I built. My presentation is on slideshare. I have learned a lot about chatbots and now I feel an urge to write about it.

Skype conversation excerpt

Intents

My bot is using the intent dialog from the Microsoft Bot Framework:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
const bot = new builder.UniversalBot(...);
const intents = new builder.IntentDialog(...);

intents.matches('Greeting', '/welcome');
intents.matches('ShowTopCategories', '/categories');
intents.matches('Explore', '/explore');
intents.matches('ShowProduct', '/showProduct');
intents.matches('AddToCart', '/addToCart');
intents.matches('ShowCart', '/showCart');
intents.matches('Checkout', '/checkout');
intents.matches('Reset', '/reset');
intents.matches('Smile', '/smileBack');
intents.onDefault('/confused'); // no intent recognized

bot.dialog('/', intents);

bot.dialog('/confused', [
function () {
session.endDialog('Sorry, I didnt understand you');
}
]);

The intent dialog associates a user’s intent like Explore or Checkout with a specific dialog that knows how to respond.

It feels very much like routing in a web framework where given a specific URL pattern, the request will be routed to a controller that knows how to handle it.

Users don’t spell out their intents like that though. And so the first thing my bot needs to do is to learn to recognize them. The simplest way to trigger a dialog handler in response to a users utterance is by matching it with a regex. A more sophisticated logic requires an intent recognizer.

Intent Recognizers

An intent recognizer is basically a service that can understand users’ utterances. Given a text message it will return a list of intents that it inferred from it along with supporting entities. Here’s how it looks in LUIS (language understanding service from Microsoft):

LUIS intents and entities

The Explore intent was recognized along with two supporting entities that I trained it for. Here’s another way of looking at it:

1
2
3
4
5
curl -v "https://api.projectoxford.ai/luis/v2.0/apps/{app-id}" 
-H "Content-Type: application/json"
-H "Ocp-Apim-Subscription-Key: {subscription-key}"
-G
-d "q=I am looking for touring bikes. Do you have some?"

And the response:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
{
"query": "I am looking for touring bikes. Do you have some?",
"topScoringIntent": {
"intent": "Explore",
"score": 0.9994699,
"actions": [
// ...
]
},
"entities": [
{
"entity": "touring",
"type": "Detail",
"score": 0.9710912,
// ...
},
{
"entity": "bikes",
"type": "Entity",
"score": 0.943606555,
// ...
}
],
// ...
}

Microsoft Bot Framework comes with built-in support for LUIS in the form of LuisRecognizer

Custom Recognizers

Not every thing your users say has to be sent to a natural language service to extract the intent. Buttons and tappable images can post back bot-specific commands like /show:123456789, for example, that you can easily recognize with a regex. Also, if you want your bot to smile back at a smile sent to it, you don’t need to train a linguistic model either.

It turns out, building your own recognizer is not hard at all. I have built a few for my e-commerce bot and here’s how it works.

First, know that the Bot Framework supports sending a message through a number of recognizers at the same time. You can chain them or run them all in parallel:

1
2
3
4
5
6
7
8
9
10
const intents = new builder.IntentDialog({
recognizers: [
commands,
greeting,
smiles,
new builder.LuisRecognizer(process.env.LUIS_ENDPOINT)
],
intentThreshold: 0.2,
recognizeOrder: builder.RecognizeOrder.series
});

The recognizer itself is a very simple interface with only one method - recognize. Here’s how you would detect a smile, for example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
module.exports = {
recognize: function(context, callback) {
const text = context.message.text;
const smiles = text.match(/<ss type="(\w+?)">(.+?)<\/ss>/);

if (smiles) {
callback.call(null, null, {
intent: 'Smile',
score: 1,
entities: [
// smiles[1] and smiles[2]
// have the details you need to smile back
]
});
} else {
callback.call(null, null, {
intent: null,
score: 0
});
}
}
};

And here’s another one that understands commands:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27

const commands = {
parse: function (context, text) {
const parts = text.split(':');
const command = parts[0];

const action = this[command] || this[command.slice(1)];
if (!action) {
return unrecognized;
} else {
return action.call(this, context, ...parts.slice(1));
}
},
// ...
}

module.exports = {
recognize: function (context, callback) {
const text = context.message.text;

if (!text.startsWith('/')) {
callback.call(null, null, unrecognized);
} else {
callback.call(null, null, commands.parse(context, text));
}
}
};

That’s it for now but there is more to come. Stay tuned!