Smarter Conversations. Part 4 - Transcript

A bot that one of our teams is working on has the following functional requirement:

Dialog reaches a point where chatbot is no longer able to help. At this point, a transcript of the conversation will be sent to a mailbox.

Capturing a transcript requires that we keep track of all messages that are sent and received by the bot. The framework only keeps track of the conversation’s current dialog stack. I already showed you guys how to build a simple history engine and give the bot the breadcrumbs of the entire conversation. Let’s see how we can record a transcript.

Option 1. Events (first attempt)

UniversalBot extends Node.js’s EventEmitter and emits a number of events as it processes incoming and outgoing messages. We can subscribe to send and receive, for example:

bot.on('send', function(event) {
    if (event.type === 'message') {
        // ToDo: record in the transcript journal
    }
});

bot.on('receive', function(event) {
    if (event.type === 'message') {
        // ToDo: record in the transcript journal
    }
});

There’s a little caveat that I want to bring up before I show you how to get to the conversation’s session in the event handler.

send and receive are emitted before the bot runs through the middleware stack. In general, an exception in one of the middleware components should not break the chain, but if you only want to capture messages that were actually dispatched to the user, subscribe to outgoing, which fires after the middleware chain.

Let’s now add the journaling logic.

First attempt:

const transcript = function (session, direction, message) {
    session.privateConversationData.transcript = session.privateConversationData.transcript || [];
    session.privateConversationData.transcript.push({
        direction,
        message,
        timestamp: new Date().toUTCString()
    });

    // NOTE 1: I will explain this line in detail and show you
    // that it doesn't actually do what you might think it does
    session.save();
};

bot.on('incoming', function (message) {
    if (message.type === 'message') {

        // NOTE 2: loadSession() warrants an in-depth explanation as well
        bot.loadSession(message.address, (error, session) => {
            transcript(session, 'incoming', message.text);
        });
    }
});

bot.on('outgoing', function (message) {
    // ... (same as incoming, will refactor later)
});

NOTE 1. session.save()

It’s very important to understand how the bot handles the session data. The default mechanism is MemoryBotStorage, which stores everything in memory and works synchronously. Your bot would default to it if you used the ConsoleConnector. You are a lot more likely to use the ChatConnector, which comes with an external persistence implementation and reads and saves data asynchronously. Please also note that everything you put on the session (e.g. session.userData) is JSON serialized for storage. Don’t try keeping callback functions around on session.dialogData, for example.
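
To see why callbacks don’t survive on the session, here is a minimal sketch of a JSON round trip. The roundTrip helper and the sample dialogData object are made up for illustration; the framework’s storage layer does more than this, but plain JSON serialization treats functions the same way:

```javascript
// Hypothetical helper that mimics serializing session data for storage
// and reading it back. Not part of the Bot Framework.
const roundTrip = (data) => JSON.parse(JSON.stringify(data));

const dialogData = {
    city: 'Toronto',
    askedAt: new Date(0),      // Date objects come back as ISO strings
    onDone: function () {}     // functions are silently dropped
};

const restored = roundTrip(dialogData);

console.log(restored.city);            // 'Toronto'
console.log(typeof restored.askedAt);  // 'string'
console.log('onDone' in restored);     // false
```

If you need to remember what to do next, store a plain identifier (a dialog name, a step index) and look the function up when the data comes back.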

The next very important thing to understand is that session.save() is asynchronous as well. It’s actually worse. It’s delayed via setTimeout(). The delay is configurable via autoBatchDelay and defaults to 250 milliseconds. The bot will auto-save all session data as part of sending the messages out to the user which it does in batches. The delay is built into the batching logic to ensure the bot doesn’t spend extra I/O cycles when it feels like sending multiple messages. Calling session.save() just triggers the next batch.

You can remove the delay:

const bot = new builder.UniversalBot(connector, {
    persistConversationData: true,
    autoBatchDelay: 0 // <-- the default is 250
});

The batching will still be asynchronous though. You can also bypass the batching altogether and instead of session.save() call session.options.onSave() directly, but you can’t work around the asynchronous nature of how the data is saved by the ChatConnector.
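
To make the "save() only schedules a batch" point concrete, here is a toy model. FakeSession, its flush() method and the storage object are all hypothetical and much simpler than the real Session class; the only thing the sketch is meant to show is that nothing has been written when save() returns, even with a delay of 0:

```javascript
// Toy model of delayed batching. All names here are made up for illustration.
class FakeSession {
    constructor(storage, autoBatchDelay = 250) {
        this.storage = storage;
        this.autoBatchDelay = autoBatchDelay;
        this.data = {};
        this.timer = null;
    }

    // Like session.save() in spirit: schedules a batch, writes nothing yet.
    save() {
        if (this.timer) {
            clearTimeout(this.timer);
        }
        this.timer = setTimeout(() => this.flush(), this.autoBatchDelay);
        return this;
    }

    // Runs later, from the timer: this is where the data is actually stored.
    flush() {
        this.timer = null;
        this.storage.writes += 1;
        this.storage.saved = JSON.parse(JSON.stringify(this.data));
    }
}

const storage = { writes: 0, saved: null };
const session = new FakeSession(storage, 0);

session.data.transcript = ['hello'];
session.save();
session.save(); // folded into the same pending batch

const savedRightAfterSave = storage.saved; // still null at this point

setTimeout(() => {
    console.log(savedRightAfterSave); // null
    console.log(storage.writes);      // 1
    console.log(storage.saved);       // { transcript: [ 'hello' ] }
}, 10);
```

Any code that runs between save() and the timer firing still sees the old stored state, which is exactly what bites us in the first attempt.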

NOTE 2. bot.loadSession()

This method is not part of the documented public API and there’s probably a good reason for it. The bot framework doesn’t keep sessions around. Session objects are created on demand and discarded by the GC when the request/response cycle is over. In order to create a new session object, the bot needs to load and deserialize the session data, which, as you have just learned, happens asynchronously.

If you run the code I showed you, you will only see the outgoing messages on the transcript.

The incoming messages are swallowed and overwritten by the asynchronous and delayed processing.
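
The effect is a classic lost update. The sketch below reproduces it without the bot framework: store is a hypothetical in-memory stand-in for the external state service, and record() plays the role of "load a session, push an entry, save". Both writers start from the same empty snapshot, so the later save replaces the earlier one:

```javascript
// Hypothetical store with asynchronous load/save. Each load hands out
// a private copy of the data, the way a deserialized session would be.
const store = {
    data: { transcript: [] },
    load(callback) {
        const copy = JSON.parse(JSON.stringify(this.data));
        setImmediate(() => callback(copy));
    },
    save(copy, callback) {
        setImmediate(() => {
            this.data = JSON.parse(JSON.stringify(copy));
            callback();
        });
    }
};

// Load a fresh copy, append one entry, write the whole copy back.
const record = (direction, text, done) => {
    store.load((copy) => {
        copy.transcript.push({ direction, text });
        store.save(copy, done);
    });
};

let pending = 2;
const finished = () => {
    pending -= 1;
    if (pending === 0) {
        // Only the outgoing entry is left: the second save overwrote the first.
        console.log(store.data.transcript);
    }
};

record('incoming', 'hi', finished);
record('outgoing', 'hello, how can I help?', finished);
```

The real timing is messier (network latency, the batch delay), but the shape of the problem is the same: two independent copies of the session data, each saved as a whole.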

Option 1. Events (second attempt)

There’s one event in the incoming message processing pipeline that is different from all the others: routing. An event handler for routing is given the session object that the bot framework has just created to pass on to the selected dialog. We can record the transcript without having to load our own session instance:

bot.on('routing', function (session) {
    transcript(session, 'incoming', session.message.text);
});

The routing event is the last in the chain of receive -> (middleware) -> incoming -> routing.

There is no equivalent to routing on the way out though. No event in the send -> (middleware) -> outgoing chain is given the session object. There is a good reason why: sending the messages out happens after the bot has finished saving the session data.

While it’s sad that we don’t have an equivalent of routing in the outbound pipeline, knowing that session data is complete prior to bot framework dispatching the messages out makes me feel good about re-saving it. We don’t risk overwriting anything important like call stack or other session data.

Second attempt:

const transcript = function (session, direction, message) {
    session.privateConversationData.transcript = session.privateConversationData.transcript || [];
    session.privateConversationData.transcript.push({
        direction,
        message,
        timestamp: new Date().toUTCString()
    });

    // no need to explicitly save() for the incoming
    if (direction === 'outgoing') {
        session.save();
    }
};

bot.on('routing', function (session) {
    transcript(session, 'incoming', session.message.text);
});

bot.on('outgoing', function (message) {
    if (message.type === 'message') {
        bot.loadSession(message.address, (error, session) => {
            transcript(session, 'outgoing', message.text);
        });
    }
});

This time it works as expected but is not free of side effects. The bot.loadSession() on the way out is still asynchronous and prone to interleaving. If your bot sends multiple messages, and especially if it does so asynchronously (in response to external data arriving via a Promise, for example), you may find that not all of them are captured.

Option 2. Middleware

Another way of intercepting incoming and outgoing messages is to inject a custom middleware. The middleware is called in between receive and incoming, and also in between send and outgoing:

bot.use({
    send: function (message, next) {
        if (message.type === 'message') {
            // ToDo: record in the transcript journal
        }
        next(); // <-- I will explain in NOTE 3 below
    },
    receive: function (message, next) {
        if (message.type === 'message') {
            // ToDo: record in the transcript journal
        }
        next(); // <-- I will explain in NOTE 3 below
    }
});

NOTE 3. next()

Middleware that you inject via bot.use() forms a stack that is processed in order. The bot framework does it via a recursive function that self-invokes. Every invocation calls the next middleware in the chain, and the last one calls the main processing callback. This is a nice way to keep running down the list even when one middleware errors out, as the function self-invokes in a catch block. I suggest that you guys take a closer look at UniversalBot.prototype.eventMiddleware if you’re interested. The flip side is that if we don’t call next(), the chain will not continue and the bot will never receive the message.
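
Here is a stripped-down sketch of that kind of runner. runMiddleware and the sample middleware are hypothetical and only approximate the idea; this is not the framework’s actual code. It shows both behaviors: a middleware that throws is skipped, while one that never calls next() stops the chain for good:

```javascript
// Simplified stand-in for a self-invoking middleware runner (illustrative only).
const runMiddleware = (middleware, message, done) => {
    let i = -1;
    const next = () => {
        i += 1;
        if (i >= middleware.length) {
            done(message); // end of the chain: main processing callback
            return;
        }
        try {
            middleware[i](message, next);
        } catch (error) {
            next(); // skip the failing middleware and keep going
        }
    };
    next();
};

const trace = [];
const tag = (name) => (message, next) => { trace.push(name); next(); };
const broken = () => { throw new Error('boom'); };
const silent = (message, next) => { trace.push('silent'); }; // forgets next()

// The throwing middleware is skipped and the chain completes.
runMiddleware([tag('a'), broken, tag('b')], { text: 'hi' }, () => trace.push('done'));

// The chain stalls at `silent`; neither 'd' nor the final callback runs.
runMiddleware([tag('c'), silent, tag('d')], { text: 'hi' }, () => trace.push('never'));

console.log(trace); // [ 'a', 'b', 'done', 'c', 'silent' ]
```

Because the runner waits for next(), a middleware is free to call it later, from inside an asynchronous callback. That is the hook we need.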

We can use this feature to our advantage. If we chain next() onto the direct call to session.options.onSave(), we can ensure that the chain continues after the successful journaling of the transcript. No chance to have them all interleave and overwrite one another, though it probably takes longer before it gets to the user:

const transcript = function (session, direction, message, next) {
    session.privateConversationData.transcript = session.privateConversationData.transcript || [];
    session.privateConversationData.transcript.push({
        direction,
        message,
        timestamp: new Date().toUTCString()
    });

    // continue the middleware chain only after the data has been saved
    session.options.onSave(next);
};

const journal = (direction) => (message, next) => {
    if (message.type === 'message') {
        bot.loadSession(message.address, (error, session) => {
            if (error) {
                // don't hold the message back if the session can't be loaded
                return next();
            }
            transcript(session, direction, message.text, next);
        });
    } else {
        next();
    }
};

bot.use({
    send: journal('outgoing'),
    receive: journal('incoming')
});

You can also combine the two techniques: use the routing event for incoming messages and the send middleware only for outgoing traffic. Just make sure that you don’t call session.save() for the incoming messages. Here’s a gist.
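
For reference, this is roughly what the combination could look like. It is my own sketch for this paragraph, not a copy of the gist, and it leans on the same undocumented bot.loadSession() and session.options.onSave() calls used above. The append and attachTranscript names are made up; attachTranscript takes the bot as a parameter so the wiring can be exercised against a stub:

```javascript
// Sketch only: relies on undocumented bot.loadSession() and
// session.options.onSave(), as used earlier in this post.
const append = (session, direction, message) => {
    const data = session.privateConversationData;
    data.transcript = data.transcript || [];
    data.transcript.push({
        direction,
        message,
        timestamp: new Date().toUTCString()
    });
};

const attachTranscript = (bot) => {
    // Incoming: the framework saves this session itself, so no explicit save.
    bot.on('routing', (session) => {
        append(session, 'incoming', session.message.text);
    });

    // Outgoing: save first, then let the message continue down the chain.
    bot.use({
        send: (message, next) => {
            if (message.type !== 'message') {
                return next();
            }
            bot.loadSession(message.address, (error, session) => {
                if (error || !session) {
                    return next(); // don't block delivery over a failed load
                }
                append(session, 'outgoing', message.text);
                session.options.onSave(next);
            });
        }
    });
};
```

Call attachTranscript(bot) once, right after the bot is created and before any dialogs start sending messages.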

Option 3. External Journal

I don’t know how stable session.options.onSave() and bot.loadSession() are. Neither one is part of the official public API, so use them at your own risk.

You can also roll your own transcript service and safely call it asynchronously from the send and receive event handlers. What I like about using session.privateConversationData is that I need no custom infrastructure and can easily discard the transcripts if I don’t use them. The bot framework will take care of it for me.
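
A bare-bones version of such a service could look like the sketch below. Everything in it is hypothetical: the journal is an in-memory Map (a real one would write to a database or a queue), record and render are made-up names, and I am assuming that the message address carries a conversation id to key the entries on:

```javascript
// Hypothetical in-memory journal keyed by conversation id.
const journal = new Map();

const record = (direction, message) => {
    if (message.type !== 'message') {
        return;
    }
    // Assumption: message.address.conversation.id identifies the conversation.
    const id = message.address.conversation.id;
    const entries = journal.get(id) || [];
    entries.push({
        direction,
        text: message.text,
        timestamp: new Date().toUTCString()
    });
    journal.set(id, entries);
};

// Plain-text rendering, e.g. for the body of the hand-off email.
const render = (id) => (journal.get(id) || [])
    .map((e) => `[${e.timestamp}] ${e.direction === 'incoming' ? 'User' : 'Bot'}: ${e.text}`)
    .join('\n');

// Wiring, assuming `bot` is the UniversalBot instance from the examples above:
// bot.on('receive', (message) => record('incoming', message));
// bot.on('send', (message) => record('outgoing', message));

const address = { conversation: { id: 'c1' } };
record('incoming', { type: 'message', text: 'hi', address });
record('outgoing', { type: 'message', text: 'hello, how can I help?', address });
console.log(render('c1'));
```

Since the journal lives outside the session, there is nothing to load or save and nothing to interleave with. The price is that cleanup and persistence become your problem.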

It would be nice though if the bot framework gave me a routing-like event for the outbound pipeline that would fire before the data is saved. This way I would be able to record the transcript without disrupting the flow of things, and wouldn’t have to rely on internal implementation details that can easily go away in the next version.