Make A Chatbot That Speaks Like You


If you’re a Silicon Valley fan like me, I’m sure you’ve seen the episode where Gilfoyle made a chatbot and had it chat with Dinesh all day, without the latter realising it until later that day.

Alright, let’s make you a chatbot, but before we jump into this, we’ll need:

  • Data? Well, you gotta feed your AI; it doesn’t work for free.
  • Chatbot? We’ll need a conversational dialog engine.

Data

To train the bot we’ll need some data, in this case your texts. The first idea that popped into my head was of course Facebook, because (1) that’s where I can find all the text I’ll need, and (2) we can export the data in an easy-to-process format, in this case JSON.

Make sure not to share your data or the trained model with anyone, as it might contain sensitive information. You could, however, mitigate that with a de-identifier such as Google Cloud’s DLP.
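If you just want a rough idea of what de-identifying message text could look like, here is a minimal sketch. It is not Google Cloud’s DLP, only a crude regex-based stand-in that I’m assuming is good enough for illustration: the `redact` function and both patterns are made up for this example, and they only catch obvious email addresses and phone-like numbers.

```javascript
// Crude, regex-based redaction. A stand-in for a real de-identifier,
// not a replacement for one: it only catches obvious patterns.
const EMAIL = /[\w.+-]+@[\w-]+(\.[\w-]+)+/g
const PHONE = /\+?\d[\d\s().-]{7,}\d/g

// Hypothetical helper: masks emails and phone-like numbers in one text.
function redact(text) {
    return text
        .replace(EMAIL, '[EMAIL]')
        .replace(PHONE, '[PHONE]')
}

console.log(redact('Mail me at jane@example.com or call +1 555 123 4567'))
// → Mail me at [EMAIL] or call [PHONE]
```

Names, addresses and anything else personal will slip straight through this, so treat it as a starting point at best.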

Export messages from Facebook

To export your messages from Facebook, if you haven’t already, follow these steps:

  • go to Settings
  • click on Your Facebook Information
  • navigate to Download Your Information
  • select JSON in the Format select box
  • click on Deselect All and click Messages under the Your Information section
  • click on Create File

This could take a while. All you gotta do right now is wait for an email that says your data is ready for download.

Preparing the data

You’ll get your data in a zip file that contains an inbox folder. This folder holds all the conversations you’ve had with your contacts, organized into folders, each one prefixed with the contact’s username.

Each conversation folder contains at least a message_1.json file, along with other folders such as gifs, which are of no interest to us here.

The message_1.json file has a very simple structure and it’s as follows:

{
  "participants": [
    {
      "name": "Simhi"
    },
    {
      "name": "Amine Hakkou"
    }
  ],
  "messages": [
    {
        "sender_name": "Simhi",
        "timestamp_ms": 1499810593070,
        "content": "Afeen!",
        "type": "Generic"
    }
  ],
  "title": "Simhi",
  "is_still_participant": true,
  "thread_type": "Regular",
  "thread_path": "inbox/Simhi_blahblah"
}

In this article I’ll keep things simple: we’re not going to feed it the data from all of the conversations, just one. If you feel like feeding it everything, you could tweak the code a little for that.

function readConversation(path) {
    return JSON.parse(require('fs').readFileSync(
        path.concat('/message_1.json'),
        { encoding: 'utf8' },
    ))
}
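If you do want to feed it every conversation, one possible tweak is sketched below. `readAllConversations` is a hypothetical helper, not part of the script in this article, and it assumes the layout described earlier: one folder per conversation directly under inbox, each with a message_1.json. Folders without that file are skipped.

```javascript
const fs = require('fs')
const pathLib = require('path')

// Hypothetical helper: returns the parsed message_1.json of every
// conversation folder found directly under the inbox folder.
function readAllConversations(inboxPath) {
    return fs.readdirSync(inboxPath, { withFileTypes: true })
        // keep folders only, one per conversation
        .filter(entry => entry.isDirectory())
        .map(entry => pathLib.join(inboxPath, entry.name, 'message_1.json'))
        // skip folders that have no message_1.json
        .filter(file => fs.existsSync(file))
        .map(file => JSON.parse(fs.readFileSync(file, { encoding: 'utf8' })))
}
```

You would then run the clean-up steps on each conversation’s messages and concatenate the results before writing them out.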

We’ll also need to clean the messages, as they might contain non-text payloads such as gifs or videos.

function filterMessagesWithContent(messages = []) {
    return messages.filter(message => message.content)
}

If you skim over your messages you’ll notice that they’re sorted from newest to oldest. We’ll need to fix that.

function reverseMessagesOrder(messages = []) {
    // copy first: Array.prototype.reverse mutates the array in place
    return [...messages].reverse()
}

There’s also this little thing where you or the other party send multiple consecutive texts. For the sake of simplicity we can group them into one message.

function groupConsecutiveMessages(messages = []) {
    // nothing to group in an empty conversation
    if (messages.length === 0) return []

    const result = []
    let previousMessage = messages[0]

    for (let i = 1; i < messages.length; i++) {
        const message = messages[i]
        if (message.sender_name === previousMessage.sender_name) {
            // same sender: append the text to the message being built
            previousMessage = {
                ...previousMessage,
                content: previousMessage.content.concat(' ' + message.content),
            }
        } else {
            result.push(previousMessage)
            previousMessage = message
        }
    }
    // handle last element
    result.push(previousMessage)
    return result
}
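To make the grouping step concrete, here is a small self-contained sketch you can run on its own. It uses a reduce-based formulation that I’m calling `groupBySender` (a hypothetical name, not the function above) and a made-up sample conversation; the second and third messages are invented for the example.

```javascript
// Alternative formulation of the grouping step, using reduce.
function groupBySender(messages = []) {
    return messages.reduce((grouped, message) => {
        const last = grouped[grouped.length - 1]
        if (last && last.sender_name === message.sender_name) {
            // same sender as the previous entry: merge the contents
            grouped[grouped.length - 1] = {
                ...last,
                content: last.content + ' ' + message.content,
            }
        } else {
            grouped.push(message)
        }
        return grouped
    }, [])
}

// Made-up sample data, oldest message first.
const sample = [
    { sender_name: 'Simhi', content: 'Afeen!' },
    { sender_name: 'Simhi', content: 'You there?' },
    { sender_name: 'Amine Hakkou', content: 'Yes' },
]

console.log(groupBySender(sample).map(m => m.content))
// → [ 'Afeen! You there?', 'Yes' ]
```

The two texts from the same sender collapse into one entry, so the list alternates between the two people in the conversation.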

We’ll also strip the JSON down to the text of each message. We lose information, sure, but we won’t need the rest anyway.

function messagesContent(messages = []) {
    return messages.map(m => m.content)
}

And finally, write the result back to the file system.

function writeMessages(path, messages) {
    require('fs').writeFileSync(
        path,
        JSON.stringify(messages),
    )
}

Putting it all together would look something like this:

const path = 'path-to-your-data/inbox/Simhi_blahblah'
const conversation = readConversation(path)
const messages = messagesContent(
    groupConsecutiveMessages(
        reverseMessagesOrder(
            filterMessagesWithContent(conversation.messages)
        )
    )
)

writeMessages(path.concat('/messages.json'), messages)
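If the nested calls are hard on the eyes, one option is a tiny `pipe` helper that applies the steps left to right. This is just a sketch: `pipe` and the three stand-in steps are assumptions made so the example runs on its own, and the `sticker` field is made up to represent a non-text payload. In transform.js you would pass the four functions defined above instead.

```javascript
// Hypothetical helper: composes functions left to right.
const pipe = (...steps) => input =>
    steps.reduce((value, step) => step(value), input)

// Stand-in steps so the sketch is runnable by itself.
const withContent = messages => messages.filter(m => m.content)
const oldestFirst = messages => [...messages].reverse()
const contentOnly = messages => messages.map(m => m.content)

const prepare = pipe(withContent, oldestFirst, contentOnly)

console.log(prepare([
    { content: 'second' },
    { sticker: 'thumbs-up' }, // made-up non-text payload, gets dropped
    { content: 'first' },
]))
// → [ 'first', 'second' ]
```

The steps now read in the order they run, which is the reverse of how the nested version reads.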

Save all of the above in a file, transform.js in this example, and run it with Node:

node transform.js

The Chatbot

Alright, now that we’ve got the data ready, we’ll need to train our AI. You can find a lot of conversational engines around, such as ChatterBot and Rasa.

We’ll go with ChatterBot because it has a simpler, more straightforward API.

Prerequisites

You’ll need to have Python 3 installed on your machine.

Install

pip3 install chatterbot chatterbot-corpus

Code

The code below reads the JSON file we wrote with our transform script, trains the model, and answers the messages it receives from stdin.

There’s a lot you could improve on; I kept it simple, so I used ChatterBot’s ListTrainer with no additional configuration.

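import json
from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer

# messages.json is the file written by transform.js; copy it next to
# this script or adjust the path.
with open('messages.json', 'r', encoding='utf-8') as f:
    conversation = json.load(f)

chatbot = ChatBot('Amine')

# train on the prepared messages, in order
trainer = ListTrainer(chatbot)
trainer.train(conversation)

# read messages from stdin and reply until interrupted (Ctrl+C or Ctrl+D)
while True:
    try:
        message = input('> ')
    except (EOFError, KeyboardInterrupt):
        break
    response = chatbot.get_response(message)
    print('You:', message)
    print('Amine:', response)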
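A note on why the order and grouping of the messages matter: as I understand ChatterBot’s documentation, ListTrainer treats the list as one conversation in which each item is a possible response to the one before it. The sketch below is only an illustration of that pairing, written in JavaScript to match the transform script. It is not ChatterBot’s code, and `toExchanges` and the sample texts are made up for the example.

```javascript
// Illustration only: pairs each message with the one that follows it.
function toExchanges(messages = []) {
    return messages.slice(1).map((response, i) => ({
        prompt: messages[i],
        response,
    }))
}

// Made-up sample, in the shape transform.js writes to messages.json.
for (const { prompt, response } of toExchanges(['Afeen!', 'Hey', 'How are you?'])) {
    console.log(`${prompt} -> ${response}`)
}
// → Afeen! -> Hey
// → Hey -> How are you?
```

If the messages were left newest-first, every pair would be backwards, and the bot would learn questions as replies to their answers.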

Usage

Now all you gotta do is run the chatbot.

python3 chatbot.py
