by Oleksii Bilogurov

Talking to Programs: Serverless Architecture for Chatbots

Chatbots and voice assistants are being applied across different industries. Cloud-native AI services allow developers to build an entirely new generation of apps with a conversational interface for listening to, talking with, and understanding users. Let’s explore answers to some of the most common questions around serverless architecture for chatbots and what it means for businesses worldwide.

Cloud-native services provide the infrastructure for rapidly developing programs that leverage NLP to assist people with basic tasks. Personal assistants can search and aggregate information for a user from different data sources, automate routine tasks, and take actions on behalf of the user. Speech-to-text services open new opportunities to interact with bots by triggering an action with a voice command.

Common groups and usage patterns for chatbots include:

  • Informational bots provide data based on a customer’s request, including the latest weather and news updates, game scores, and user data from calendars, geolocation, social services, and email accounts.
  • Application bots are interfaces to mobile applications such as Siri, Google Assistant, Alexa, Cortana, and more. Usually, application bots provide a voice interface for multiple interactive devices and automate routine actions, such as booking tickets, managing calendars and emails, and ordering food and taxis.
  • Enterprise productivity bots streamline interactions between clients and assistants, helping to improve customer experience, automate processes, increase operational efficiency, and reduce customer service costs. Assistants can check stock market trends, notify users about changes, and automate or semi-automate processes.
  • Internet of Things (IoT) bots provide conversational interfaces for multiple interactive devices, including smart watches, smart speakers, smart home hubs, and more.

Amazon, Google, and other Cloud providers have their own services for building conversational interfaces using voice and text; Amazon Lex and Dialogflow from Google are prime examples. They have standard integrations and usage patterns. Despite the many advantages of Cloud-native services, they impose constraints when building flexible, custom solutions.

The Centre of Excellence (CoE) Solutions team set the following technical goals for building a flexible enterprise solution:

  • Develop and validate a serverless architecture for chatbots based on the AWS Cloud platform
  • Support custom integration with any messenger, not limited to the Cloud provider’s predefined integrations
  • Integrate with an interactive web application using WebSocket
  • Provide secure authentication to the corporate network without exposing the user’s credentials

Big Picture

Conversation design services such as Amazon Lex or Google Dialogflow provide integration with voice assistants and messaging clients, natural language understanding (NLU) functionality to recognize the intent of text or speech, and access to data from external APIs for conversation fulfillment.

Figure 1. A high-level integration workflow.

The conversational interface builds on advanced deep learning for automatic speech recognition and natural language understanding. Multi-turn conversation is a key feature: the service manages the user’s session context and orchestrates the dialogue steps to collect all the parameters needed for the fulfillment response. Integration with Lambda functions and the serverless approach is another benefit, bringing all the advantages of fully managed Cloud services with no upfront costs.

Amazon Lex supports chat services such as Facebook Messenger, Slack, and Twilio SMS by default. Integrating other messengers that are not predefined by the Amazon Lex service, such as Telegram or MS Teams, requires a customized integration solution.

Chat Messenger—Custom Integration

Figure 2. High-level custom chat messenger integration with Telegram.

  1. User initiates request to chatbot via text messenger e.g. “Book hotel room for a week starting from September 1.”
  2. Request Lambda validates request and authenticates chatbot
  3. Request Lambda sends task to queue and provides response to client to prevent retry from client
  4. Response Lambda is triggered by task from queue
  5. Response Lambda sends request to Amazon Lex service
  6. Based on the message context, Lex bot processes the text input in the conversation and decides what business logic should be executed to provide a response for the user
  7. Fulfillment Lambda can trigger external services or APIs, e.g. a booking service API
  8. Response Lambda sends async response to chat client
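
The steps above can be sketched as two small handlers. This is a minimal illustration under stated assumptions, not the team's implementation: the `x-webhook-secret` header, the `WEBHOOK_SECRET` value, and the injected `enqueue`, `ask_bot`, and `send_reply` callables are hypothetical stand-ins for the queue, the Amazon Lex call, and the messenger API. The incoming body is assumed to follow the Telegram update shape (`message.chat.id`, `message.text`), and queued tasks are assumed to arrive in the SQS-style `Records` list.

```python
import hmac
import json
from typing import Any, Callable, Dict

# Hypothetical shared secret the messenger is configured to send with each call.
WEBHOOK_SECRET = "replace-me"


def request_handler(event: Dict[str, Any], enqueue: Callable[[str], None]) -> Dict[str, Any]:
    """Steps 2-3: validate the call, queue the task, answer the client at once."""
    headers = event.get("headers") or {}
    supplied = headers.get("x-webhook-secret", "")  # hypothetical header name
    if not hmac.compare_digest(supplied.encode(), WEBHOOK_SECRET.encode()):
        return {"statusCode": 403, "body": "forbidden"}

    try:
        update = json.loads(event.get("body") or "")
        task = {
            "chat_id": update["message"]["chat"]["id"],
            "text": update["message"]["text"],
        }
    except (ValueError, KeyError, TypeError):
        return {"statusCode": 400, "body": "bad request"}

    # Hand the work to the queue and return immediately, so the messenger
    # does not time out and retry while the bot is still processing.
    enqueue(json.dumps(task))
    return {"statusCode": 200, "body": "ok"}


def response_handler(
    event: Dict[str, Any],
    ask_bot: Callable[[str, str], str],
    send_reply: Callable[[Any, str], None],
) -> None:
    """Steps 4-8: consume queued tasks, ask the bot, reply asynchronously."""
    for record in event["Records"]:
        task = json.loads(record["body"])
        # The chat id doubles as the bot's user id so context survives turns.
        answer = ask_bot(str(task["chat_id"]), task["text"])
        send_reply(task["chat_id"], answer)
```

Passing the queue and bot clients in as callables keeps the sketch runnable without AWS credentials; in a deployment they would wrap the corresponding AWS SDK and messenger API calls.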

The Lex bot can keep the conversational context and ask the user for additional information before it triggers the fulfillment Lambda and responds to the user. For the example described above, the following scenario can apply:

  1. User asks: “Book hotel room for a week starting from September 1.”
  2. Lex bot understands the initial conversation but doesn’t have enough information to provide the expected response, and therefore asks another question to clarify: “In which city would you like to book a room?”
  3. User provides answer: “London”
  4. Lex bot now has enough information to book the hotel room
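
The clarifying turn in this scenario can be sketched as a Lambda code hook. The sketch assumes the Amazon Lex V1 Lambda event and response format (`currentIntent`, `dialogAction`); Lex V2 uses a different shape. The slot names and prompts are hypothetical, and the actual booking call is omitted.

```python
from typing import Any, Dict, Optional

# Hypothetical slot names and clarifying prompts for a hotel-booking intent.
PROMPTS = {
    "City": "In which city would you like to book a room?",
    "CheckInDate": "What day do you want to check in?",
    "Nights": "How many nights will you stay?",
}


def next_missing_slot(slots: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the first required slot the user has not filled yet."""
    for name in PROMPTS:
        if not slots.get(name):
            return name
    return None


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    intent = event["currentIntent"]
    slots = intent["slots"]
    session = event.get("sessionAttributes") or {}

    missing = next_missing_slot(slots)
    if missing is not None:
        # Not enough information yet: ask the user for the missing value.
        return {
            "sessionAttributes": session,
            "dialogAction": {
                "type": "ElicitSlot",
                "intentName": intent["name"],
                "slots": slots,
                "slotToElicit": missing,
                "message": {"contentType": "PlainText", "content": PROMPTS[missing]},
            },
        }

    # All slots are filled; a real hook would call the booking API here.
    summary = (
        f"Booked a room in {slots['City']} from {slots['CheckInDate']} "
        f"for {slots['Nights']} nights."
    )
    return {
        "sessionAttributes": session,
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {"contentType": "PlainText", "content": summary},
        },
    }
```

In many bots the slot prompts are configured in the Lex console instead, and the service elicits missing slots on its own; the code hook form is shown only because it makes the turn-by-turn logic explicit.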

Web Application Integration

To integrate the chatbot functionality with an interactive web application, WebSocket APIs in Amazon API Gateway were selected. WebSocket APIs allow building real-time chat applications with an event-driven, serverless, Cloud-native approach.

Figure 3. Overview of a real-time chat application.

A web chat client opens a connection to the WebSocket URL to send requests. The WebSocket API in API Gateway handles connectivity between the server and the client. When the client connects and sends a message to the server, a Lambda function is triggered with an event that contains the connection identifier. DynamoDB persists the connection identifier to track each connected client. The Lambda function then uses the callback URL and the connection identifier to push its response to the chat client in real time.
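
A minimal sketch of such a handler follows. It relies on the `requestContext.connectionId` and `requestContext.routeKey` fields that API Gateway passes to WebSocket Lambda integrations. The in-memory `connections` dictionary stands in for the DynamoDB table, and `post_to_connection` is a hypothetical callable that, in a deployment, would wrap the API Gateway callback call that delivers data to a connection.

```python
from typing import Any, Callable, Dict

# In-memory stand-in for the DynamoDB table of connection identifiers.
connections: Dict[str, Dict[str, Any]] = {}


def websocket_handler(
    event: Dict[str, Any],
    post_to_connection: Callable[[str, str], None],
) -> Dict[str, Any]:
    """Track connected clients and answer messages by connection identifier."""
    ctx = event["requestContext"]
    connection_id = ctx["connectionId"]
    route = ctx["routeKey"]

    if route == "$connect":
        # Persist the identifier so later responses can find this client.
        connections[connection_id] = {"connected": True}
    elif route == "$disconnect":
        connections.pop(connection_id, None)
    else:
        # Any other route carries a chat message; a real handler would
        # forward it to the bot instead of echoing it back.
        text = event.get("body") or ""
        post_to_connection(connection_id, f"received: {text}")

    return {"statusCode": 200}
```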

The final solution for web client integration reuses the customized chat messenger integration architecture and adds infrastructure components that allow building real-time web applications.

Figure 4. User authentication functionality.

Authentication

If a user wants a chatbot to perform tasks on their behalf in third-party services, such as booking systems or corporate resources, the user must be authorized in the appropriate service. Following security best practices, credentials for third-party systems must not be entered into a chat or a web form. As a result, the OAuth2 authorization flow was implemented.

Amazon Cognito was selected to control access to backend resources from the chatbot client application. Amazon Cognito provides native support for access control in API Gateway and Lambda functions. It also supports sign-in through social identity providers such as Google and Facebook, and through enterprise identity providers such as Microsoft Active Directory via SAML.

Figure 5. Detailed authorization flow.

The chat client makes a login request to the chatbot service. The service generates a secret key, stores it in DynamoDB, and responds with a generated login URL and a timestamp for additional security. The user then follows this URL and enters their username and password into the Amazon Cognito web form. The authorization code grant type was chosen for the flow because it is the most secure and recommended grant type for authorizing public clients: it allows an authorization code to be exchanged for an access token. After Amazon Cognito validates the credentials, it redirects the user to the chatbot service with an authorization code. The chatbot service validates the timestamp and secret, then calls Amazon Cognito to exchange the authorization code for an access token and saves it to DynamoDB. At the end of the authorization flow, the user is redirected to a success page, while the chat client receives a message confirming successful authorization.
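
The secret and timestamp handling in this flow might look like the sketch below. It is an illustration under assumptions, not the team's code: the Cognito domain, client ID, redirect URI, five-minute lifetime, and the choice to carry the secret and timestamp in the OAuth2 `state` parameter are all hypothetical, and the `pending_logins` dictionary stands in for DynamoDB. The code exchange with Amazon Cognito is left out.

```python
import hmac
import secrets
import time
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode

# Hypothetical configuration values.
COGNITO_DOMAIN = "https://example.auth.us-east-1.amazoncognito.com"
CLIENT_ID = "example-client-id"
REDIRECT_URI = "https://chatbot.example.com/oauth/callback"
STATE_TTL_SECONDS = 300

# Stand-in for DynamoDB: chat_id -> (secret, issued_at).
pending_logins: Dict[str, Tuple[str, int]] = {}


def create_login_url(chat_id: str, now: Optional[float] = None) -> str:
    """Generate a secret, remember it, and build the hosted login URL."""
    issued_at = int(time.time() if now is None else now)
    secret = secrets.token_urlsafe(32)
    pending_logins[chat_id] = (secret, issued_at)
    query = urlencode({
        "response_type": "code",  # authorization code grant
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "state": f"{chat_id}:{issued_at}:{secret}",
    })
    return f"{COGNITO_DOMAIN}/oauth2/authorize?{query}"


def validate_state(state: str, now: Optional[float] = None) -> Optional[str]:
    """Return the chat id if the returned state is genuine and fresh."""
    current = int(time.time() if now is None else now)
    try:
        chat_id, issued_at_text, secret = state.rsplit(":", 2)
        issued_at = int(issued_at_text)
    except ValueError:
        return None

    # Pop the entry so each login URL can be redeemed only once.
    stored = pending_logins.pop(chat_id, None)
    if stored is None:
        return None
    stored_secret, stored_issued_at = stored
    if issued_at != stored_issued_at or current - issued_at > STATE_TTL_SECONDS:
        return None
    if not hmac.compare_digest(secret.encode(), stored_secret.encode()):
        return None
    return chat_id
```

Only after `validate_state` returns a chat id would the service exchange the authorization code and notify the matching chat client.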

Final Solution

The final serverless, Cloud-native solution, based on the AWS platform, provides the following functionality:

  • Integration with any chat messenger
  • Integration with web real-time application
  • OAuth2 authorization
  • Integration with third-party identity providers
  • Fully managed and horizontally scalable Cloud services

Figure 6. Final solution.

Summary

Through this research, SoftServe’s team concluded that technologies and services for chatbots and voice assistants have matured. Top Cloud providers offer services that boost the development of chatbots, including conversational interfaces that come as close as possible to natural human conversation, reducing friction and the error rate between human and chatbot.

A flexible architecture was designed and validated. The current solution is ready for integration with custom chat clients and third-party identity providers, and for implementing any business logic for fulfillment. The solution also covers quality attributes including scalability and high availability, which are very important for enterprise architecture.

Chatbots and voice assistants are becoming increasingly popular and helpful in different areas of our lives. The next step for this project is to make a useful chatbot for all SoftServians. The starting point is automating the business trip booking and reporting process.

If you have bright ideas, let’s build chatbots’ future together.