Matilda

Deep Learning powered Slackbot that allows teams to automate repetitive file tasks and save time.

17 February, 2022

Video Link: https://youtu.be/3ZJpvI-ccOY

Inspiration

Millions of files are shared on Slack every day, and each file has a story. Yet dealing with files in Slack raises problems we face daily:

  • Do you have trouble finding files in your workspace?
  • Do you spend a lot of time reading documents?
  • Do you have trouble managing leads for your marketing campaign?
  • Do you check your documents for grammatical errors before sharing them on Slack?
  • Do you work with multiple URLs and have trouble managing them?
  • Do you spend a lot of time finding contact information in resumes, excel sheets etc.?

To solve these problems (and beyond), we introduce Matilda: a Slack bot using Deep Learning that allows teams to automate repetitive file tasks and save time. Using Matilda you can:

  1. Search files easily.
  2. Summarise long documents into small digestible pieces.
  3. Generate audio summaries for long documents.
  4. Extract contact information (Name, Email Address, Address, Phone Number) from images, PDFs and Excel documents and manage it in one place.
  5. Query contact information quickly by asking questions like, "What is Jane Doe's email?"
  6. Extract URLs from documents and manage them.
  7. Check documents for grammatical mistakes.
  8. Check documents for repetition.

What it does

Matilda is a set of 11 slash commands. All these commands complement each other and help enhance your digital workflow.

Managing leads & contacts

From managing leads in a marketing campaign to processing resume data for user onboarding, handling contact information is essential for all businesses. When there is a ton of unstructured data, compiling contact data can become especially hard.

Research

We talked to Tracy, a sales manager at an early-stage startup that is about to launch its MVP for User Acceptance Testing (UAT). Tracy is responsible for preparing a list of users who will take part in the UAT and pay for the MVP. The data she collected for the leads comes in all formats: PDFs, images, handwritten documents and Excel files. Before she can begin contacting potential users, she needs to compile all the data in an Excel sheet.

Solution

Matilda automates the entire process for her.

  1. Tracy uploads all the documents (PDFs, images, Excel files) to the #leads channel.
  2. She then runs the /extract-contacts command for each document.
  3. Matilda extracts all the contact information (Name, Email, Phone Number & Address) from each file and stores it in a CSV file.
  4. She can get the compiled data by simply running the /get-contacts command.
  5. Suppose she wants to look up someone's email address quickly. She can simply run "/ask What is Jane Doe's email?" and get the answer back instantly.

Lead-Management

/extract-contacts filename

This command extracts all the Personally Identifiable Information from the named file and saves the Name, Email, Phone Number & Address for each contact in a .csv file. Each channel has its own .csv file to keep the contacts of different channels separate. This command works well with handwritten documents, PDF files, images and Excel sheets.

For example, an HR manager can run the /extract-contacts resume.pdf command for all the resumes in the #hiring channel to compile a list of candidates who are interviewing for the job.

Supported file types: .jpeg, .png, .csv, .doc, .docx, .html, .pdf, .pptx, .rtf, .txt, .xls, .xlsx

/get-contacts

This command returns a contacts.csv file with the Name, Email, Phone Number & Address of all the contacts. There is one file per channel.

Use cases for the contacts.csv file:

  • Adding contacts to Salesforce
  • Syncing data with Lead Management Software Solutions
  • Starting a marketing campaign
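The per-channel storage behind /extract-contacts and /get-contacts can be sketched with Python's standard csv module. This is a minimal illustration, not Matilda's actual code: the function names, the file-naming scheme and the column names are assumptions chosen to match the fields listed earlier (name, email, phone number, address).

```python
import csv
from pathlib import Path

# Hypothetical column layout; the write-up only names the fields, not the headers.
FIELDS = ["Name", "Email", "Phone Number", "Address"]


def contacts_path(base_dir, channel_id):
    """Return the CSV path for one channel (one file per channel)."""
    return Path(base_dir) / f"{channel_id}_contacts.csv"


def save_contacts(base_dir, channel_id, contacts):
    """Append contacts (dicts keyed by FIELDS) to the channel's CSV file."""
    path = contacts_path(base_dir, channel_id)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()  # write the header only once per file
        for contact in contacts:
            # Missing fields become empty cells; unknown keys are dropped.
            writer.writerow({field: contact.get(field, "") for field in FIELDS})
    return path


def load_contacts(base_dir, channel_id):
    """Read back every contact stored for a channel (empty list if none)."""
    path = contacts_path(base_dir, channel_id)
    if not path.exists():
        return []
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Keying the file name on the channel means a contact saved from #leads never shows up when /get-contacts is run in #hiring.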

/ask question

This no-code automation command lets non-technical users access and filter data in something as close to plain human language as possible. It uses state-of-the-art Natural Language Processing techniques to answer a natural-language question directly over the channel's contact table, so no SQL query has to be written by hand. This command makes contact information accessible on the go.

For example, running the command

/ask what is Jane Doe's email?

yields the response

JaneDoe@gmail.com
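As a rough sketch of how such a question could be answered, the snippet below prepares the contact table for a table question-answering model such as the TAPAS model mentioned in the build notes. It assumes the Hugging Face transformers "table-question-answering" pipeline and the "google/tapas-base-finetuned-wtq" checkpoint; neither the library nor the checkpoint is confirmed by this write-up, and the function names are hypothetical.

```python
import csv
import io


def table_from_csv(csv_text):
    """Turn CSV text into {column: [cell, ...]} with every cell a string.

    TAPAS-style models expect all table cells to be strings.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    table = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name in table:
            table[name].append(str(row.get(name) or ""))
    return table


def ask(table, question, model_name="google/tapas-base-finetuned-wtq"):
    """Answer a question about the table (downloads the model on first use)."""
    # Imported lazily so table_from_csv works without transformers installed.
    from transformers import pipeline

    tqa = pipeline("table-question-answering", model=model_name)
    return tqa(table=table, query=question)["answer"]
```

With the contacts file loaded as text, a call would look like `ask(table_from_csv(text), "What is Jane Doe's email?")`.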

Document Analysis

Writing is an essential skill & poor grammar can lead to ambiguity and misinterpretation. If you want to be a better communicator, you must learn to write well.

Research

For this one, we asked the legal team. Jane works as a lawyer for a big SaaS corporation. She has to write lengthy contracts daily. She can't afford to let any grammatical mistakes slip through, as they may signal a lack of professionalism on her part. She has to keep her vocabulary up to date to ensure her message comes out the way she intends. Currently, there is no way for her to automate this process.

Solution

With Matilda's /check-grammar & /check-for-repetition commands, doing this is a piece of cake.

Grammatical-Errors

/check-grammar filename

This command proofreads the previously shared document and checks for grammatical errors and spelling mistakes. The built-in state-of-the-art NLP model not only points out every incorrect sentence in the text but also shows a corrected version of each one.

Most-Used-Words

/check-for-repetition filename

Repetitive text can make the document boring and hard to read. This command checks the text in a shared document for repetition. It suggests viable alternatives for the words you use the most.
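The counting half of this check can be illustrated in a few lines of Python. This is only a sketch under assumptions: the stop-word list, the thresholds and the function name are hypothetical, and suggesting alternative words (which Matilda also does) is not covered here.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real checker would use a fuller one.
STOPWORDS = {
    "the", "and", "for", "that", "this", "with", "are", "was", "but",
    "not", "you", "has", "have", "from", "they", "will", "can", "its",
}


def most_used_words(text, top_n=5, min_count=2):
    """Return (word, count) pairs for the most repeated non-trivial words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(
        word for word in words if len(word) > 2 and word not in STOPWORDS
    )
    # Keep only words that really repeat, most frequent first.
    return [(w, c) for w, c in counts.most_common(top_n) if c >= min_count]
```

Words that appear only once are filtered out, so a clean document simply returns an empty list.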

Supported file types: .jpeg, .png, .csv, .doc, .docx, .html, .pdf, .pptx, .rtf, .txt, .xls, .xlsx

Automatic URL extraction

Links are everywhere. That's it.

Research

For this one, we talked to IT Service Management Professionals. They are responsible for managing customer relations and helping customers out. For every reply they type out in response to a query, they have to find the relevant link and copy it. Managing links can be hard, especially when they come from multiple sources. One can lose track of tabs or copy the wrong link.

Solution

Matilda's /extract-urls command extracts all the hyperlinks and shows them right inside Slack.

Extract-URLS

Supported file types: .jpeg, .png, .csv, .doc, .docx, .html, .pdf, .pptx, .rtf, .txt, .xls, .xlsx

/extract-urls filename

This slash command scans the document for hyperlinks and shows them in Slack. Users can click on the 'Visit website' button to open a link straight from their Slack workspace.
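A simplified version of this flow is sketched below: a regular expression pulls links out of already-extracted text, and each link is wrapped in a Slack Block Kit section with a 'Visit website' button. The regex and the function names are assumptions for illustration; the write-up does not say which link-extraction library Matilda uses.

```python
import re

# Matches http(s) links up to the next whitespace, bracket or quote.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]\"']+")


def extract_urls(text):
    """Return the unique URLs found in text, in order of first appearance."""
    urls = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:!?")  # drop trailing sentence punctuation
        if url and url not in urls:
            urls.append(url)
    return urls


def url_blocks(urls):
    """Build one Block Kit section per URL, each with a link button."""
    return [
        {
            "type": "section",
            "text": {"type": "mrkdwn", "text": f"<{url}>"},
            "accessory": {
                "type": "button",
                "text": {"type": "plain_text", "text": "Visit website"},
                "url": url,
            },
        }
        for url in urls
    ]
```

The list returned by `url_blocks` can be passed as the `blocks` field of a Slack message.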

Document Summarisation (Text & Audio)

Have you ever spent a lot of time reading a document only to realise halfway through that you are reading the wrong one? The 400,000 audiobooks now available and Slack's introduction of Huddles suggest that voice is the future. Documents haven't quite kept up with the audio revolution. Let us change that.

Research

This is a feature we wanted. A few problems we face daily:

  1. The small font in documents.
  2. Unsupported file types while viewing files on mobile.
  3. The inability to view files while travelling.

Solution

Matilda solves these problems with the /summarise & /audio-summary slash commands.

Summarise- File

/summarise filename

This command extracts all the text in the document and summarises it. Matilda converts long, boring & poorly formatted documents into digestible pieces that you can read on the go. With this slash command in place, you don't have to open the document to find out what is inside.

/audio-summary filename

This command returns an audio version of the summary for the previously summarised document. You can listen to it whenever you are unable to read the text.

File backups for your business

Files are essential for any business. Currently, there is no built-in way to back up files shared in your Slack workspace to your S3 bucket.

Research

In this age of no code, the consensus for this feature was that it is a 'must have'.

Solution

Matilda helps users search, upload and download files from their workspace without writing a single line of code.

Get_File

/get filename

This command fetches the named file from your S3 bucket into your workspace.

/search filename

This command helps users find any file in the S3 bucket. The users need to type in 3 letters for the autocomplete to kick in and suggest file names.
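The three-letter threshold can be illustrated with a small in-memory stand-in. Matilda itself searches with RediSearch; the plain Python list, the prefix matching and the function name below are simplifying assumptions made only to show the behaviour.

```python
MIN_QUERY_LENGTH = 3  # autocomplete starts once three letters are typed


def suggest_files(query, filenames, limit=5):
    """Suggest file names starting with the query (case-insensitive)."""
    query = query.strip().lower()
    if len(query) < MIN_QUERY_LENGTH:
        return []  # too short: do not suggest anything yet
    matches = [name for name in filenames if name.lower().startswith(query)]
    return sorted(matches, key=str.lower)[:limit]
```

Waiting for three characters keeps the suggestion list short and avoids running a search on every keystroke.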

/delete filename

This command deletes the file from the S3 bucket.

How we built it

To extract the text from documents, we used AWS Textract. The extracted text was summarised using the summarization pipeline. For the audio summary, we used AWS Polly's neural engine. For the /ask command, we relied on the TAPAS model to answer questions about tabular data. We used AWS Comprehend to detect entities in our text that contain personally identifiable information (PII). RediSearch is used for searching documents, and RedisJSON is used for caching summaries of documents that are frequently requested.
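To make the PII step more concrete, here is a minimal sketch of turning a Comprehend PII response into one contact record. Comprehend's `detect_pii_entities` call returns entity types with character offsets rather than the matched text, so the values have to be sliced out of the input. The function names, the score threshold, the region and the column names are assumptions, not Matilda's actual code.

```python
# Maps Comprehend PII entity types to hypothetical contact columns.
WANTED_TYPES = {
    "NAME": "Name",
    "EMAIL": "Email",
    "PHONE": "Phone Number",
    "ADDRESS": "Address",
}


def contact_fields(text, pii_response, min_score=0.8):
    """Pick the first confident value of each wanted PII type from a response."""
    contact = {}
    for entity in pii_response.get("Entities", []):
        column = WANTED_TYPES.get(entity["Type"])
        if column is None or entity.get("Score", 0.0) < min_score:
            continue  # skip other PII types and low-confidence hits
        # The response holds offsets only, so slice the value from the text.
        value = text[entity["BeginOffset"]:entity["EndOffset"]]
        contact.setdefault(column, value)
    return contact


def detect_contact(text, region_name="us-east-1"):
    """Call Comprehend for PII and return the contact fields (needs AWS access)."""
    import boto3  # imported lazily so contact_fields works without boto3

    client = boto3.client("comprehend", region_name=region_name)
    response = client.detect_pii_entities(Text=text, LanguageCode="en")
    return contact_fields(text, response)
```

Keeping the parsing separate from the AWS call makes it possible to check the logic against a saved response without network access.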

Challenges we ran into

Finding the right set of features to build was a challenge; we had to talk to multiple people before we came up with Matilda. Choosing the right set of libraries for link extraction, summarization & PII detection was also challenging (the AWS SDK came to our rescue). It was difficult to incorporate so many different deep learning models and make them work in real time. Making the video was hard too, as we had to demo all the functionality within 3 minutes. Hope you like it.

Accomplishments that we're proud of

We are so happy to build something that solves multiple problems for different teams. Matilda brings advanced Natural Language Processing capabilities to your Slack workspace and saves time by streamlining repetitive file tasks. This project is built as an MVP based on our research. We believe there is a lot of work to do and Matilda is just getting started.

What we learned

Building multiple NLP pipelines requires a thorough understanding of different architectures and machine learning models. NLP has multiple use cases for business optimization and building Matilda has been a learning experience for us. Also, using Slack's new Bolt platform was fun.

What's next for Matilda

We believe Matilda is suitable for mainstream adoption by the Slack community due to its widespread relevance. These models require high-speed servers to run, so we will be figuring out our finances as the first step of our Go-to-Market Strategy.

  • Getting the solution ready for distribution & getting the app approved on the Slack App Directory.
  • Improving support for S3 to allow more operations.
  • Deploying the ML models on AWS SageMaker to get quicker replies.
  • Adding sentiment analysis for documents.
  • Adding the ability to redact documents.
  • Incorporating user feedback to improve the overall product.

slack

natural language processing

hackathon

Sarthak Arora
Builder | 3 X International Hackathon Winner
