The organization of the E.V.A. project


I spent the last two weeks researching how to approach the project: which libraries to use, when to avoid reinventing the wheel and when to try reinventing it.
Actually, I think the whole project is a kind of reinventing the wheel, but my goal, as I already said in the introductory post of the series, is to learn new things with Python, have fun and end up with a final product, in Italian, that works offline for hotword detection, STT and TTS.

The software - a.k.a. the lines of code I'm going to paste together

I will use open source libraries and I will rely heavily on the experiences of other users to decide which path to take.
In short, the project will be based on:

Hotword detection

For hotword detection I initially evaluated Snowboy, which does not seem to be maintained and will be shut down at the end of 2020. I then considered CMU Sphinx, but adding new terms to the vocabulary took me too much time, with no success.

After some research, I came across Michael Phi's channel "The A.I. hacker", where the author recently kicked off a series of videos documenting the creation of a Python voice assistant capable of running on a Raspberry Pi, totally offline and based on PyTorch.
I immediately became interested in the project and joined his community on Discord, but I nevertheless decided to continue on my own way.

Speech to text

The first software I considered was CMU Sphinx via the SpeechRecognition library, also because it can work as a hotword detector. There are many examples of how to use this tool, and there is also an Italian language model. Unfortunately, it did not run smoothly, so I discarded it.

After discarding Google products, because they need a connection, I decided to proceed with Mozilla DeepSpeech, perhaps as a matter of sympathy - I like Mozilla.
There are also more concrete reasons: I saw some videos showing its performance on a Raspberry Pi 4, and the model for the Italian language was recently released, so I'm very curious to try it.

Language processing

My initial plan was to hard-code static sentences and then work with ifs - very advanced stuff - and I didn't plan to include an NLP function.
It probably won't be part of the first release of the project, but while surfing the web I found a library, spaCy, that intrigued me, and I immediately wanted to try it in a project.
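
To give an idea of what the "static sentences plus ifs" approach could look like, here is a minimal sketch. It is not the project's actual code: the intent names, the Italian phrases and the helper functions are all made up for illustration, and a small lookup table stands in for a long chain of ifs.

```python
import string

# Hypothetical command table: each intent name maps to the static
# sentences that should trigger it. Intents and phrases are examples only.
COMMANDS = {
    "time": ["che ore sono", "dimmi l'ora"],
    "lights_on": ["accendi la luce", "accendi le luci"],
}


def normalize(text):
    """Lowercase, drop punctuation (keeping apostrophes), collapse spaces."""
    drop = string.punctuation.replace("'", "")
    cleaned = text.lower().translate(str.maketrans("", "", drop))
    return " ".join(cleaned.split())


def match_intent(transcript):
    """Return the intent whose static sentence equals the transcript, or None."""
    sentence = normalize(transcript)
    for intent, phrases in COMMANDS.items():
        if sentence in phrases:
            return intent
    return None


if __name__ == "__main__":
    print(match_intent("Che ore sono?"))
```

The obvious limit is that only exact sentences match, which is exactly where an NLP library could help later.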

Text to speech

For the TTS I found several capable solutions such as Mimic, MaryTTS, eSpeak and others that I don't remember at the moment. Some were not available for the Italian language, others were a bit heavy to run. I decided to use eSpeak: the voices are not very natural, but I like how robotic they sound, at least in this phase of the project.
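
As a rough sketch of how the TTS step might be driven from Python, the snippet below calls the espeak command line tool through subprocess. It assumes the espeak binary is installed and on the PATH; the function names and the default speed of 150 words per minute are my own choices for the example, not something the project has settled on.

```python
import shutil
import subprocess


def build_espeak_command(text, voice="it", speed=150):
    """Build the espeak command line: -v selects the voice, -s the speed in words per minute."""
    return ["espeak", "-v", voice, "-s", str(speed), text]


def speak(text, voice="it", speed=150):
    """Speak the text aloud; return False if the espeak binary is not available."""
    if shutil.which("espeak") is None:
        return False
    subprocess.run(build_espeak_command(text, voice, speed), check=True)
    return True


if __name__ == "__main__":
    if not speak("Ciao, sono EVA"):
        print("espeak not found, nothing was spoken")
```

Keeping the command builder separate from the call makes it easy to swap in another TTS engine later without touching the rest of the assistant.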

The hardware

The main components of the project will be a Raspberry Pi 4 (2 or 4 GB), a mini USB speaker, and a ReSpeaker 4-Mic Array.
The only alternative to the ReSpeaker that I found is the Matrix Voice. I opted for the former because it seems easier to configure. I also found an article on Medium that shows some benchmarks in which the ReSpeaker comes out well (but the article is very old and analyzes the Matrix Creator instead of the Matrix Voice, which came out later).

I don't know how I will package everything; there are currently two possible solutions:

For now I will work on the software; I will think about the case later.

And the skills? The roadmap? The gadgets?

I am writing a list of the most useful skills; then I will sort them by priority and define the milestones with an indicative roadmap, in order to make a commitment to myself - are you talking to me?

I love gadgets! I don't do anything with them, and often I don't stick, write on or use them; they actually end up in some drawer, but I like them.
When I started working on a progenitor of this project years ago, I ordered a nice pack of personalized badges that I gave to friends - surely they are now jealously guarded in their safes. I'll think of something for gadgets, yes.
Thanks for coming, see you next post!

Elementary Voice Assistant. Developed in Python, it can be activated with voice commands in order to facilitate daily activities.

Digital analyst, #programming, #data and #analytics. Waiting for the A.I. to conquer the world.
Personal opinions and considerations.