Dan Steinman

Apr 3, 2019: Client-side Speech Recognition on the Web

As part of the Jaxcore project I have been working on a client-side speech recognition system for the Web, and I finally have it working well enough to start talking about it. At some point I will start a blog at http://jaxcore.com, but for now this website will have to do.

I have been wanting to build voice-controlled JavaScript applications for as long as I can remember. And like everyone, I have been wholly unimpressed with the options available to me to do this. Unfortunately I do not have enough (any) machine learning or natural language expertise to roll my own, and it seems like Google's Web Speech API is the only viable way to add any sort of voice control to a web page. But that comes with a lot of hairy problems: fees, rate limiting, round-trip delays from their systems, the persistent risk of Google pulling the rug out from under me at any time, and most importantly all the unpleasant privacy and security implications.

I have always felt that "my voice" should work like an input peripheral, not something I rent as a service from companies that I do not entirely trust. I am quite simply not willing to use Google's (or Amazon's or Apple's) cloud systems for speech or voice controls regardless of how well they might work, and I think there is a desperate need for an open alternative. I got frustrated enough by all this to begin doing something about it. So I've been working with Mozilla's terrific DeepSpeech speech-to-text project. Both Mozilla and Baidu deserve recognition for releasing this as an open source project, and I think the possibilities for an open voice control system based on this are nearly limitless.

I've connected DeepSpeech to Jaxcore in much the same way that I connect my Jaxcore Spin controllers, and the results are very encouraging. When motion and voice controls are connected to the web in this way it is possible to build a new generation of games and apps that are like something right out of a science-fiction movie.

This system allows anyone to run DeepSpeech as a local service in a desktop app and send the speech recognition results back to the browser through a browser extension. And the API is very, very simple:

import {Listen} from "jaxcore-client";

Listen.start(); // start voice recognition

Listen.stop(); // stop voice recognition

Listen.on("recognize", function(text) {
    console.log("you said", text);
});

No Google, no cloud, no API keys, no Amazon Lambda services. It'll work on any web page using a small client-side JavaScript library to hook up the communication line. The drawbacks are that you need to have both the Jaxcore desktop app and the web browser extension installed, and that DeepSpeech requires a large download (1.8GB) for the default English language model. But fortunately that model can be used across all websites.
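
Once the "recognize" event shown above hands you a string, turning it into application commands is ordinary JavaScript. Here is a minimal sketch of one way to do that. Note that createCommandRouter and the command phrases are made up for illustration, they are not part of jaxcore-client, and the Listen wiring is left as a comment so the snippet runs on its own:

```javascript
// Hypothetical helper (not part of jaxcore-client): builds a function
// that routes a recognized phrase to the first matching command.
function createCommandRouter(commands) {
    // commands: array of { pattern: RegExp, run: function(match) }
    return function route(text) {
        // Normalize so that "Volume Up" and " volume up " match the same pattern.
        const normalized = text.trim().toLowerCase();
        for (const command of commands) {
            const match = normalized.match(command.pattern);
            if (match) {
                return command.run(match);
            }
        }
        return null; // no command matched
    };
}

// Example phrases only; an application would define its own.
const route = createCommandRouter([
    { pattern: /^new game$/, run: () => "starting a new game" },
    { pattern: /^undo( move)?$/, run: () => "undoing the last move" },
    { pattern: /^volume (up|down)$/, run: (m) => "volume " + m[1] }
]);

// With the API above, the wiring would presumably look like this:
// Listen.on("recognize", function(text) { route(text); });
console.log(route("Volume up"));
```

Keeping the routing separate from the listener means the commands can be tried out by typing phrases, without a microphone or the desktop app running.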

To demonstrate how well this works I've been writing a chess game, and I'm going to the extra lengths of making this chess game a very faithful replica of what was seen in the science-fiction masterpiece 2001: A Space Odyssey.

Here is the scene where the astronaut plays against HAL9000:

And here's Voice Chess, with voice control for all game functions, and Spin controller support:

Not bad, right?

This was not easy to set up, but now that it's working it's incredibly fun and satisfying to finally be able to build things like this. Being able to program voice commands in JavaScript just as easily as (or even more easily than) mouse or keyboard events is simply terrific. It's probably the most fun I've had writing JavaScript in a really, really long time. And I want to help other people learn how to do this too. The entire Jaxcore software system will be open source, and I have parts of it already online at GitHub. It's not quite done, and there's a lot more to come.
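
To give a flavour of what programming a voice command involves, here is a small sketch that turns a spoken chess move into board coordinates. It is not the code or the grammar used in Voice Chess. The helper names (parseSquare, parseMove) are made up for illustration, and it assumes the recognizer returns plain words such as "e two to e four" rather than digits:

```javascript
// Assumption: ranks arrive as spoken words, not digits.
const RANK_WORDS = new Map([
    ["one", 1], ["two", 2], ["three", 3], ["four", 4],
    ["five", 5], ["six", 6], ["seven", 7], ["eight", 8]
]);

// Hypothetical helper: converts a spoken square such as "e two" into "e2".
function parseSquare(fileWord, rankWord) {
    const rank = RANK_WORDS.get(rankWord);
    if (!/^[a-h]$/.test(fileWord) || rank === undefined) {
        return null; // not a valid square
    }
    return fileWord + rank;
}

// Hypothetical helper: converts "e two to e four" into { from: "e2", to: "e4" }.
// Returns null for anything that is not "<file> <rank> to <file> <rank>".
function parseMove(text) {
    const words = text.trim().toLowerCase().split(/\s+/);
    if (words.length !== 5 || words[2] !== "to") {
        return null;
    }
    const from = parseSquare(words[0], words[1]);
    const to = parseSquare(words[3], words[4]);
    return from && to ? { from: from, to: to } : null;
}

console.log(parseMove("e two to e four"));
```

A real game would also need to cope with misheard words and with other ways of naming a move, but the shape is the same as any other event handler: take the input, validate it, and act on it.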

For those in the Toronto area, I will be demoing Voice Chess, the Jaxcore Spin controller prototypes, and many other things at Collision Conference in Toronto May 20 to May 23. I don't know which day I'll be demoing yet, but look for Jaxcore in the Alpha section.

Cheers,
Dan



Jan 24, 2019: Back Online

It has been over 15 years since I took down this ol' personal webpage of mine. Luckily, I took a complete snapshot of the FTP site at the time. I thought it would be fun to restore the site to its (almost) original state and indeed it was fun exploring an internet time capsule from the '90s. This should also fix a lot of very old inbound links that have been broken this whole time.

If you are curious what I'm up to these days, I'm still writing JavaScript, and also learning electronics and building "IoT" products under the brand name Jaxcore. I'll be releasing a new web-enabled remote control called Jaxcore Spin. It's a JavaScript-programmable dial/wheel/knob for stereo systems, web games & applications, and pretty much anything you can think of.

You can reach me at dan@jaxcore.com
And you can follow my work on Jaxcore at http://twitter.com/jaxcore