To Listen Without Consent – Abusing the HTML5 Speech

tl;dr;
I found a bug in Google Chrome that allows an attacker to listen on the user speech without any consent from the user and without any indication. Even blocking any access to the microphone under chrome://settings/content will not remedy this flaw.

Try the live demo… (Designed for Mac though it will work similarly on any other OS)

Watch the video…

*
The Sisyphus of computer science*

Speech recognition is like the Sisyphus of computer science. We came a long way but still haven’t reached the top of that hill. With all that crunching power and sophisticated algorithms, computers still can’t recognise some basic words and sentences, the kinds that the average human digest without breaking a sweat. This is still one of the areas that humans easily win over computers. Savor these wins, as it will not last much longer;)

One must appreciate Google for pushing this area forward and introducing speech recognition into the Chrome browser. The current level of speech support in Chrome allows us to create application and websites that are completely controlled form speech. It open vast possibilities – form general improved accessibility to email dictation and even games.

The current speech API is pretty decent. It works by sending the audio to Google’s servers and returns the recognised text. The fact that it sends the audio to Google has some benefits, but from applicative point of view it will always suffer from latency and will not work offline. I believe that the current speech support was introduced with Chrome 25. From Chrome 33 one can also use Speech Synthesis API. – Amazing!

But…
Before this fine API we currently have, Google experimented with an earlier version of the API. It works quite the same, the main difference is that the older API doesn’t work continuously and needs to start after every sentence. Still, it’s powerful enough and it has some flaws that enable it to be abused. I believe this API was introduced with Chrome 11 and I have a good reason to believe it was flawed since than.

*
More Technical Details*

Basically, this attack is abusing Chrome’s old speech API, the -x-webkit-speech feature.
What enable this attack are these 3 issues:

The speech element visibility can be altered to any size and opacity, and still stay fully functional.
The speech element can be made to take over all clicks on the page while staying completely invisible. (No need to mess with z-indexes)
The indication box (shows you that you’re being recorded) can be obfuscated and/or be rendered outside of the screen.

The POC is designed to work on Chrome for Mac, but, the same attack can be conducted to work on any Chrome on any OS.

This POC is using the full-screen mode to make it easier to hide the “indication box” outside of the screen.
It is not mandatory to use the HTML5 full-screen; it’s just easier for this demo.

As you can see in the demo and video there is absolutely no indication that anything is going-on. There are no other windows or tabs, and no some kind of hidden popup or pop-under.
The user will never know this website is eavesdropping.

In Chrome all one need in order to access the user’s speech is to use this line of HTML5 code:

That’s all; there will be no fancy confirmation screens. When the user clicks on that little grey microphone he will be recorded. The user will see the “indication box” telling him to “Speak now” but that can be pushed out of the screen and / or obfuscated.

That is enough in order to listen to the user speech without any consent and without giving him any indication. The other bugs just make it easier but are not mandatory.

(For the tree in the demo I have used a slightly altered version of the beautiful canvas tree from Kenneth Jorgensen)

— The bug was reported to Google.

Menu