A team of researchers has developed an eavesdropping attack for Android devices that can, to varying degrees, recognize a caller's gender and identity and even discern private speech.
Named EarSpy, the side-channel attack aims to explore new possibilities for eavesdropping by capturing motion sensor readings caused by reverberations from the ear speakers of mobile devices.
EarSpy is an academic effort by researchers from five American universities (Texas A&M University, New Jersey Institute of Technology, Temple University, University of Dayton, and Rutgers University).
While this type of attack has been explored for smartphone loudspeakers, ear speakers were considered too weak to generate enough vibration to make such a side-channel attack practical.
However, modern smartphones use more powerful stereo speakers than models from a few years ago, and these produce much better sound quality and stronger vibrations.
Similarly, modern devices use more sensitive motion sensors and gyroscopes that pick up even the smallest resonances from speakers.
The researchers illustrate this progress with spectrograms: the ear speaker of a 2016 OnePlus 3T barely registers, while the stereo ear speakers of a 2019 OnePlus 7T produce significantly more data.
Experiments and results
The researchers used a OnePlus 7T and a OnePlus 9 in their experiments, along with different sets of pre-recorded audio that were played only through the ear speakers of the two devices.
The team also used the third-party app ‘Physics Toolbox Sensor Suite’ to capture accelerometer data during a simulated call and then fed it to MATLAB to analyze it and extract features from the audio stream.
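To make the feature-extraction step more concrete, here is a minimal Python sketch of the kind of time- and frequency-domain features that could be computed from one window of accelerometer samples. It is not the researchers' MATLAB pipeline: the function name `extract_features`, the three features chosen, and the 500 Hz sampling rate in the usage example are all illustrative assumptions.

```python
import cmath
import math


def extract_features(samples, sample_rate_hz):
    """Return simple time- and frequency-domain features for one window.

    Hypothetical helper for illustration only; the paper's actual feature
    set and MATLAB code are not reproduced here.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the constant offset

    # Time-domain features: signal energy and sign changes.
    rms = math.sqrt(sum(c * c for c in centered) / n)
    zero_crossings = sum(
        1 for a, b in zip(centered, centered[1:]) if (a < 0) != (b < 0)
    )

    # Frequency-domain feature: strongest bin of a naive DFT.
    # (O(n^2), acceptable for short windows; an FFT would be used in practice.)
    magnitudes = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            c * cmath.exp(-2j * math.pi * k * i / n)
            for i, c in enumerate(centered)
        )
        magnitudes.append(abs(coeff))
    peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__) + 1
    dominant_hz = peak_bin * sample_rate_hz / n

    return {
        "rms": rms,
        "zero_crossings": zero_crossings,
        "dominant_hz": dominant_hz,
    }


if __name__ == "__main__":
    # Synthetic stand-in for real sensor data: a 50 Hz vibration on top of
    # a constant offset, sampled at an assumed 500 Hz.
    window = [9.81 + math.sin(2 * math.pi * 50 * i / 500) for i in range(100)]
    print(extract_features(window, 500))
```

Feature dictionaries like this one, computed per window, would then form the input to whatever classifier is trained next.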
A machine learning (ML) algorithm was trained using readily available datasets to recognize speech content, caller identity and gender.
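As a rough illustration of the training step, the sketch below fits a nearest-centroid classifier on labeled feature vectors and uses it to label a new one. Nearest-centroid is a stand-in chosen for brevity, not the algorithm the researchers used, and the feature values and the labels `class_a` and `class_b` are invented placeholders rather than data from the study.

```python
import math


def train_centroids(features, labels):
    """Average the feature vectors of each class (nearest-centroid model)."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [total / counts[label] for total in acc]
        for label, acc in sums.items()
    }


def predict(centroids, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


if __name__ == "__main__":
    # Made-up [dominant_hz, rms] vectors, purely to exercise the code.
    train_x = [[110.0, 0.80], [125.0, 0.90], [210.0, 0.70], [230.0, 0.75]]
    train_y = ["class_a", "class_a", "class_b", "class_b"]
    model = train_centroids(train_x, train_y)
    print(predict(model, [120.0, 0.82]))
```

The same structure applies whether the labels stand for gender, caller identity, or spoken words; only the training labels change.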
The test results vary depending on the dataset and the device, but they were generally promising for eavesdropping via the ear speaker.
Caller gender identification on the OnePlus 7T ranges between 77.7% and 98.7%, caller identity classification ranges between 63.0% and 91.2%, and speech recognition ranges between 51.8% and 56.4%.
“We evaluate the time and frequency domain features using classic ML algorithms, which show the highest 56.42% accuracy,” the researchers explain in their paper.
On the OnePlus 9, gender identification reached 88.7%, speaker identification dropped to an average of 73.6%, and speech recognition ranged from 33.3% to 41.6%.
By comparison, using the loudspeaker and the ‘Spearphone’ app, which the researchers developed while experimenting with a similar attack in 2020, caller gender and identity accuracy reached 99%, while speech recognition reached an accuracy of 80%.
Constraints and solutions
One thing that can reduce the effectiveness of the EarSpy attack is the volume users choose for their ear speakers. A lower volume could prevent eavesdropping via this side-channel attack and is also more comfortable for the ear.
The arrangement of the hardware components of the device and the tightness of the assembly also affect the diffusion of the speaker reverberation.
Finally, user motion or vibrations introduced from the environment lower the accuracy of the derived speech data.
Android 13 introduced a restriction on collecting sensor data at sampling rates above 200 Hz without permission. While this prevents speech recognition at the default sampling rate (400 Hz to 500 Hz), accuracy drops by only about 10% when the attack is performed at 200 Hz.
The researchers suggest that phone manufacturers should ensure that the sound pressure remains stable during calls and place the motion sensors where internally originating vibrations do not affect them, or at least have the minimum possible impact.
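One way to reason about what a rate cap means for the captured signal is the Nyquist limit: sampling at a given rate can only represent frequencies up to half that rate unambiguously, and faster vibrations fold back (alias) into that band instead of disappearing. The sketch below works through this arithmetic for the two rates mentioned above. It is general signal-processing background rather than an explanation the researchers give for the roughly 10% figure, the helper names are hypothetical, and it assumes an idealized sensor with no anti-aliasing filter.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency that can be represented without aliasing."""
    return sample_rate_hz / 2


def aliased_frequency_hz(tone_hz, sample_rate_hz):
    """Frequency at which a pure tone appears after ideal sampling."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


if __name__ == "__main__":
    # Example vibration frequencies, chosen only to illustrate the folding.
    for rate in (500, 200):
        print(f"sampling at {rate} Hz, Nyquist limit {nyquist_hz(rate)} Hz")
        for tone in (80, 150, 240):
            print(f"  {tone} Hz appears at {aliased_frequency_hz(tone, rate)} Hz")
```

At 500 Hz all three example tones sit below the 250 Hz limit and are captured at their true frequency, while at 200 Hz the 150 Hz and 240 Hz tones fold down to 50 Hz and 40 Hz respectively.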