The client is a Singapore-based pioneer and leading provider of cloud-based phone solutions for businesses in emerging markets. Its products and services are used by more than 15,000 customers across 65+ countries. The client wants to create a product that can transform the communication industry through personalization, automation, AI, and analytics. Having become a leading cloud communication provider in less than a decade, the client is recognized worldwide, with a large workforce offering customers unparalleled reliability and intelligence in business telephony.
Pluto7 is a tech-enabled services company that builds Artificial Intelligence, Machine Learning, and Data Analytics solutions across retail, supply chain, manufacturing, hi-tech, and other industries. As a partner of Google, a global leader in data processing, Pluto7 has expertise in smart analytics, Google-based technologies, and AI. Our customers trust us to deliver effective solutions with operational efficiency.
Having established itself in the telephony business, the client wanted to design a product that improves the transcription accuracy of audio files by leveraging the Google Cloud Speech-to-Text API.
These transcriptions can be used to improve call quality for the product’s users and enhance the user experience. However, the audio quality of the files was poor enough that the client could not get satisfactory results from the Speech-to-Text API.
Pluto7 worked with Google to solve this problem. Each of the test audio files was found to contain a substantial amount of noise, which degraded the overall performance of the Speech-to-Text API.
In addition, we noted that the client was not using any post-processing techniques. With these points in mind, the work was divided into two stages.
Pre-processing: The objective of the pre-processing stage was to improve the quality of the audio files by reducing the background noise in them. To achieve the best results, it was essential to extract features from the audio files. These commonly extracted features fall into two categories: temporal and spectral. Pluto7’s team evaluated several noise-reduction techniques on the signal.
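To make the temporal versus spectral distinction concrete, here is a minimal sketch in plain Python. The case study does not name the specific features or noise-reduction methods that were evaluated, so the three features below (RMS energy, zero-crossing rate, spectral centroid), the function names, and the synthetic 440 Hz test frame are illustrative assumptions only.

```python
import cmath
import math


def rms_energy(samples):
    """Temporal feature: root-mean-square amplitude of the frame."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_rate(samples):
    """Temporal feature: fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def spectral_centroid(samples, sample_rate):
    """Spectral feature: magnitude-weighted mean frequency, via a naive DFT."""
    n = len(samples)
    weighted, total = 0.0, 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(
            s * cmath.exp(-2j * math.pi * k * i / n)
            for i, s in enumerate(samples)
        )
        magnitude = abs(coeff)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total else 0.0


# Hypothetical input: a clean 440 Hz tone sampled at 8 kHz (not client data).
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(400)]

features = {
    "rms_energy": rms_energy(frame),
    "zero_crossing_rate": zero_crossing_rate(frame),
    "spectral_centroid_hz": spectral_centroid(frame, SAMPLE_RATE),
}
print(features)
```

On real call recordings, features like these would typically be computed per short frame and compared before and after a candidate noise-reduction step; a production pipeline would likely use an FFT library rather than the naive DFT shown here.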
Post-processing: To improve accuracy, speech adaptation was leveraged in the post-processing stage. The Speech-to-Text API offers three main recognition methods: synchronous, asynchronous, and streaming recognition.
The speech adaptation feature is primarily used to help Speech-to-Text recognize specific words or phrases more frequently, by supplying them as hints that bias the recognition model.
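As a rough illustration of how phrase hints are supplied, the sketch below builds the JSON body for a synchronous recognition request with a speech context attached. The field names (`speechContexts`, `phrases`, `boost`) follow the public v1 REST reference and should be checked against the current documentation. The helper name, bucket URI, phrase list, boost value, and audio settings are hypothetical and are not taken from the client’s project.

```python
import json


def build_recognize_request(gcs_uri, phrases, boost=None):
    """Build a request body for a synchronous Speech-to-Text recognize call.

    Hypothetical helper: it only assembles the JSON payload and does not
    call the API, so it runs without credentials or client libraries.
    """
    context = {"phrases": list(phrases)}
    if boost is not None:
        # Optional weighting of the hinted phrases; the value is an assumption.
        context["boost"] = boost
    return {
        "config": {
            "encoding": "LINEAR16",      # assumed audio encoding
            "sampleRateHertz": 8000,     # assumed telephony sample rate
            "languageCode": "en-US",     # assumed language
            "speechContexts": [context],
        },
        "audio": {"uri": gcs_uri},
    }


# Hypothetical domain phrases a telephony product might want recognized.
body = build_recognize_request(
    "gs://example-bucket/call-0001.wav",
    ["billing department", "call forwarding", "voicemail"],
    boost=10.0,
)
print(json.dumps(body, indent=2))
```

Keeping the phrase list as a plain parameter makes it easy to maintain one list per customer or use case and to compare transcripts with and without the hints.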
By using the speech adaptation feature of the Speech-to-Text API, which converts speech into text and helps transcribe content with accurate captions, our team succeeded in reducing the word error rate. It also improved the recognition of important words, helping deliver a better user experience through voice commands.
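Word error rate is commonly defined as the number of word-level substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. The sketch below computes it with a standard edit-distance recurrence. The function name and the sample sentences are made up for illustration and are not results from this project.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edits needed to turn the first j hypothesis words
    # into the reference words processed so far.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only.
reference = "please transfer me to the billing department"
without_hints = "please transfer me to the building apartment"
with_hints = "please transfer me to the billing department"

wer_without = word_error_rate(reference, without_hints)
wer_with = word_error_rate(reference, with_hints)
print(f"WER without hints: {wer_without:.3f}, with hints: {wer_with:.3f}")
```

Comparing this metric on the same set of reference transcripts before and after a change is one way to check whether pre-processing or speech adaptation actually helps.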