Pluto7 helped a leading Singapore-based provider of cloud-based phone solutions improve the quality of audio files


The client is a Singapore-based pioneer and leading provider of cloud-based phone solutions to businesses in emerging markets. Its products and services are used by more than 15,000 customers across 65+ countries. The client wants to create a product that can transform the communication industry with personalization, automation, AI, and analytics. Having become a leading cloud communication provider in less than a decade, the client is recognized worldwide and employs a large team that offers customers unparalleled reliability and intelligence in business telephony.

Why did they choose Pluto7?

Pluto7 is a tech-enabled services company that builds Artificial Intelligence, Machine Learning, and Data Analytics solutions across retail, supply chain, manufacturing, hi-tech, and other industries. As a partner of Google, a global leader in data processing, Pluto7 has expertise in smart analytics, Google-based technologies, and AI. Because we consistently deliver effective solutions, our customers place their trust in our operational efficiency.


Having established itself in the telephony business, the client wanted to design a product that improves the transcription accuracy of audio files by leveraging the Google Cloud Speech-to-Text API.

These transcriptions can be used by the product’s users to improve call quality and enhance the user experience. However, the audio quality of the files was such that the client could not get satisfactory results from the Speech-to-Text API.

Pluto7 worked with Google to solve this problem. Each of the test audio files turned out to contain a substantial amount of noise, which was degrading the overall performance of the Speech-to-Text API.

In addition, we observed that the client was not using any post-processing techniques. With these points in mind, the complete process was divided into two stages:

  1. Pre-processing
  2. Post-processing

Pre-processing: The objective of the pre-processing stage was to improve the quality of the audio files by reducing the background noise in them. To achieve the best results, it was essential to extract features from the audio files. The commonly extracted features fall into two categories: temporal and spectral. Pluto7’s team evaluated several noise-reduction techniques on the signal.
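To make the two feature categories concrete, the sketch below computes two temporal features (zero-crossing rate and RMS energy) and one spectral feature (spectral centroid) for a single audio frame. The choice of features, the function names, and the test tone are assumptions made for illustration; the case study does not say which features or noise-reduction methods the team actually used.

```python
import cmath
import math


def zero_crossing_rate(frame):
    """Temporal feature: fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms_energy(frame):
    """Temporal feature: root-mean-square amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2, using a naive O(N^2) DFT (fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame, sample_rate):
    """Spectral feature: magnitude-weighted mean frequency of the frame, in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        return 0.0  # silent frame: no meaningful centroid
    bin_hz = sample_rate / len(frame)
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total


# Hypothetical test signal: a 1 kHz tone sampled at 8 kHz, 256 samples long.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]

print("zero-crossing rate:", zero_crossing_rate(tone))
print("RMS energy:", rms_energy(tone))
print("spectral centroid (Hz):", spectral_centroid(tone, SAMPLE_RATE))
```

Broadband noise tends to raise both the zero-crossing rate and the spectral centroid of a frame, which is one reason features like these are often inspected before and after a noise-reduction step.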

Post-processing: To improve accuracy, speech adaptation was used in the post-processing stage. The Speech-to-Text API supports three main recognition methods: synchronous, asynchronous, and streaming recognition.

The speech adaptation feature is primarily used to help Speech-to-Text recognize specific words or phrases that occur frequently in the audio by supplying them to the recognition model as hints.
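As a rough illustration of how phrase hints are passed to the API, the sketch below builds the JSON body of a Speech-to-Text v1 REST request with a `speechContexts` entry, and picks between the synchronous and asynchronous endpoints based on audio length. The helper names, the bucket URI, the phrases, the boost value, and the 60-second threshold are assumptions for illustration; the client’s actual configuration is not described in this case study. No network call is made.

```python
import json

# v1 REST endpoints. Streaming recognition is only offered over gRPC,
# so it is not covered by this sketch.
SYNC_ENDPOINT = "https://speech.googleapis.com/v1/speech:recognize"
ASYNC_ENDPOINT = "https://speech.googleapis.com/v1/speech:longrunningrecognize"

# Assumption: synchronous requests are meant for short audio (about one minute).
SYNC_LIMIT_SECONDS = 60


def build_recognize_request(gcs_uri, phrases, language_code="en-US",
                            sample_rate_hertz=8000, boost=None):
    """Build a request body whose speechContexts carry the phrase hints."""
    context = {"phrases": list(phrases)}
    if boost is not None:
        context["boost"] = boost  # optional weight for the hinted phrases
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
            "speechContexts": [context],
        },
        "audio": {"uri": gcs_uri},
    }


def choose_endpoint(duration_seconds):
    """Pick the synchronous endpoint for short audio, asynchronous otherwise."""
    if duration_seconds <= SYNC_LIMIT_SECONDS:
        return SYNC_ENDPOINT
    return ASYNC_ENDPOINT


# Hypothetical call recording and domain phrases.
request_body = build_recognize_request(
    "gs://example-bucket/calls/call-0001.wav",
    phrases=["call forwarding", "IVR menu", "voicemail"],
    boost=10.0,
)
print(choose_endpoint(duration_seconds=45))
print(json.dumps(request_body, indent=2))
```

The body would be sent as an authenticated POST to the chosen endpoint. The official client libraries expose the same fields, so the hints carry over unchanged if a library is used instead of raw REST.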


Using the speech adaptation feature of the Speech-to-Text API, which converts speech into text and helps transcribe content with accurate captions, our team succeeded in reducing the word error rate. It boosted the recognition of specific words and phrases, helping deliver a better user experience through voice commands.
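Word error rate is commonly defined as the number of word-level substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. The sketch below computes it with a standard edit-distance table; the function name and sample sentences are hypothetical, and the case study does not state how the team measured the metric.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two texts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical reference and API transcripts for one utterance.
reference = "please transfer me to billing"
hypothesis = "please transfer me to building"
print(word_error_rate(reference, hypothesis))
```

Comparing this value for the same audio before and after a change (noise reduction, phrase hints) gives a simple way to check whether the change helped.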



Industry: High-tech


  • The audio files contained a lot of background noise, which had to be removed to improve the audio quality.
  • The formats of the audio files needed to be changed to work with the speech adaptation technique.
  • The audio was multilingual, varying with the client’s customers, so the team had to handle recognition of different sets of words.


  • The quality of the audio files was not up to the mark, so it was rectified before the speech adaptation feature was used.
  • The rectifications made to the audio files made it feasible to recognize the words spoken in them.
  • The word error rate was reduced by using the Speech-to-Text API.

Products Used

  • Speech-to-Text API
  • Google Cloud Storage
  • AI Platform
