QCRI’s Arabic Text-to-Speech System Advances Regional Solutions

The release of the first AI-based deep learning model is an initial step towards making the technology accessible to users and developers across the MENA region

Entity: Qatar Computing Research Institute

There are around 422 million Arabic speakers worldwide who will need this technology

A pioneering real-time Arabic text-to-speech (TTS) system has been created by the Arabic Language Technologies team at Qatar Computing Research Institute (QCRI), part of Hamad Bin Khalifa University (HBKU).

The release of the team’s first AI-based model is an initial step towards making this technology accessible to users and developers across the MENA region. The first release comes with two highly refined voices: Hamza, a natural voice suited to news and reading, and Amina, a more expressive voice. Amina is aimed primarily at young audiences, for reading stories and assisting in education.

Dr. Ahmed Abdelali, Senior Software Engineer in the Arabic Language Technologies department at QCRI, said: “Automatic speech generation is an important technology that impacts our lives on a daily basis. Such technology is not a luxury anymore, but rather a necessity as it provides a solution to many challenging problems.”

“From conventional applications like voice notifications and book readers to more sophisticated voice assistants, TTS technologies provide the core technology of converting text to natural voice signals. While TTS technologies are not new, recent breakthroughs in deep learning have allowed us to build more sophisticated models that are fast and produce voices indistinguishable from human speech. Generating a human-like voice that is clear and understandable was a dream that our researchers were eager to achieve, and we are very proud of their achievement.”

Since the start of the project, the team has acquired high-quality recordings from two speakers. The recorded text was pre-processed by Farasa, QCRI’s in-house AI system that handles core Arabic linguistic tasks such as vowelization. The models were trained on QCRI’s own high-performance cluster, which includes state-of-the-art GPU machines.

There are around 422 million Arabic speakers worldwide who will eventually need such technologies in their native language, which makes QCRI’s research important. The release of the TTS system marks another milestone in the team’s effort to close the Arabic language processing loop by enabling a stack of technologies for the language, including Arabic text processing, morphological analysis, vowel restoration (diacritization), machine translation, automatic speech recognition, and now speech generation.

The suite of tools developed by the team reflects their commitment to advancing research in Arabic as well as supporting local and international communities with related expertise.

The system is accessible at: https://tts.qcri.org/.