Sana Voice Overview

Learning to speak a new language can be difficult. Sana Voice uses state-of-the-art speech recognition technology to help learners perfect their pronunciation and sound like native speakers. It models pronunciation independently of the learner's native language and provides instant, personalized feedback.

The Sana Voice API can be used to score a word, phrase or sentence. The response contains an overall score as well as scores for each word at both the phoneme and character level.

Scoring A Phrase

Scores a word, phrase or sentence. The response contains an overall score as well as scores for each word at both the phoneme and character level.


This API is in beta. All endpoints and documentation are subject to change before the initial release.

Body Parameters

Parameter | Required | Type | Description
dialect | Yes | string | The dialect to use for scoring. Currently, only en-us is supported.
user_id | Yes | string | ID of the end user for whom the pronunciation feedback is provided. This should be anonymized.
target_phrase | Yes | string | The word, phrase or sentence to score.
audio | Yes | binary | The audio file containing the user's speech. See the Audio section for details.
audio_format | No | string | One of wav, mp3 or webm. Defaults to wav. See the Audio section for details.
target_phonemes | No | string | The phonemes to score against, with phonemes within a word separated by | and words separated by spaces. Must be specified together with phonetic_system.
phonetic_system | No | string | The phonetic system used in target_phonemes. For English, either ipa or arpabet (see English Phonetic System). Must be specified together with target_phonemes.
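The parameter constraints above can be checked on the client before any audio is uploaded. The sketch below is a hypothetical helper, `build_form_fields`, written for illustration only (it is not part of a Sana SDK); its allowed-value sets simply mirror the table and may change while the API is in beta. It returns only the text fields; the audio file is attached separately as the `audio` part.

```python
from typing import Dict, Optional

# Allowed values copied from the Body Parameters table (may change during the beta).
SUPPORTED_DIALECTS = {"en-us"}
SUPPORTED_FORMATS = {"wav", "mp3", "webm"}
PHONETIC_SYSTEMS = {"ipa", "arpabet"}


def build_form_fields(
    user_id: str,
    target_phrase: str,
    dialect: str = "en-us",
    audio_format: str = "wav",
    target_phonemes: Optional[str] = None,
    phonetic_system: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: return the text form fields for a scoring request."""
    if dialect not in SUPPORTED_DIALECTS:
        raise ValueError(f"unsupported dialect: {dialect}")
    if audio_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio_format: {audio_format}")
    # target_phonemes and phonetic_system must be given together or not at all.
    if (target_phonemes is None) != (phonetic_system is None):
        raise ValueError("target_phonemes and phonetic_system must be specified together")
    if phonetic_system is not None and phonetic_system not in PHONETIC_SYSTEMS:
        raise ValueError(f"unsupported phonetic_system: {phonetic_system}")

    fields = {
        "dialect": dialect,
        "user_id": user_id,
        "target_phrase": target_phrase,
        "audio_format": audio_format,
    }
    if target_phonemes is not None:
        fields["target_phonemes"] = target_phonemes
        fields["phonetic_system"] = phonetic_system
    return fields
```

With `requests`, the returned dictionary can be passed as `data=` and the audio file as `files=`.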

Response Format

On success, the HTTP status code in the response header is 200 OK and the response body contains the scoring result described below. On error, the status code is an error code and the response body contains a list of Error Response objects.

Field | Type | Description
overall_score | int | A value between 0 and 100: the score for the phrase as a whole.
word_scores | array of Word Score objects | Scores for each word, including per-phoneme scores.

Example Curl Request

```shell
curl "" \
  -X POST \
  -H "Content-Type: multipart/form-data" \
  -H "X-API-KEY: $API_KEY" \
  -F dialect=en-us \
  -F user_id=123456 \
  -F "target_phrase=good joke" \
  -F audio=@audio_file_16k.wav \
  -F audio_format=wav
```

Example Target Phonemes

You can specify the phonemes to score against with the target_phonemes parameter. This is especially useful when the phonemicization of a word depends on context: for example, "live" is pronounced l|ih|v as a verb but l|ay|v as an adjective. For the phrase "I live here":

```shell
-F "target_phonemes=ay l|ih|v hh|iy|r" -F phonetic_system=arpabet
```

Or, using IPA:

```shell
-F "target_phonemes=aɪ l|ɪ|v h|i|ɹ" -F phonetic_system=ipa
```
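As the examples show, a target_phonemes value is a list of words separated by spaces, each word being its phonemes joined by |, so it can be built from a nested list. `format_target_phonemes` below is a hypothetical helper for illustration, not part of the API; it also rejects phonemes that would break the encoding.

```python
from typing import List


def format_target_phonemes(words: List[List[str]]) -> str:
    """Hypothetical helper: join phonemes with '|' inside a word and words with a space."""
    for phonemes in words:
        if not phonemes:
            raise ValueError("each word needs at least one phoneme")
        for p in phonemes:
            # A phoneme containing a separator would corrupt the encoding.
            if not p or "|" in p or " " in p:
                raise ValueError(f"invalid phoneme: {p!r}")
    return " ".join("|".join(phonemes) for phonemes in words)


print(format_target_phonemes([["ay"], ["l", "ih", "v"], ["hh", "iy", "r"]]))  # ay l|ih|v hh|iy|r
```

The same call with the IPA symbols [["aɪ"], ["l", "ɪ", "v"], ["h", "i", "ɹ"]] produces the IPA string shown above.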

Example Python Code

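```python
import os

import requests

url = ""  # scoring endpoint URL
headers = {"X-API-KEY": os.environ["API_KEY"]}  # read the key from the API_KEY environment variable

# Plain form fields go in `data`; the audio file goes in `files`.
data = {
    "dialect": "en-us",
    "user_id": "123456",
    "audio_format": "wav",
    "target_phrase": "My name is Sally",
}

with open("audio.wav", "rb") as audio:
    r = requests.post(url, headers=headers, data=data, files={"audio": ("audio.wav", audio)})

r.raise_for_status()  # raise an exception on 4xx/5xx responses
print(r.json())
```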

Example Response

```json
{
  "target_phrase": "good joke",
  "overall_score": 92,
  "word_scores": [
    {
      "score": 87,
      "phoneme_scores": [
        {
          "sounds_like": "W",
          "phoneme_ipa": "ɡ",
          "sounds_like_ipa": "w",
          "phoneme": "G",
          "score": 70
        },
        {
          "sounds_like": "UH",
          "phoneme_ipa": "ʊ",
          "sounds_like_ipa": "ʊ",
          "phoneme": "UH",
          "score": 100
        },
        {
          "sounds_like": "D",
          "phoneme_ipa": "d",
          "sounds_like_ipa": "d",
          "phoneme": "D",
          "score": 100
        }
      ],
      "word": "good"
    },
    {
      "score": 100,
      "phoneme_scores": [
        {
          "sounds_like": "JH",
          "phoneme_ipa": "dʒ",
          "sounds_like_ipa": "dʒ",
          "phoneme": "JH",
          "score": 100
        },
        {
          "sounds_like": "OW",
          "phoneme_ipa": "oʊ",
          "sounds_like_ipa": "oʊ",
          "phoneme": "OW",
          "score": 100
        },
        {
          "sounds_like": "K",
          "phoneme_ipa": "k",
          "sounds_like_ipa": "k",
          "phoneme": "K",
          "score": 100
        }
      ],
      "word": "joke"
    }
  ]
}
```
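One way to turn a response into feedback is to list the phonemes where sounds_like differs from phoneme; in the example this pinpoints the G in "good" being heard as W. The sketch below assumes that a differing sounds_like marks a substituted sound, which is our reading of the Phoneme Score field descriptions rather than a documented rule. `substituted_phonemes` is a hypothetical helper, and the embedded JSON is a trimmed copy of the example response.

```python
import json
from typing import List, Tuple

# Trimmed copy of the example response (only the word "good" is kept).
EXAMPLE = json.loads("""
{
  "target_phrase": "good joke",
  "overall_score": 92,
  "word_scores": [
    {
      "word": "good",
      "score": 87,
      "phoneme_scores": [
        {"phoneme": "G", "phoneme_ipa": "ɡ", "sounds_like": "W", "sounds_like_ipa": "w", "score": 70},
        {"phoneme": "UH", "phoneme_ipa": "ʊ", "sounds_like": "UH", "sounds_like_ipa": "ʊ", "score": 100},
        {"phoneme": "D", "phoneme_ipa": "d", "sounds_like": "D", "sounds_like_ipa": "d", "score": 100}
      ]
    }
  ]
}
""")


def substituted_phonemes(response: dict) -> List[Tuple[str, str, str, int]]:
    """Return (word, expected, heard, score) wherever sounds_like differs from phoneme."""
    result = []
    for word in response["word_scores"]:
        for p in word["phoneme_scores"]:
            if p["sounds_like"] != p["phoneme"]:
                result.append((word["word"], p["phoneme"], p["sounds_like"], p["score"]))
    return result


print(substituted_phonemes(EXAMPLE))  # [('good', 'G', 'W', 70)]
```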

Score Schema

Score range | Interpretation
[90–100] | Excellent. Native-like.
[80–90) | Good and intelligible.
[60–80) | It sounds okay, but there is room for improvement.
[0–60) | Does not sound great yet. The learner should try again.
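The score bands translate directly into a lookup. In this sketch, `score_band` and its short labels are hypothetical; the boundaries follow the table, so 90 counts as excellent, 80 as good and 60 as okay.

```python
def score_band(score: int) -> str:
    """Hypothetical helper: map a 0-100 score to a band from the Score Schema table."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "excellent"  # [90-100]
    if score >= 80:
        return "good"       # [80-90)
    if score >= 60:
        return "okay"       # [60-80)
    return "retry"          # [0-60)


print(score_band(92))  # excellent
```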

Object Model

This section describes the objects that are used throughout the different endpoints.

Word Score

Field | Type | Description
word | string | The word being scored.
score | int | A value between 0 and 100 indicating how well the learner pronounced this word.
phoneme_scores | array of Phoneme Score objects | Scores for each phoneme in the word.

Phoneme Score

Field | Type | Description
phoneme | string | The phoneme (distinct unit of sound) in the target word, in two-letter ARPAbet format.
phoneme_ipa | string | The phoneme in the target word, in IPA format.
sounds_like | string | The phoneme in the inferred word, that is, what the learner's speech sounded like, in two-letter ARPAbet format.
sounds_like_ipa | string | The phoneme in the inferred word, in IPA format.
score | int | A value between 0 and 100 indicating how well the learner pronounced this phoneme.
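If you prefer typed objects to raw dictionaries, the two objects can be mirrored as Python dataclasses. This is a client-side sketch that assumes responses carry the fields listed above; the class names are our own. Known keys are picked explicitly, so extra fields added to the beta API are ignored rather than causing an error.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhonemeScore:
    phoneme: str          # target phoneme, ARPAbet
    phoneme_ipa: str      # target phoneme, IPA
    sounds_like: str      # inferred phoneme, ARPAbet
    sounds_like_ipa: str  # inferred phoneme, IPA
    score: int            # 0-100

    @classmethod
    def from_dict(cls, d: dict) -> "PhonemeScore":
        # Pick known keys explicitly so unknown fields are ignored.
        return cls(d["phoneme"], d["phoneme_ipa"], d["sounds_like"], d["sounds_like_ipa"], d["score"])


@dataclass
class WordScore:
    word: str
    score: int  # 0-100
    phoneme_scores: List[PhonemeScore]

    @classmethod
    def from_dict(cls, d: dict) -> "WordScore":
        return cls(d["word"], d["score"], [PhonemeScore.from_dict(p) for p in d["phoneme_scores"]])
```

A parsed response body can then be converted with `[WordScore.from_dict(w) for w in body["word_scores"]]`.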

API Semantics

This section explains the semantics of our REST API. It covers common information that applies to all endpoints.

API Endpoints

The base URL for all our endpoints is Please note that non-secure access to the API is not available. All HTTP requests will be redirected to HTTPS automatically.


Authentication

A valid API key is required to access the Sana Voice API. Contact Sana Labs to get your own API key. Your API keys grant access to the Sana Voice API, so keep them secret: do not share them in publicly accessible places such as GitHub or client-side code.

The Sana Voice API expects the API key to be included in every request to the server, in a header like the following:

X-API-KEY: $API_KEY

If the key is missing or invalid, the server responds with 401 Unauthorized.

To authorize, pass the X-API-KEY header:

```shell
curl -H "X-API-KEY: $API_KEY"
```

Make sure to replace $API_KEY with your API key, or export it as the API_KEY environment variable.

Rate Limits

There is currently no hard rate limit at which Sana will drop your data. However, if you need to send more than 200 requests per second, please contact Sana Labs first.
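To stay under the 200 requests per second mentioned above, a client can space out its own requests. The `Throttle` class below is a hypothetical, minimal fixed-interval limiter (not a token bucket, so it allows no bursts). The clock and sleep functions are injectable so it can be tested without real waiting.

```python
import threading
import time
from typing import Callable, Optional


class Throttle:
    """Hypothetical limiter: successive wait() calls return at least 1/rate seconds apart."""

    def __init__(self, rate: float, clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        if rate <= 0:
            raise ValueError("rate must be positive")
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_slot: Optional[float] = None  # earliest time the next caller may proceed
        self.lock = threading.Lock()

    def wait(self) -> None:
        """Block until this caller's slot arrives; safe to call from several threads."""
        with self.lock:
            now = self.clock()
            if self.next_slot is None or now >= self.next_slot:
                self.next_slot = now + self.interval
                return
            delay = self.next_slot - now
            self.next_slot += self.interval  # reserve the following slot for the next caller
        self.sleep(delay)  # sleep outside the lock so other threads can reserve slots
```

Create one instance, for example `Throttle(rate=150)`, share it across workers, and call `wait()` before each request.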


Every request either succeeds or fails. The API returns 200 or 201 for successful requests; on failure, it returns the relevant HTTP status code and an Error Response object. See the Error Status Codes section for the HTTP status codes the Sana Web API returns.


Audio

The Sana Voice API supports the wav, mp3 and webm audio formats. For the best quality and performance, use a sample rate of 16 kHz and a single channel (mono).
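Before uploading, you can check whether a WAV file already matches the recommended 16 kHz mono format with Python's standard-library `wave` module. `check_wav` is a hypothetical helper; it only reads WAV headers and does not handle mp3 or webm.

```python
import wave
from typing import List


def check_wav(path: str) -> List[str]:
    """Hypothetical helper: return warnings if a WAV file is not 16 kHz mono (empty list if it is)."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
    warnings = []
    if rate != 16000:
        warnings.append(f"sample rate is {rate} Hz; 16000 Hz is recommended")
    if channels != 1:
        warnings.append(f"audio has {channels} channels; mono is recommended")
    return warnings
```

To convert a recording, a tool such as ffmpeg can resample and downmix it, for example `ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav`.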

Error Status Codes

The Sana Web API uses the following error codes:

Error code | Error text | Description
400 | Bad Request | Your request is invalid.
401 | Unauthorized | No API key was provided, or your API key is wrong.
402 | Payment Required | Your API key has expired.
404 | Not Found | The specified resource could not be found.
405 | Method Not Allowed | You tried to access a resource with an invalid method.
429 | Too Many Requests | You have exceeded your rate limit.
500 | Internal Server Error | There was a problem on the server side. Please try again later.
503 | Service Unavailable | The API is temporarily offline for maintenance. Please try again later.
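Of the codes above, 429, 500 and 503 describe conditions that may clear up on their own, so a client might retry them with exponential backoff and treat the rest as final. Which codes are safe to retry is our assumption, not something the API documents. `send_with_retries` is a hypothetical helper that wraps any zero-argument callable returning an object with a `status_code` attribute, such as a `requests.Response`.

```python
import time
from typing import Any, Callable

# Assumption: these codes from the table may succeed on a later attempt.
RETRYABLE_STATUS = {429, 500, 503}


def send_with_retries(
    send: Callable[[], Any],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() until it returns a non-retryable status or attempts run out.

    Waits base_delay * 2**attempt seconds between attempts (0.5 s, 1 s, 2 s, ...)
    and returns the last response either way.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        if response.status_code not in RETRYABLE_STATUS or attempt == max_attempts - 1:
            return response
        sleep(base_delay * 2 ** attempt)
```

When wrapping `requests.post`, pass the audio as bytes read once up front rather than as an open file: a file object is left at its end after the first upload, so a retry would send empty audio.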
Sana Labs
Nybrogatan 8
114 34 Stockholm
Sweden