As interns at the Khalis Foundation, my teammate Simarjot Singh and I (Dilraj Singh) were assigned the project of implementing a voice search feature in the apps developed by the foundation, which include Sikhi To The Max and Sunder Gutka. Our task was to find the transcription API that gives the most accurate results for the Punjabi language.
This documentation explains all the steps we followed, so that fellow developers can more easily find the best transcription API and implement it in their own apps.
The most obvious solution that comes to mind is webkitSpeechRecognition, part of the Web Speech API. It is very easy to implement: no API key is required, it is fast, it generates accurate results most of the time, and it supports Punjabi. The only problem is that it is browser-based. It can easily be used in web applications, but using it in a native environment requires workarounds. It is also not supported by all browsers; it primarily works in Chrome and some Chromium-based browsers. Here is a very simple example you can use to implement this feature.
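This is a minimal sketch, assuming a Chromium-based browser that exposes webkitSpeechRecognition and that pa-IN is the locale tag for Punjabi (India):

```js
// Minimal sketch: Punjabi speech recognition with the Web Speech API.
// Works in Chrome and some Chromium-based browsers; no API key needed.
const SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();

recognition.lang = "pa-IN";         // Punjabi (India)
recognition.interimResults = false; // only report final results
recognition.maxAlternatives = 1;

recognition.onresult = (event) => {
  // Transcript of the best alternative of the first result.
  const transcript = event.results[0][0].transcript;
  console.log("Recognized:", transcript);
};

recognition.onerror = (event) => {
  console.error("Speech recognition error:", event.error);
};

// Starts listening; the browser will prompt for microphone permission.
recognition.start();
```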
You can read its complete documentation here.
Bhashini is the Indian government's official AI platform for Indian languages. It supports most Indian languages and provides several services such as ASR (Automatic Speech Recognition) and text-to-speech. You can see the complete list of services and their supported languages.
First, you have to register with Bhashini. Follow the steps on this page to register and get an API key and user ID; both are required for API calls. See the language codes here.
The available pipelines are:

| Pipeline Name | Pipeline ID | Available Services |
|---|---|---|
| MeitY | 64392f96daac500b55c543cd | |
| AI4Bharat | 643930aa521a4b1ba0f4c41d | |
Pass your credentials as request headers:

| Key | Value |
|---|---|
| userID | Your user ID |
| ulcaApiKey | Your API key |
Now you should have an API key, a user ID, and a pipeline ID. Next you need the endpoint and the request format. Send a POST request (with the headers above) to the following endpoint:

https://meity-auth.ulcacontrib.org/ulca/apis/v0/model/getModelsPipeline

with this request body:
```json
{
"pipelineTasks": [
{
"taskType": "asr",
"config": {
"language": {
"sourceLanguage": "xx"
}
}
},
{
"taskType": "translation",
"config": {
"language": {
"sourceLanguage": "xx",
"targetLanguage": "yy"
}
}
},
{
"taskType": "tts",
"config": {
"language": {
"sourceLanguage": "yy"
}
}
}
],
"pipelineRequestConfig": {
"pipelineId" : "xxxx8d51ae52cxxxxxxxx"
}
}
```
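As a minimal sketch, assuming fetch is available (Node 18+ or a browser) and that "pa" is the Punjabi language code from the list linked above, the config call might look like this:

```js
// Sketch: query the pipeline for an ASR config for Punjabi.
// USER_ID and API_KEY come from your Bhashini registration.
async function getPipelineConfig(USER_ID, API_KEY) {
  const response = await fetch(
    "https://meity-auth.ulcacontrib.org/ulca/apis/v0/model/getModelsPipeline",
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        userID: USER_ID,
        ulcaApiKey: API_KEY,
      },
      body: JSON.stringify({
        pipelineTasks: [
          { taskType: "asr", config: { language: { sourceLanguage: "pa" } } },
        ],
        // MeitY pipeline ID from the table above
        pipelineRequestConfig: { pipelineId: "64392f96daac500b55c543cd" },
      }),
    }
  );
  if (!response.ok) throw new Error(`Pipeline config failed: ${response.status}`);
  return response.json();
}
```

A successful request returns a long response like this one (this example shows a Gujarati source with a Bengali translation target):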
```json
{
"languages": [
{
"sourceLanguage": "gu",
"targetLanguageList": [
"bn"
]
}
],
"pipelineResponseConfig": [
{
"taskType": "asr",
"config": [
{
"serviceId": "ai4bharat/conformer-multilingual-indo_aryan-gpu--t4",
"modelId": "6411746056e9de23f65b5425",
"language": {
"sourceLanguage": "gu"
},
"domain": [
"general"
]
}
]
},
{
"taskType": "translation",
"config": [
{
"serviceId": "ai4bharat/indictrans-fairseq-i2i-gpu--t4",
"modelId": "62023eeb3fc51c3fe32b8c5b",
"language": {
"sourceLanguage": "gu",
"targetLanguage": "bn"
}
}
]
},
{
"taskType": "tts",
"config": [
{
"serviceId": "ai4bharat/indic-tts-coqui-indo_aryan-gpu--t4",
"modelId": "636e60e586369150cb00432a",
"language": {
"sourceLanguage": "bn"
},
"supportedVoices": [
"male",
"female"
]
}
]
}
],
"pipelineInferenceAPIEndPoint": {
"callbackUrl": "https://dhruva-api.bhashini.gov.in/services/inference/pipeline",
"inferenceApiKey": {
"name": "Authorization",
"value": "m-LTAzxQVp6jjznmSR5RgKM"
},
"isMultilingualEnabled": true,
"isSyncApi": true
}
}
```
Now, I know this looks scary and there is a lot to keep track of, but let's understand it in a simple way. We basically asked the pipeline, "Can we use service X for language Y?"; in other words, we configured the pipeline for a specific task. If that task is not available in that language, or in that specific order, you will receive a "400 BAD REQUEST" error response. Normally the pipeline responds with a very long response, but we are interested in only 4 things in it:

- the serviceId of the ASR task (under pipelineResponseConfig)
- the callbackUrl (under pipelineInferenceAPIEndPoint)
- the name of the inferenceApiKey (the auth parameter key)
- the value of the inferenceApiKey (the auth parameter value)
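As a sketch, assuming config holds the parsed pipeline response shown above, these four values can be pulled out like this:

```js
// Sketch: extract the four values we need from the pipeline response.
const asrTask = config.pipelineResponseConfig.find((t) => t.taskType === "asr");
const serviceId = asrTask.config[0].serviceId;
const callbackUrl = config.pipelineInferenceAPIEndPoint.callbackUrl;
const authName = config.pipelineInferenceAPIEndPoint.inferenceApiKey.name;
const authValue = config.pipelineInferenceAPIEndPoint.inferenceApiKey.value;
```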
The callbackUrl is the new endpoint where we will send the final API call. For this we need the three other things from the list above, plus some audio data. You need to convert the audio into Base64 and send that Base64 string to the endpoint; a conversion sketch follows.
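Here is a minimal sketch of that conversion in the browser, assuming the audio is available as a Blob (for example from MediaRecorder); in Node you could use Buffer.from(arrayBuffer).toString("base64") instead:

```js
// Sketch: convert a recorded audio Blob to a Base64 string in the browser.
async function blobToBase64(blob) {
  const buffer = await blob.arrayBuffer();
  const bytes = new Uint8Array(buffer);
  let binary = "";
  for (const byte of bytes) binary += String.fromCharCode(byte);
  return btoa(binary); // this string goes into "audioContent" below
}
```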
| Key | Value |
|---|---|
| auth parameter key | the key (name) obtained from the pipeline response |
| auth parameter value | the value obtained from the pipeline response |

The endpoint is the callbackUrl obtained from the pipeline response, and the request body looks like this:
```json
{
"pipelineTasks": [
{
"taskType": "asr",
"config": {
"language": {
"sourceLanguage": "xx"
},
"serviceId": "xxxxx--ssssss-d-ddd--dddd",
"audioFormat": "wav",
"samplingRate": 16000
}
}
],
"inputData": {
"input": [
{
"source": null
}
],
"audio": [
{
"audioContent": "{{generated_base64_content}}"
}
]
}
}
```
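Putting it together, a minimal sketch of the final call, assuming fetch and the values extracted earlier; the path to the transcript in the response below is an assumption, so inspect the response you actually get to confirm it:

```js
// Sketch: send the final ASR request to the callbackUrl, using the auth
// header name/value and serviceId obtained earlier. requestBody is the
// JSON shown above, with your serviceId and Base64 audio filled in.
async function transcribe(callbackUrl, authName, authValue, requestBody) {
  const response = await fetch(callbackUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json", [authName]: authValue },
    body: JSON.stringify(requestBody),
  });
  if (!response.ok) throw new Error(`ASR request failed: ${response.status}`);
  const data = await response.json();
  // Assumption: the transcript comes back at pipelineResponse[0].output[0].source.
  return data.pipelineResponse?.[0]?.output?.[0]?.source;
}
```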
Finally, after doing all this, the API's response is neither fast nor accurate. You can check its implementation in this repo. Obviously this was very complicated; I tried my best to keep it as simple as possible and left out some details that are not important for beginners.
If you hate your life, you decided to implement this API anyway, and you are getting an error that is not mentioned in my brief documentation, you can read the complete documentation on this website.