Google, Amazon and OpenAI

Google Transcript API

Google provides a lot of free services that we can simply use in our applications. To use these services, first go to the Google Cloud Console, create an account and get an API key. This API key can be used to request all services. Their documentation is pretty clean, so signing in and getting an API key should not be a problem.
Unfortunately, the Google Transcript API (Cloud Speech-to-Text) is not available for free. To use it we have to add a payment method and pay ₹1000 up front. This fee is refunded in one or two business days. After the refund we also get 300 dollars' worth of free credits, plus 60 minutes of free Transcript API usage.

Now I have good news and bad news. The good news is that Google provides a free demo of the Transcript API, and you don't need to add a payment method to try it on their page. The bad news is that this live demo supports only a few selected languages, and Punjabi is not one of them.

API Endpoint and Request Format

Endpoint

https://speech.googleapis.com/v1/speech:recognize?key=${apiKey}

Request Format

{
    "audio": {
        "content": "/* Your base64-encoded audio */"
    },
    "config": {
        "enableAutomaticPunctuation": true,
        "encoding": "LINEAR16",
        "languageCode": "pa-IN",
        "model": "default"
    }
}
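
The request above can be sketched in Python as follows. This is a minimal sketch rather than a definitive client: the function names are made up for illustration, and the response shape it reads (`results`, then `alternatives`, then `transcript`) is not shown in these notes, so treat it as an assumption to check against Google's documentation. Only the standard library is used.

```python
import base64
import json
import urllib.request

ENDPOINT = "https://speech.googleapis.com/v1/speech:recognize"


def build_request_body(audio_bytes: bytes, language_code: str = "pa-IN") -> dict:
    """Build the JSON body from the request format above (audio is base64-encoded)."""
    return {
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
        "config": {
            "enableAutomaticPunctuation": True,
            "encoding": "LINEAR16",
            "languageCode": language_code,
            "model": "default",
        },
    }


def extract_transcript(response: dict) -> str:
    """Join the first alternative of every result into one string.

    Assumes the response looks like {"results": [{"alternatives": [{"transcript": ...}]}]}.
    """
    pieces = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            pieces.append(alternatives[0].get("transcript", "").strip())
    return " ".join(piece for piece in pieces if piece)


def recognize(audio_bytes: bytes, api_key: str) -> str:
    """Send the request to the endpoint (needs network access and a valid API key)."""
    body = json.dumps(build_request_body(audio_bytes)).encode("utf-8")
    request = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return extract_transcript(json.load(reply))
```

Keeping `build_request_body` and `extract_transcript` separate from the network call means they can be checked without spending any credits.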

Amazon Transcript API

THIS IS A COMPLETE SCAM. DO NOT USE THIS AT ANY COST. The Amazon Transcript API (Amazon Transcribe) is part of Amazon Web Services, so we need to make an AWS account (which is not free btw). After that we can have a "free" demo. Once again, like Google, only selected languages are available in the demo. Using the transcription API in an application costs a fixed price per use. We do get 60 minutes of free API use with an AWS account.

Amazon publishes a list of supported languages for the Transcript API. The API takes input in two ways: streaming and batch. Streaming means live input and output: we get the transcript in real time as we speak, which is obviously very useful for our project. Batch input, on the other hand, takes a complete audio file, puts it in a queue and then generates a transcript of the entire file.

But here's the catch. We can not simply send an audio file to the API endpoint. No, no, no. The endpoint only takes files from S3 buckets, which are basically the cloud storage of AWS (see the S3 documentation). We first have to make an S3 bucket, upload the audio file to that bucket, acquire the URI for that file and then give that URI to the API. The API then puts our transcription job into a queue, which means we can not have concurrent results, and a single job takes minutes to finish.

Unfortunately, Amazon only supports batch input for Punjabi. S3 buckets are also paid, so now we have two different billing counters. And on top of all of this, the result generated by this API is neither fast nor accurate.
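
To make the batch flow concrete (upload to S3, pass the URI, wait for the queued job), here is a rough Python sketch. It assumes the `s3` and `transcribe` arguments are boto3 clients, for example `boto3.client("s3")` and `boto3.client("transcribe")`. The function name, bucket, key and job name are hypothetical, and the `wav` default is only a placeholder.

```python
import time


def transcribe_batch(s3, transcribe, local_path, bucket, key, job_name,
                     language_code="pa-IN", media_format="wav", poll_seconds=5.0):
    """Upload a file to S3, start a batch transcription job and wait for it.

    Returns the URI of the transcript JSON that the finished job produced.
    """
    # Step 1: the audio has to be in an S3 bucket before the API will accept it.
    s3.upload_file(local_path, bucket, key)
    media_uri = f"s3://{bucket}/{key}"

    # Step 2: give the S3 URI to the transcription API; the job goes into a queue.
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": media_uri},
        MediaFormat=media_format,
        LanguageCode=language_code,
    )

    # Step 3: poll until the queued job either completes or fails.
    while True:
        reply = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        job = reply["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]
        if status == "COMPLETED":
            return job["Transcript"]["TranscriptFileUri"]
        if status == "FAILED":
            raise RuntimeError(job.get("FailureReason", "transcription job failed"))
        time.sleep(poll_seconds)
```

Passing the clients in as arguments keeps the sketch free of credentials and makes the polling loop easy to try against stand-in objects before running anything that is billed.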

OpenAI Whisper

Again, OpenAI Whisper is a paid API. It is fairly simple to use. Much like the Google Transcript API, we first have to add a payment method and then generate an API key.

The supported languages are listed in the OpenAI documentation.

Additional Headers

Key	Value
Authorization	Bearer {api key}

API Endpoint and Request Format

Endpoint

https://api.openai.com/v1/audio/transcriptions

Request Format

{
    file: audio_file,
    model: "whisper-1",
    language: "pa",
    prompt: "<optional text to guide the model's style or continue a previous audio segment; it should match the audio language>",
    temperature: <sampling temperature between 0 and 1; higher values like 0.8 make the output more random, lower values like 0.2 make it more focused and deterministic>,
    response_format: "verbose_json",
    timestamp_granularities: ["word"]
}
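
A possible way to send this request from Python is sketched below. Some assumptions to flag: the fields are sent as multipart form data rather than as a JSON body, the list-valued field is spelled `timestamp_granularities[]` in the form, and the third-party `requests` package is installed. The helper names are invented for illustration.

```python
ENDPOINT = "https://api.openai.com/v1/audio/transcriptions"


def build_headers(api_key: str) -> dict:
    """The additional header from the table above."""
    return {"Authorization": f"Bearer {api_key}"}


def build_form_fields(language="pa", prompt=None, temperature=None) -> list:
    """Return the non-file form fields as (name, value) pairs."""
    fields = [
        ("model", "whisper-1"),
        ("language", language),
        ("response_format", "verbose_json"),
        ("timestamp_granularities[]", "word"),
    ]
    if prompt is not None:
        fields.append(("prompt", prompt))
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        fields.append(("temperature", str(temperature)))
    return fields


def transcribe_file(path: str, api_key: str, **options) -> dict:
    """Upload one audio file and return the parsed JSON response."""
    # Imported here so the helpers above work without the third-party package.
    import requests

    with open(path, "rb") as audio_file:
        reply = requests.post(
            ENDPOINT,
            headers=build_headers(api_key),
            data=build_form_fields(**options),
            files={"file": audio_file},
            timeout=120,
        )
    reply.raise_for_status()
    return reply.json()
```

For example, `transcribe_file("clip.wav", api_key, temperature=0.2)` would send a hypothetical `clip.wav` with the optional temperature set.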

Response Format

{
    "task": "transcribe",
    "language": "english",
    "duration": 8.470000267028809,
    "text": "The beach was a popular spot on a hot summer day. People were swimming in the ocean, building sandcastles, and playing beach volleyball.",
    "words": [
        {
        "word": "The",
        "start": 0.0,
        "end": 0.23999999463558197
        },
        ...
        {
        "word": "volleyball",
        "start": 7.400000095367432,
        "end": 7.900000095367432
        }
    ]
}
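
Since every word in the response carries its own start and end time, it is easy to pull out the words spoken in a given time window. A small sketch, assuming the response has already been parsed into a Python dict with the shape shown above (the function name is made up):

```python
def words_between(response: dict, start: float, end: float) -> list:
    """Return the words whose time span lies entirely inside [start, end] seconds."""
    return [
        entry["word"]
        for entry in response.get("words", [])
        if entry["start"] >= start and entry["end"] <= end
    ]
```

This kind of lookup is what the `timestamp_granularities: ["word"]` option makes possible, for instance to highlight words as the audio plays.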

Here is sample code implementing the OpenAI Whisper API. You can read the full documentation here.