Google Speech to Text

This is a Labs feature! We couldn’t wait to get your feedback on it, so we are providing you early access, even though there is still room for improvement. Don't hesitate to let us know your thoughts about this feature in the NetX Ideas Portal.

NetX's integration with Google's Speech-to-Text API automatically converts the speech in video and audio files to text, generating a VTT (WebVTT) file that is fully indexed for search. The text is broken into timed segments, each covering a fixed number of seconds of the file; the segment duration is set by a NetX configuration property. The VTT file serves as a text transcription and, for video files, can also provide closed captions for your NetX video content.
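To make the segment structure concrete, here is a minimal Python sketch of a WebVTT file built from fixed-length segments. It only illustrates the WebVTT format (a `WEBVTT` header, then `start --> end` timing lines followed by cue text); the function names and sample text are hypothetical, and the files NetX actually produces may be laid out differently.

```python
"""Sketch: how fixed-length transcript segments map onto WebVTT cues.

Assumption: NetX's real VTT output may differ in layout; this only
illustrates the WebVTT format and fixed-duration segmentation.
"""


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def build_vtt(segment_texts: list[str], segment_seconds: int) -> str:
    """Build a WebVTT document with one cue per fixed-length segment."""
    lines = ["WEBVTT", ""]
    for index, text in enumerate(segment_texts):
        start = index * segment_seconds
        end = start + segment_seconds
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_vtt(["Welcome to the product tour.", "First, open the library."], 15))
```

With a 15-second segment length, the second cue in the example runs from `00:00:15.000` to `00:00:30.000`.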

NetX's speech-to-text feature relies on Google's API for text generation and accuracy. Results may vary and depend on the audio quality and the clarity of the speech in the recording itself. For this reason, text or VTT files generated from certain content (such as music or low-quality audio) may not transcribe the audio fully or as expected, including punctuation such as sentence breaks. Shorter segment durations often transcribe more accurately than longer ones.

Setup requirements

System requirements

  • NetX version 8.12 and later
  • FFmpeg installed and configured on your NetX instance. FFmpeg is already installed for SaaS customers. If you are on-premise, see the Windows installation guide here or the Linux installation guide here.

Google credentials

To use Google's Speech-to-Text feature, you must create an API key with your Google account. Because this key is linked to a specific Google account, we recommend creating and using a company account rather than tying your NetX instance to a personal account.

  1. From the Google developer console, create a new project.
  2. In the Dashboard, under Getting Started, click Explore and enable APIs, then click Enable APIs and Services at the top of the page. Select Cloud Speech-to-Text API and enable it.

    Be sure to choose Cloud Speech-to-Text API, not Cloud Text-to-Speech API. If you create your credentials before enabling this service, the resulting API key will not work with the speech-to-text feature.

  3. Next, select the key icon in the left-hand sidebar to open your Credentials page, then choose Create credentials > API key.
  4. Google generates an API key. Use the Copy icon to copy the key to your clipboard. This is the key you will use to link NetX with your Google account.
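To confirm that a new key works before entering it in NetX, you could send a tiny test request to Google directly. The Python sketch below is not part of NetX; it assumes the public v1 REST endpoint `speech:recognize` with the key passed as the `key` query parameter, and it posts 0.1 seconds of silent 16 kHz LINEAR16 audio. The environment variable name `GOOGLE_STT_API_KEY` is hypothetical, and a successful call may count toward your Google Cloud usage.

```python
"""Sketch: sanity-check a Google API key against Cloud Speech-to-Text.

Assumptions: the v1 REST endpoint and key-as-query-parameter style shown
here; the environment variable name GOOGLE_STT_API_KEY is hypothetical.
"""
import base64
import json
import os
import urllib.error
import urllib.parse
import urllib.request

ENDPOINT = "https://speech.googleapis.com/v1/speech:recognize"


def build_request(api_key: str) -> urllib.request.Request:
    """Build a recognize request containing 0.1 s of silent 16 kHz audio."""
    silence = bytes(3200)  # 1,600 zero-valued 16-bit samples = 0.1 s at 16 kHz
    body = {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": 16000,
            "languageCode": "en-US",
        },
        "audio": {"content": base64.b64encode(silence).decode("ascii")},
    }
    url = f"{ENDPOINT}?{urllib.parse.urlencode({'key': api_key})}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_key(api_key: str) -> bool:
    """Return True if Google accepts the key for speech recognition."""
    try:
        with urllib.request.urlopen(build_request(api_key), timeout=30) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        # The error body usually explains the problem, e.g. an invalid key
        # or the Speech-to-Text API not being enabled for the project.
        print(err.code, err.read().decode("utf-8", errors="replace"))
        return False
    except urllib.error.URLError as err:
        print("Network error:", err.reason)
        return False


if __name__ == "__main__":
    key = os.environ.get("GOOGLE_STT_API_KEY")
    if key:
        print("Key works" if check_key(key) else "Key rejected")
    else:
        print("Set GOOGLE_STT_API_KEY to run the check.")
```

An HTTP 200 response suggests the key is valid and the API is enabled for its project; an error response body typically indicates which of the two is the problem.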


Once you have generated your API key and entered it in the corresponding NetX property, you are ready to set up the AutoTask criteria that trigger Google's speech-to-text job. You can configure your AutoTask with any standard AutoTask criteria; the examples below are simple tasks that generate a VTT file for every audio or video file imported into NetX.

VTT files are generated on import, but may not be available immediately, even after your asset has fully uploaded. To check the status of a file's speech-to-text extraction, open the Jobs queue in the Systems area of your instance. It shows whether the process has completed or, if it is still running, approximately what percentage is complete.


This AutoTask generates VTT files for all audio assets imported into your NetX instance. Note that the action value is set to import and the fileFormatFamily attribute is set to audio.

<task id="speech" name="Speech To Text - Audio">
    <matchCriteria type="and">
        <criteria type="action" value="import"/>
        <criteria type="attribute" name="fileFormatFamily" value="audio"/>
    </matchCriteria>
    <autoTaskJob className="com.netxposure.products.imageportal.autotask2.impl.GoogleSpeechToTextJob"/>
</task>


This AutoTask generates VTT files for all video assets imported into your NetX instance. Note that the action value is set to import and the fileFormatFamily attribute is set to video.

<task id="speech" name="Speech To Text - Video">
    <matchCriteria type="and">
        <criteria type="action" value="import"/>
        <criteria type="attribute" name="fileFormatFamily" value="video"/>
    </matchCriteria>
    <autoTaskJob className="com.netxposure.products.imageportal.autotask2.impl.GoogleSpeechToTextJob"/>
</task>

Closed captioning with your VTT file

Generated VTT files appear in the Views tab of the asset detail page. For audio files, they serve as downloadable transcriptions. For videos, they can also be used as closed captions for that video. As long as your VTT view is titled previewVTT, this happens automatically; simply toggle the CC icon to show or hide subtitles in the video preview.
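When captions are toggled on, a video player displays whichever cue's time range contains the current playback position. The Python sketch below illustrates that lookup; it is not NetX's player code, and it assumes simple cues with `HH:MM:SS.mmm` timestamps and LF line endings. The function names are hypothetical.

```python
"""Sketch: which caption a player shows at a given playback time.

Assumptions: HH:MM:SS.mmm timestamps, LF line endings, one timing line per
cue followed by its text. Not NetX's actual player implementation.
"""
import re

TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})\.(\d{3}) --> (\d+):(\d{2}):(\d{2})\.(\d{3})"
)


def to_seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert timestamp parts to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_cues(vtt_text: str) -> list[tuple[float, float, str]]:
    """Return (start, end, text) for each cue; blocks without timing are skipped."""
    cues = []
    for block in vtt_text.strip().split("\n\n"):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.match(line)
            if match:
                parts = match.groups()
                start, end = to_seconds(*parts[:4]), to_seconds(*parts[4:])
                cues.append((start, end, "\n".join(lines[i + 1:])))
                break
    return cues


def caption_at(cues: list[tuple[float, float, str]], t: float):
    """Return the caption text active at time t (seconds), or None."""
    for start, end, text in cues:
        if start <= t < end:
            return text
    return None
```

Because the end time is exclusive, a playback position exactly on a segment boundary shows the following cue.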

Advanced settings

Property Description

This is where you enter your Google API key: the alphanumeric string Google generates when you create the key.

Value options: API key

Requires restart? Yes


Determines whether or not your generated VTT file will be indexed for content searches.

Value options: true / false

Requires restart? Yes
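For content searches, the useful part of a VTT file is its caption text rather than its header and timing lines. The sketch below shows one way to reduce a VTT file to searchable plain text; it is an illustration only, since this article does not describe how NetX's indexer processes VTT files. The function names are hypothetical, and optional cue identifiers and NOTE blocks are not stripped.

```python
"""Sketch: reduce a WebVTT file to plain, searchable text.

Assumption: illustrative only; NetX's own indexing of VTT files is not
documented here. Cue identifiers and NOTE blocks are not removed.
"""
import re

# Matches a cue timing line in either HH:MM:SS.mmm or MM:SS.mmm form.
TIMING_LINE = re.compile(r"^(\d+:)?\d{2}:\d{2}\.\d{3}\s+-->\s+")


def vtt_to_text(vtt_text: str) -> str:
    """Keep only caption text: drop the header, timing lines and blank lines."""
    kept = []
    for line in vtt_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("WEBVTT"):
            continue
        if TIMING_LINE.match(stripped):
            continue
        kept.append(stripped)
    return " ".join(kept)


def contains(vtt_text: str, query: str) -> bool:
    """Case-insensitive check for a word or phrase in the caption text."""
    return query.lower() in vtt_to_text(vtt_text).lower()
```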

This property sets how many seconds of audio Google transcribes at a time (the segment length), which also determines how much closed-caption text is displayed at once. The default value is 15.

Value options: number of seconds (default: 15)

Requires restart? Yes
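The segment length directly sets how many caption cues a file produces. The sketch below computes the segment windows for a given media duration; clipping the final window to the end of the file is an assumption, since this article does not say how NetX handles a final partial segment. The function name is hypothetical.

```python
"""Sketch: split a media duration into fixed-length transcription windows.

Assumption: the final window is clipped to the end of the media.
"""
import math


def segment_boundaries(duration_seconds: float, segment_seconds: int = 15):
    """Return (start, end) windows of segment_seconds covering the media."""
    if segment_seconds <= 0:
        raise ValueError("segment_seconds must be positive")
    count = math.ceil(duration_seconds / segment_seconds)
    return [
        (i * segment_seconds, min((i + 1) * segment_seconds, duration_seconds))
        for i in range(count)
    ]
```

For a 100-second clip, the default of 15 yields seven windows, the last covering 90 to 100 seconds; a value of 30 yields four.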
