Speech Analysis Framework

This repository contains the Speech Analysis Framework, a collection of components and code from Google Cloud that you can use to transcribe audio files and build a data pipeline that displays analytics for the transcribed audio.

The framework can:

Process audio files uploaded to Cloud Storage.
Enrich the processed audio files with Cloud Speech-to-Text and Cloud Natural Language APIs.
Write the enriched data to BigQuery.
Redact sensitive information with Cloud Data Loss Prevention.

You can:

Gain insights into quality metrics such as call silence, call duration, agent speaking time, user speaking time, and sentence heat maps.
Build visualization reports.
[Figure: examples of the visualizations you can build]
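Metrics such as call silence and speaking time can be derived from word-level timings. The Python sketch below shows one possible way to compute them; it is an assumption, not the framework's own calculation. It expects a list of words with `startSecs`, `endSecs`, and `speakertag` fields (the names used in the BigQuery `words` column), counts any gap between the latest word end and the next word start as silence, and sums word durations per speaker tag as that speaker's talk time. Mapping speaker tags to "agent" and "user" is left out because it depends on the call setup.

```python
from collections import defaultdict


def call_metrics(words):
    """Approximate silence and per-speaker talk time from word timings.

    `words` is a list of dicts with startSecs, endSecs and speakertag keys.
    Returns (silence_secs, {speakertag: speaking_secs}).
    Only gaps between words count as silence; time before the first word
    and after the last word is ignored.
    """
    silence = 0.0
    speaking = defaultdict(float)
    latest_end = None
    for w in sorted(words, key=lambda word: float(word["startSecs"])):
        start, end = float(w["startSecs"]), float(w["endSecs"])
        # A gap after the latest word end is silence; overlapping speech is not.
        if latest_end is not None and start > latest_end:
            silence += start - latest_end
        latest_end = end if latest_end is None else max(latest_end, end)
        speaking[w["speakertag"]] += end - start
    return silence, dict(speaking)
```

For example, two words spanning 0.0 to 1.0 s and 1.5 to 2.0 s yield 0.5 s of silence. Tracking the latest end time, rather than only the previous word's end, keeps overlapping speech from two callers from being counted as silence.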

Speech Analysis Framework limitations:

The framework can identify only two callers in a stereo or mono audio file. This is a limitation of the framework code, not of the Cloud Speech-to-Text API.
The framework can process only .wav or .flac files. This is a limitation of the framework code, not of the Cloud Speech-to-Text API.
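Because these limits live in the framework code, it can help to check files before uploading them. The following is a minimal, hypothetical pre-upload check (the name `check_audio_file` is illustrative and not part of the framework). It accepts only .wav and .flac extensions and, for .wav files, uses Python's standard-library `wave` module to confirm that the file has one or two channels. The `wave` module reads only uncompressed PCM WAV, and FLAC files are checked by extension only.

```python
import os
import wave

# Hypothetical pre-upload check mirroring the framework's stated limits.
SUPPORTED_EXTENSIONS = {".wav", ".flac"}


def check_audio_file(path):
    """Return the channel count of a .wav file, or None for a .flac file.

    Raises ValueError for unsupported extensions or for .wav files with
    more than two channels.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type {ext!r}; use .wav or .flac")
    if ext == ".flac":
        # The standard library has no FLAC reader, so only the extension is checked.
        return None
    # wave only reads uncompressed PCM WAV; other encodings raise wave.Error.
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
    if channels > 2:
        raise ValueError(f"{path} has {channels} channels; expected mono or stereo")
    return channels
```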

The processing flow is:

An audio file is uploaded to Cloud Storage.
The Cloud Function is triggered when the object is created (the google.storage.object.finalize event).
The Cloud Function sends a long-running job request to Cloud Speech-to-Text.
The Cloud Function then sends the job ID from Cloud Speech-to-Text, with additional metadata, to Cloud Pub/Sub.
The Cloud Dataflow job enriches the data, optionally redacts sensitive information, and writes the results to BigQuery.
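The hand-off between the Cloud Function and Pub/Sub can be sketched in Python as a function that turns the Cloud Storage event into a Pub/Sub message. This is an illustrative sketch, not the framework's Node.js code: the payload field names (`operation_name`, `bucket`, `file`, and the copied metadata keys) are assumptions. It relies on Cloud Storage delivering custom metadata, set with `x-goog-meta-*` headers at upload time, in the event's `metadata` field with the prefix removed.

```python
import json


def build_pubsub_message(event, operation_name):
    """Build the JSON bytes to publish after starting a transcription job.

    `event` is the Cloud Storage object resource delivered to a
    google.storage.object.finalize trigger; `operation_name` is the ID of the
    long-running Speech-to-Text job. All payload field names are hypothetical.
    """
    # Custom metadata set with x-goog-meta-* headers arrives without the prefix.
    metadata = event.get("metadata") or {}
    payload = {
        "operation_name": operation_name,
        "bucket": event["bucket"],
        "file": event["name"],
        # Metadata values are strings, so turn the flags into booleans.
        "dlp": metadata.get("dlp", "false").lower() == "true",
        "stereo": metadata.get("stereo", "false").lower() == "true",
    }
    # Copy the remaining call metadata through unchanged.
    for key in ("callid", "year", "month", "day", "starttime"):
        payload[key] = metadata.get(key)
    return json.dumps(payload).encode("utf-8")
```

With the google-cloud-pubsub library, the bytes could then be published with `PublisherClient().publish(topic_path, data)`, where `topic_path` comes from `PublisherClient.topic_path(project_id, topic_name)` and `topic_name` is the `pubsubtopicname` metadata value.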

To learn more, see Visualize speech data with Speech Analysis Framework.

If you want to do more with DLP audio redaction, check out the Speech Redaction Framework

Installing the Speech Analysis Framework

Install the Google Cloud SDK
Create a storage bucket for Dataflow staging files

gsutil mb gs://[YOUR_DATAFLOW_STAGING_BUCKET]/

In the Google Cloud Console, create a folder named tmp in the new bucket for the Dataflow staging files.
Create a storage bucket for uploaded audio files

gsutil mb gs://[YOUR_UPLOADED_AUDIO_FILES_BUCKET_NAME]/

Create a BigQuery dataset

bq mk [YOUR_DATASET_NAME]

Create a Cloud Pub/Sub topic

gcloud pubsub topics create [YOUR_TOPIC_NAME]

Enable the Cloud Dataflow API

gcloud services enable dataflow.googleapis.com

Enable the Cloud Speech-to-Text API

gcloud services enable speech.googleapis.com

Enable the Cloud Natural Language API

gcloud services enable language.googleapis.com

Enable the Cloud DLP API (optional)

gcloud services enable dlp.googleapis.com

Deploy the Google Cloud Function

In the cloned repo, go to the “saf-longrun-job-func” directory and deploy the Cloud Function with the following command.

gcloud functions deploy safLongRunJobFunc --region=us-central1 --stage-bucket=[YOUR_UPLOADED_AUDIO_FILES_BUCKET_NAME] --runtime=nodejs8 --trigger-event=google.storage.object.finalize --trigger-resource=[YOUR_UPLOADED_AUDIO_FILES_BUCKET_NAME]
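The deployed function starts an asynchronous (long-running) Speech-to-Text recognition job. The function itself is written in Node.js; the Python sketch below only illustrates how such a request might be configured for the two-caller limit, assuming the Speech-to-Text v1 `RecognitionConfig` field names. The helper name, the default `language_code`, and the mapping of stereo files to per-channel recognition and of mono files to two-speaker diarization are assumptions, not taken from the framework's code. `encoding` and `sample_rate_hertz` are omitted because Speech-to-Text can read them from WAV and FLAC headers.

```python
def build_recognition_config(stereo, language_code="en-US"):
    """Return a v1 RecognitionConfig as a dict (hypothetical helper).

    stereo=True: one caller per channel, each channel recognized separately.
    stereo=False: a single channel, with speaker diarization for two speakers.
    """
    config = {
        "language_code": language_code,
        # Word start/end times feed silence and speaking-time metrics.
        "enable_word_time_offsets": True,
    }
    if stereo:
        config["audio_channel_count"] = 2
        config["enable_separate_recognition_per_channel"] = True
    else:
        config["diarization_config"] = {
            "enable_speaker_diarization": True,
            "min_speaker_count": 2,
            "max_speaker_count": 2,
        }
    return config
```

With the google-cloud-speech library, a dict like this could be passed as `SpeechClient().long_running_recognize(config=config, audio={"uri": "gs://[YOUR_UPLOADED_AUDIO_FILES_BUCKET_NAME]/[YOUR_FILE_NAME.wav]"})`; the name of the returned operation plays the role of the job ID forwarded to Pub/Sub.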

Deploy the Cloud Dataflow Pipeline

Check your Python 3 version:

python3 --version
Python 3.7.8

In the cloned repo, go to the “saf-longrun-job-dataflow” directory and run the commands below to create a virtual environment and install the Dataflow job's dependencies.

# macOS/Linux
python3 -m venv env
source env/bin/activate
pip3 install "apache-beam[gcp]"
pip3 install dateparser

# Windows
python -m venv env
env\Scripts\activate
pip3 install "apache-beam[gcp]"
pip3 install dateparser

Run the following command to deploy the Dataflow job. The job creates the BigQuery table you specify in --output_bigquery. Please wait, as it might take a few minutes to complete.

python3 saflongrunjobdataflow.py --project=[YOUR_PROJECT_ID] --input_topic=projects/[YOUR_PROJECT_ID]/topics/[YOUR_TOPIC_NAME] --runner=DataflowRunner --temp_location=gs://[YOUR_DATAFLOW_STAGING_BUCKET]/tmp --output_bigquery=[YOUR_DATASET_NAME].[YOUR_TABLE_NAME] --requirements_file="requirements.txt"
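The deploy command mixes the pipeline's own flags (--input_topic, --output_bigquery) with standard Apache Beam options (--project, --runner, --temp_location, --requirements_file). A common Beam pattern for separating them is `argparse`'s `parse_known_args`, sketched below. This is an assumption about how such a script could parse its arguments, not a copy of saflongrunjobdataflow.py; `parse_pipeline_args` is a hypothetical name. The sketch also checks that --input_topic is a full topic path and that --output_bigquery has the DATASET.TABLE form.

```python
import argparse
import re

# Full Pub/Sub topic path, as required by --input_topic.
TOPIC_PATH = re.compile(r"^projects/[^/]+/topics/[^/]+$")


def parse_pipeline_args(argv):
    """Split argv into (pipeline-specific flags, remaining Beam options)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_topic", required=True,
                        help="projects/PROJECT_ID/topics/TOPIC_NAME")
    parser.add_argument("--output_bigquery", required=True,
                        help="Destination table as DATASET.TABLE")
    # Unrecognized flags (--project, --runner, ...) are returned untouched.
    known, beam_args = parser.parse_known_args(argv)
    if not TOPIC_PATH.match(known.input_topic):
        parser.error("--input_topic must be projects/PROJECT_ID/topics/TOPIC_NAME")
    if known.output_bigquery.count(".") != 1:
        parser.error("--output_bigquery must be DATASET.TABLE")
    return known, beam_args
```

The remaining list would typically be handed to Beam as `PipelineOptions(beam_args)` from `apache_beam.options.pipeline_options`.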

In the cloned repo, the “sample-audio-files” directory contains sample audio files you can process with the Speech Analysis Framework.

For [TOPIC_NAME], use only the topic name, not the full path used for --input_topic.
Set x-goog-meta-dlp to true to redact sensitive information with DLP, or to false to skip redaction. When enabled, DLP scans the data for all info types.

# stereo wav audio sample
gsutil -h x-goog-meta-dlp:[true or false] -h x-goog-meta-callid:1234567 -h x-goog-meta-stereo:true -h x-goog-meta-pubsubtopicname:[TOPIC_NAME] -h x-goog-meta-year:2019 -h x-goog-meta-month:11 -h x-goog-meta-day:06 -h x-goog-meta-starttime:1116 cp [YOUR_FILE_NAME.wav] gs://[YOUR_UPLOADED_AUDIO_FILES_BUCKET_NAME]

# mono flac audio sample
gsutil -h x-goog-meta-dlp:[true or false] -h x-goog-meta-callid:1234567 -h x-goog-meta-stereo:false -h x-goog-meta-pubsubtopicname:[TOPIC_NAME] -h x-goog-meta-year:2019 -h x-goog-meta-month:11 -h x-goog-meta-day:06 -h x-goog-meta-starttime:1116 cp [YOUR_FILE_NAME.flac] gs://[YOUR_UPLOADED_AUDIO_FILES_BUCKET_NAME]
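The two upload commands differ only in the file and the stereo flag, so a small helper can build them. The Python sketch below (function names and parameters are hypothetical) assembles the same gsutil arguments in the same order, accepts either a short topic name or a full projects/PROJECT_ID/topics/TOPIC_NAME path and keeps only the short name as the metadata value, and converts the DLP and stereo flags to the lowercase true/false strings used above.

```python
import subprocess


def build_upload_command(local_file, bucket, call_id, topic, stereo, dlp,
                         year, month, day, start_time):
    """Return the gsutil argument list for uploading one recording (hypothetical helper)."""
    # The metadata value is the short topic name; strip a full
    # projects/PROJECT_ID/topics/ prefix if one was given.
    topic_name = topic.rsplit("/", 1)[-1]
    metadata = {
        "dlp": str(dlp).lower(),  # True -> "true", False -> "false"
        "callid": str(call_id),
        "stereo": str(stereo).lower(),
        "pubsubtopicname": topic_name,
        "year": str(year),
        "month": str(month),
        "day": str(day),
        "starttime": str(start_time),
    }
    cmd = ["gsutil"]
    for key, value in metadata.items():
        cmd += ["-h", f"x-goog-meta-{key}:{value}"]
    cmd += ["cp", local_file, f"gs://{bucket}"]
    return cmd


def upload(cmd):
    # Requires an installed, authenticated Cloud SDK; raises CalledProcessError on failure.
    subprocess.run(cmd, check=True)
```

Building an argument list instead of a shell string means the header values need no shell quoting.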

After a few minutes you will be able to see the data in BigQuery.

Sample select statements that can be executed in the BigQuery console.

-- Count Natural Language entities across all records, most frequent first
SELECT
  entities.name,
  entities.type,
  COUNT(entities.name) AS count
FROM
  `[YOUR_PROJECT_ID].[YOUR_DATASET].[YOUR_TABLE]`,
  UNNEST(entities) entities
GROUP BY
  entities.name,
  entities.type
ORDER BY
  count DESC

-- List word, start time, end time, speaker tag and confidence for all records
SELECT
  ARRAY(
  SELECT
    AS STRUCT word,
    startSecs,
    endSecs,
    speakertag,
    confidence
  FROM
    UNNEST(words)) transcript
FROM
  `[YOUR_PROJECT_ID].[YOUR_DATASET].[YOUR_TABLE]`

-- Search Transcript with a regular expression
SELECT
  transcript,
  fileid,
  callid,
  year,
  month,
  day,
  sentimentscore,
  magnitude,
  date,
  silencesecs
FROM
  `[YOUR_PROJECT_ID].[YOUR_DATASET].[YOUR_TABLE]`
WHERE
  (REGEXP_CONTAINS(transcript, '(?i) [YOUR_WORD]' ))
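To run the regular-expression search from Python instead of the console, the search term can be passed as a query parameter rather than pasted into the SQL, which avoids quoting problems. The sketch below assumes the google-cloud-bigquery client library and Application Default Credentials. `build_search_query` and `run_search` are hypothetical helpers, and the table name is a placeholder. The library is imported inside `run_search`, so the query-building part works without it.

```python
import re


def build_search_query(table):
    """Return the transcript search SQL with the pattern as the @pattern parameter."""
    # Table names cannot be query parameters, so do a simple format check first.
    if not re.fullmatch(r"[\w-]+\.\w+\.\w+", table):
        raise ValueError("table must look like PROJECT_ID.DATASET.TABLE")
    return (
        "SELECT transcript, fileid, callid, year, month, day, "
        "sentimentscore, magnitude, date, silencesecs "
        f"FROM `{table}` "
        "WHERE REGEXP_CONTAINS(transcript, @pattern)"
    )


def run_search(table, word):
    """Run the search; requires google-cloud-bigquery and credentials."""
    from google.cloud import bigquery  # imported here so build_search_query works without it

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            # Same pattern as the console query: case-insensitive, leading space.
            # `word` is interpreted as an RE2 regular expression.
            bigquery.ScalarQueryParameter("pattern", "STRING", "(?i) " + word),
        ]
    )
    query_job = client.query(build_search_query(table), job_config=job_config)
    return list(query_job.result())
```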

This is not an officially supported Google product
