Licence: apache-2.0
Searching for the occurrence seconds of words/phrases or arbitrary regex patterns within audio files

Projects that are alternatives of or similar to Simpleaudioindexer

Lolcate Rs
Lolcate -- A comically fast way of indexing and querying your filesystem. Replaces locate / mlocate / updatedb. Written in Rust.
Stars: ✭ 191 (+91%)
Mutual labels:  search, search-engine, regex
Ripgrep
ripgrep recursively searches directories for a regex pattern while respecting your gitignore
Stars: ✭ 28,564 (+28464%)
Mutual labels:  command-line-tool, search, regex
Search Ui
🔍 A set of UI components to build a fully customized search!
Stars: ✭ 24 (-76%)
Mutual labels:  search, search-engine
Blast
Blast is a full text search and indexing server, written in Go, built on top of Bleve.
Stars: ✭ 934 (+834%)
Mutual labels:  search, search-engine
Opensse
Open Sketch Search Engine- 3D object retrieval based on sketch image as input
Stars: ✭ 883 (+783%)
Mutual labels:  search, search-engine
Riot
Go Open Source, Distributed, Simple and efficient Search Engine; Warning: This is V1 and beta version, because of big memory consume, and the V2 will be rewrite all code.
Stars: ✭ 6,025 (+5925%)
Mutual labels:  search, search-engine
Minisearch
Tiny and powerful JavaScript full-text search engine for browser and Node
Stars: ✭ 737 (+637%)
Mutual labels:  search, search-engine
Flexsearch
Next-Generation full text search library for Browser and Node.js
Stars: ✭ 8,108 (+8008%)
Mutual labels:  search, search-engine
Manticoresearch
Database for search
Stars: ✭ 610 (+510%)
Mutual labels:  search, search-engine
Fsq
A tool for querying the file system with a SQL-like language.
Stars: ✭ 60 (-40%)
Mutual labels:  search, regex
Rats Search
BitTorrent P2P multi-platform search engine for Desktop and Web servers with integrated torrent client.
Stars: ✭ 1,037 (+937%)
Mutual labels:  search, search-engine
Awesome Solr
A curated list of Awesome Apache Solr links and resources.
Stars: ✭ 69 (-31%)
Mutual labels:  search, search-engine
Search cop
Search engine like fulltext query support for ActiveRecord
Stars: ✭ 660 (+560%)
Mutual labels:  search, search-engine
Elasticsuite
Smile ElasticSuite - Magento 2 merchandising and search engine built on ElasticSearch
Stars: ✭ 647 (+547%)
Mutual labels:  search, search-engine
Ugrep
🔍NEW ugrep v3.1: ultra fast grep with interactive query UI and fuzzy search: search file systems, source code, text, binary files, archives (cpio/tar/pax/zip), compressed files (gz/Z/bz2/lzma/xz/lz4), documents and more. A faster, user-friendly and compatible grep replacement.
Stars: ✭ 626 (+526%)
Mutual labels:  search, regex
Better Search
Better Search WordPress plugin
Stars: ✭ 9 (-91%)
Mutual labels:  search, search-engine
Lieu
community search engine
Stars: ✭ 76 (-24%)
Mutual labels:  search, search-engine
Filemasta
A search application to explore, discover and share online files
Stars: ✭ 571 (+471%)
Mutual labels:  search, search-engine
Typesense
Fast, typo tolerant, fuzzy search engine for building delightful search experiences ⚡ 🔍 ✨ An Open Source alternative to Algolia and an Easier-to-Use alternative to ElasticSearch.
Stars: ✭ 8,644 (+8544%)
Mutual labels:  search, search-engine
Github Awesome Autocomplete
Add instant search capabilities to GitHub's search bar
Stars: ✭ 1,015 (+915%)
Mutual labels:  search, search-engine

WARNING: I no longer have the time to maintain this library. Contact me if you want to become the maintainer.

SimpleAudioIndexer

.. image:: http://rafiei.net/assets/sai/sai_logo.png
   :alt: Simple Audio Indexer: Index audio files and search for a word/phrase or match regex patterns
   :align: center

|build| |license| |docs| |python| |wheel|

  • `Description <#description>`__
  • `What can it do? <#what-can-it-do>`__
  • `Documentation <#documentation>`__
  • `Requirements <#requirements>`__
  • `Installation <#installation>`__
  • `Uninstallation <#uninstallation>`__
  • `Demo <#demo>`__
  • `Nice to implement in the future <#nice-to-implement-in-the-future>`__
  • `Contributing <#contributing>`__
  • `Authors <#authors>`__
  • `License <#license>`__

Description

This is a Python library and command-line tool that helps you search for a word or a phrase within an audio file (wav format). It also builds upon that basic searching capability to provide some [so-called] advanced searching abilities!

What can it do?

  • Index audio files (using Watson (Online/Higher-quality) or CMU Pocketsphinx (Offline/Lower-quality)) and save/load the results.
  • Search within audio files in multiple languages (default is English).
  • Define a timing error for your queries to handle discrepancies.
  • Define constraints on your queries, e.g. whether to include (sub/super)sequences, results with missing words, etc.
  • Do full-blown regex pattern matching!

Documentation

To read the documentation, visit `here <http://simpleaudioindexer.readthedocs.io/>`__.

Requirements

  • Python (v2.7, 3.3, 3.4, 3.5 or 3.6) with pip installed.
  • Watson API Credentials and/or CMU Pocketsphinx
  • sox
  • ffmpeg (if you choose CMU Pocketsphinx)
  • py.test and tox (if you want to run the tests)

Installation

Open up a terminal and enter:

::

    pip install SimpleAudioIndexer

Installation details can be found in the documentation `here <https://simpleaudioindexer.readthedocs.io/installation/>`__.

There's a `Dockerfile <https://raw.githubusercontent.com/aalireza/SimpleAudioIndexer/master/Dockerfile>`_ included within the repo if you're unable to do a native installation or are on a Windows system.

Uninstallation

Open up a terminal and enter:

::

    pip uninstall SimpleAudioIndexer

How you uninstall sox, however, depends on whether you're on a Linux or Mac system. For more information, visit `here <https://simpleaudioindexer.readthedocs.io/installation/#uninstall>`__.

Demo

Say you have this audio file:

|small_audio|

Download it to an empty directory for simplicity. We'll refer to that directory as ``SRC_DIR`` and to the audio file as ``small_audio.wav``.

Here's how you can search through it.

Command-line Usage
++++++++++++++++++

Open up a terminal and enter:

::

    $ sai --mode "ibm" --username_ibm USERNAME --password_ibm PASSWORD --src_dir SRC_DIR --search "called"

    {'called': {'small_audio.wav': [(1.25, 1.71)]}}

Replace USERNAME and PASSWORD with your IBM Watson credentials and SRC_DIR with the absolute path to the directory you just prepared.

The output is, as shown above, a dictionary that maps the query to the file(s) it appears in, and each file to a list of all the (starting second, ending second) pairs at which the query occurs.
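
As a minimal sketch of how that dictionary could be consumed from Python, the snippet below flattens it into one row per occurrence. The helper name ``flatten_results`` is hypothetical and not part of SimpleAudioIndexer; the input is simply the output printed above.

```python
def flatten_results(results):
    """Turn {query: {file: [(start, end), ...]}} into (query, file, start, end) rows."""
    rows = []
    for query, files in results.items():
        for file_name, spans in files.items():
            for start, end in spans:
                rows.append((query, file_name, start, end))
    # Sort by file name, then by starting second, so rows read chronologically.
    return sorted(rows, key=lambda row: (row[1], row[2]))


# The dictionary printed by the command above.
results = {'called': {'small_audio.wav': [(1.25, 1.71)]}}

for query, file_name, start, end in flatten_results(results):
    print("%s: %r from %.2fs to %.2fs" % (file_name, query, start, end))
```

Flat rows like these are easier to sort or export than the nested dictionary, which is the only reason for the helper.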

Note that all commands work uniformly across engines (i.e. with Pocketsphinx as well). For example, a search with Pocketsphinx can be entered as:

::

    $ sai --mode "cmu" --src_dir SRC_DIR --search "lives"

    {'lives': {'small_audio.wav': [(3.12, 3.88)]}}

This uses Pocketsphinx instead of Watson to get the timestamps. Note that the accuracy of Pocketsphinx is much lower than that of Watson.

Instead of searching for a word, you could also match a regex pattern, for example:

::

    $ sai --mode ibm --src_dir SRC_DIR --username_ibm USERNAME --password_ibm PASSWORD --regexp " [a-z][a-z] "

    {u' in ': {'small_audio.wav': [(2.81, 2.93)]}, u' to ': {'small_audio.wav': [(1.71, 1.81)]}}

That was the result of searching for two-letter words. Note that you can match any arbitrary regular expression.

You may also save and load the indexed data from the command-line script. For more information, visit `here <https://simpleaudioindexer.readthedocs.io/usage/#as-a-command-line-script>`__.
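
To build intuition for how a regex match on text can be turned back into seconds, here is a rough sketch that does not call the library at all. It assumes a list of (word, start, end) tuples; the words and timestamps below are made up for illustration, and ``regex_spans`` is a hypothetical helper, not SimpleAudioIndexer's actual implementation.

```python
import re


def regex_spans(words, pattern):
    """Match `pattern` against the space-joined transcript and map every
    match back to (matched_text, (start_second, end_second))."""
    text = ""
    char_to_word = []  # char_to_word[i] = index of the word owning character i
    for index, (word, _, _) in enumerate(words):
        if text:
            text += " "
            char_to_word.append(None)  # joining spaces belong to no word
        text += word
        char_to_word.extend([index] * len(word))

    found = []
    for match in re.finditer(pattern, text):
        touched = [i for i in char_to_word[match.start():match.end()] if i is not None]
        if not touched:
            continue  # empty match, or a match made only of spaces
        # The span runs from the first touched word's start to the last one's end.
        found.append((match.group(), (words[touched[0]][1], words[touched[-1]][2])))
    return found


# Illustrative (word, start, end) tuples, not real engine output.
words = [("hello", 0.0, 0.4), ("to", 0.5, 0.6), ("the", 0.7, 0.9), ("world", 1.0, 1.5)]
print(regex_spans(words, r"\b[a-z]{2}\b"))
```

A match that covers only part of a word is reported with that whole word's timestamps, since the assumed input carries no timing finer than a word.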

Library Usage
+++++++++++++

Start by importing the indexer class:

.. code-block:: python

    from SimpleAudioIndexer import SimpleAudioIndexer as sai

Next, create an instance of ``sai``:

.. code-block:: python

    indexer = sai(mode="ibm", src_dir="SRC_DIR", username_ibm="USERNAME", password_ibm="PASSWORD")

Now you can index all the available audio files by calling the ``index_audio`` method:

.. code-block:: python

    indexer.index_audio()

You can then create a search generator:

.. code-block:: python

    searcher = indexer.search_gen(query="called")
    print(next(searcher))
    # {'Query': 'called', 'File Name': 'small_audio.wav', 'Result': (1.25, 1.71)}

``search_gen`` accepts quite a few more arguments. For example, you can make the search case sensitive (by default it is not). If you are looking for a phrase but a timing gap kept the indexer from picking it up correctly, you can specify ``timing_error``. If some word was missed completely, you can specify ``missing_word_tolerance``, and so on.
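
To illustrate what a timing tolerance can mean for a phrase search, here is a small standalone sketch. It shows one plausible reading of the idea rather than the library's implementation: ``find_phrase`` is a hypothetical helper, and the (word, start, end) tuples are invented for the example.

```python
def find_phrase(words, phrase, timing_error=0.0, case_sensitive=False):
    """Return (start, end) spans where the words of `phrase` occur consecutively
    and the gap between neighbouring words is at most `timing_error` seconds."""
    def norm(token):
        return token if case_sensitive else token.lower()

    targets = [norm(token) for token in phrase.split()]
    spans = []
    if not targets:
        return spans
    for i in range(len(words) - len(targets) + 1):
        window = words[i:i + len(targets)]
        if [norm(word) for word, _, _ in window] != targets:
            continue
        # Gap = start of the next word minus end of the current word.
        gaps = [nxt[1] - cur[2] for cur, nxt in zip(window, window[1:])]
        if all(gap <= timing_error for gap in gaps):
            spans.append((window[0][1], window[-1][2]))
    return spans


# Illustrative timestamps with a half-second pause between the two words.
words = [("good", 0.0, 0.4), ("morning", 0.9, 1.4)]
print(find_phrase(words, "good morning"))                    # pause exceeds tolerance
print(find_phrase(words, "good morning", timing_error=0.6))  # pause is tolerated
```

With the default tolerance of zero the pause breaks the phrase, and raising the tolerance above the length of the pause lets the phrase match again.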

For a full list, see the API reference `here <./reference.html#SimpleAudioIndexer.SimpleAudioIndexer.search_gen>`__.

Note that you can also call the ``search_all`` method to search for a list of queries within all the audio files.
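
The per-hit dictionaries yielded by the generator and the nested dictionaries printed by the command-line tool carry the same information in different shapes. As a hedged sketch of how they relate, the snippet below groups generator-style hits into the nested form. ``group_hits`` is a hypothetical helper written for this example, and nothing here is claimed about how the library does this internally.

```python
from collections import defaultdict


def group_hits(hits):
    """Group {'Query', 'File Name', 'Result'} hits into {query: {file: [results]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for hit in hits:
        grouped[hit['Query']][hit['File Name']].append(hit['Result'])
    # Convert the nested defaultdicts back to plain dicts for clean printing.
    return {query: dict(files) for query, files in grouped.items()}


# A hit in the shape shown in the generator example above.
hits = [{'Query': 'called', 'File Name': 'small_audio.wav', 'Result': (1.25, 1.71)}]
print(group_hits(hits))
```

Any iterable of hits works as input, so a generator can be passed directly without building a list first.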

Finally, you could do a regex search!

.. code-block:: python

    print(indexer.search_regexp(pattern="[A-Z][^l]* "))
    # {u'Americans are ca': {'small_audio.wav': [(0.21, 1.71)]}}

More functionality is implemented. For detailed explanations, read the documentation `here <https://simpleaudioindexer.readthedocs.io/usage/#as-a-python-library>`__.

Nice to implement in the future

  • Uploading in parallel
  • More control structures for searching (Typos, phoneme based approximation of words using CMU_DICT or NLTK etc.)
  • Searching for an unintelligible audio within the audio files. Possibly by cross correlation or something similar.

Contributing

Should you want to contribute code or ideas, file a bug report or give feedback, visit the `CONTRIBUTING <https://github.com/aalireza/SimpleAudioIndexer/blob/master/CONTRIBUTING.rst>`_ file.

Authors

  • Alireza Rafiei - `aalireza <https://github.com/aalireza>`_

See also the list of `contributors <https://github.com/aalireza/SimpleAudioIndexer/graphs/contributors>`_ to this project.

License

This project is licensed under the Apache v2.0 license - see the `LICENSE <https://github.com/aalireza/SimpleAudioIndexer/blob/master/LICENSE>`_ file for more details.

.. |license| image:: https://img.shields.io/pypi/l/SimpleAudioIndexer.svg
   :target: LICENSE
   :alt: Apache v2.0 License

.. |docs| image:: https://readthedocs.org/projects/simpleaudioindexer/badge/?version=latest
   :target: http://simpleaudioindexer.readthedocs.io/?badge=latest
   :alt: Documentation Status

.. |build| image:: https://travis-ci.org/aalireza/SimpleAudioIndexer.svg?branch=master
   :target: https://travis-ci.org/aalireza/SimpleAudioIndexer
   :alt: Build status

.. |python| image:: https://img.shields.io/pypi/pyversions/SimpleAudioIndexer.svg
   :alt: Python 2.7, 3.3, 3.4, 3.5, 3.6 supported

.. |wheel| image:: https://img.shields.io/pypi/wheel/SimpleAudioIndexer.svg
   :alt: Wheel ready

.. |small_audio| image:: http://rafiei.net/assets/play_button.png
   :target: http://rafiei.net/assets/sai/small_audio.wav
   :alt: Demo audio file

.. _Documentation: https://github.com/aalireza/SimpleAudioIndexer/docs
