
opencog / TinyCog

Licence: other
Small Robot, Toy Robot platform

Programming Languages

  • C++
  • Scheme
  • CMake
  • Shell

Projects that are alternatives of or similar to TinyCog

Lingvo
Stars: ✭ 2,361 (+8041.38%)
Mutual labels:  speech-synthesis, speech-recognition
web-speech-cognitive-services
Polyfill Web Speech API with Cognitive Services Bing Speech for both speech-to-text and text-to-speech service.
Stars: ✭ 35 (+20.69%)
Mutual labels:  speech-synthesis, speech-recognition
Mycroft Precise
A lightweight, simple-to-use, RNN wake word listener
Stars: ✭ 481 (+1558.62%)
Mutual labels:  embedded-systems, speech-recognition
Kalliope
Kalliope is a framework that will help you to create your own personal assistant.
Stars: ✭ 1,509 (+5103.45%)
Mutual labels:  speech-synthesis, speech-recognition
ml-with-audio
HF's ML for Audio study group
Stars: ✭ 104 (+258.62%)
Mutual labels:  speech-synthesis, speech-recognition
Awesome Ai Services
An overview of the AI-as-a-service landscape
Stars: ✭ 133 (+358.62%)
Mutual labels:  speech-synthesis, speech-recognition
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (+82.76%)
Mutual labels:  speech-synthesis, speech-recognition
Speech ai
Simple speech linguistic AI with Python
Stars: ✭ 66 (+127.59%)
Mutual labels:  speech-synthesis, speech-recognition
AmazonSpeechTranslator
End-to-end Solution for Speech Recognition, Text Translation, and Text-to-Speech for iOS using Amazon Translate and Amazon Polly as AWS Machine Learning managed services.
Stars: ✭ 50 (+72.41%)
Mutual labels:  speech-synthesis, speech-recognition
speechrec
a simple speech recognition app using the Web Speech API Interfaces
Stars: ✭ 18 (-37.93%)
Mutual labels:  speech-synthesis, speech-recognition
Spokestack Python
Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application.
Stars: ✭ 103 (+255.17%)
Mutual labels:  speech-synthesis, speech-recognition
Speech-Backbones
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
Stars: ✭ 205 (+606.9%)
Mutual labels:  speech-synthesis, speech-recognition
Openseq2seq
Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
Stars: ✭ 1,378 (+4651.72%)
Mutual labels:  speech-synthesis, speech-recognition
Naomi
The Naomi Project is an open source, technology agnostic platform for developing always-on, voice-controlled applications!
Stars: ✭ 171 (+489.66%)
Mutual labels:  speech-synthesis, speech-recognition
Cross vc
Cross-lingual Voice Conversion
Stars: ✭ 91 (+213.79%)
Mutual labels:  speech-synthesis, speech-recognition
idear
🎙️ Handsfree Audio Development Interface
Stars: ✭ 84 (+189.66%)
Mutual labels:  speech-synthesis, speech-recognition
Athena
an open-source implementation of sequence-to-sequence based speech processing engine
Stars: ✭ 542 (+1768.97%)
Mutual labels:  speech-synthesis, speech-recognition
Artyom.js
A voice control - voice commands - speech recognition and speech synthesis javascript library. Create your own siri, google now or cortana with Google Chrome within your website.
Stars: ✭ 1,011 (+3386.21%)
Mutual labels:  speech-synthesis, speech-recognition
Khronos
The open source intelligent personal assistant
Stars: ✭ 25 (-13.79%)
Mutual labels:  speech-synthesis, speech-recognition
open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Stars: ✭ 841 (+2800%)
Mutual labels:  speech-synthesis, speech-recognition

TinyCog

A collection of speech, vision, and movement functionalities aimed at small or toy robots on embedded systems, such as the Raspberry Pi. High-level reasoning, language understanding, language generation, and movement planning are provided by OpenCog.

The current hardware platform requires an RPI3 computer, a Pi Camera V2, and a USB microphone; other sensor/detector components are planned.

The current software functions include face detection, emotion recognition, gesture analysis, and speech-to-text and text-to-speech subsystems. All high-level function is provided by OpenCog, specifically by the Ghost scripting system -- Ghost processes sensory input and provides coordinated chatbot and movement abilities.

Setup

Everything here is meant to run on an RPI3 computer; it can also be compiled on a standard Linux desktop.

A fully prepped Raspbian image is available here.

Use xzcat to write the image to your SD card as shown below, replacing sdX with your device:

    xzcat oc-debian-stretch-arm64.img.xz | sudo dd of=/dev/sdX

When you first boot this image and log in with the default credentials, it automatically expands the filesystem to occupy the entire / partition and then reboots. This is a 64-bit Debian Stretch OS for the RPI3, which means the Pi Camera driver is not available; a USB camera should be used with this image.

The default credentials:

    Username: oc
    Password: opencog

The image contains the OpenCog version current at the time the image was built, along with other libraries such as OpenCV 3.4 and dlib 19.15. To see the OpenCog commit versions, pkg-config can be used:

    pkg-config --variable=cogutil opencog    #shows cogutil commit version
    pkg-config --variable=atomspace opencog    #shows atomspace commit version
    pkg-config --variable=opencog opencog    #shows opencog commit version

The one problem with this image is the missing Pi Camera driver, as it is not available as a 64-bit binary.

Install

The required dependencies must be installed first, whether on a desktop or on the RPI3.

Use CMake for building. The default build type is Debug; set CMAKE_BUILD_TYPE to Release to disable debug mode. For the emotion recognition service, set the variable SERVER_ADDRESS to "34.216.72.29:6205":

    cd TinyCog
    mkdir build
    cd build
    cmake ..  # -DCMAKE_BUILD_TYPE=Release -DSERVER_ADDRESS="34.216.72.29:6205"
    make

Testing

  • To test the sensors from the guile shell, run the following from within the build dir; it opens the camera and shows a live view with markings for what the sensors detect.
    $ ./TestDrRoboto.scm
  • To test from a video file instead of a camera, run it the following way:
    $ ./TestDrRoboto.scm -- <video_file_path>

Running

  • Start the main program from within the build dir:
    $ guile -l dr-roboto.scm
  • In another terminal, connect to port 5555 via telnet to input speech:
    $ telnet localhost 5555

Implementation

Overall Description

The dr-roboto.cpp file is compiled to a guile extension, which is loaded by the Scheme dr-roboto.scm file. This extension is written in C++, and its main job is to open the camera and gather sensory input. When the Scheme program loads the extension, it first sends over the address of its atomspace so that the two can share one. The sensors are then started: a loop running in a separate thread that collects information and places it in the atomspace. Most sensory values are stored as AtomSpace Values in the following format:

    Value
        ConceptNode "position"
        ConceptNode "face_x"
        FloatValue X Y
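
Such a value can then be read back from the guile shell with cog-value. A minimal sketch, assuming "face_x" is the atom and "position" the key in the layout above:

    (use-modules (opencog))

    ; Fetch the FloatValue attached to "face_x" under the "position" key...
    (define face-pos
      (cog-value (ConceptNode "face_x") (ConceptNode "position")))

    ; ...and convert it to an ordinary Scheme list of floats: (X Y)
    (cog-value->list face-pos)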

The Scheme program dr-roboto.scm includes the behavior/behavior.scm code, which contains a very small model of an OpenCog behavior tree. The behavior is a looking behavior: it first checks whether there is a face; if there is exactly one, it looks at that one; if there are several, it checks whether one of them is smiling; if not, whether any of them has a non-neutral facial expression; and failing that, it just looks at one of the faces at random. If there are no faces in view, it looks at the salient point. The behavior tree calls functions that simply check the atomspace for the information they require (a sketch of this priority order appears at the end of this section).

The behavior/behavior.scm file also loads the Ghost scripts located here. When the behavior program wants to command the Professor Einstein robot from Hanson Robotics, it calls functions defined in cmd_einstein.scm, which connects to the Professor Einstein robot through its socket API and sends it commands.

The functions.scm file contains utility functions used by the other Scheme source files, such as converting Ghost results (a list of WordNodes) into a single string to be spoken by the robot, and mapping values between the image dimensions and the robot's pan/tilt limits (both are also sketched below).
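
A minimal sketch of the looking behavior's priority order, written as plain Scheme rather than as the actual behavior-tree atoms; face-list, smiling?, non-neutral?, salient-point, and look-at are hypothetical helpers standing in for the atomspace checks described above:

    (use-modules (srfi srfi-1))   ; for find

    (define (choose-look-target)
      (let ((faces (face-list)))                        ; faces currently in view
        (cond
          ((null? faces) (look-at (salient-point)))     ; no faces: salient point
          ((= 1 (length faces)) (look-at (car faces)))  ; one face: look at it
          ((find smiling? faces) => look-at)            ; prefer a smiling face
          ((find non-neutral? faces) => look-at)        ; then an expressive one
          (else (look-at (list-ref faces (random (length faces))))))))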
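
Hedged sketches of the two functions.scm utilities mentioned above, under assumed names (the real helpers may differ):

    (use-modules (opencog))   ; for cog-name

    ; Join the names of a list of WordNodes into one string for the TTS.
    (define (word-nodes->string words)
      (string-join (map cog-name words) " "))

    ; Linearly rescale x from [in-lo, in-hi] to [out-lo, out-hi], e.g. a
    ; face x-coordinate in a 640-pixel-wide image onto a pan-angle range.
    (define (map-range x in-lo in-hi out-lo out-hi)
      (+ out-lo (* (- x in-lo) (/ (- out-hi out-lo) (- in-hi in-lo)))))

    ; (map-range 320 0 640 -45 45) => 0   ; pan limits here are illustrative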

Sensors

Sensors are a camera and microphones. The camera is used for face detection -> face landmarks -> facial expression and emotion, and also for hand detection -> convexity defects -> finger count and gestures. The microphone is used for STT and sound-source localization. Some of the sensor programs, such as face, hand, and voice-activity detection, can run on the Pi without much stress on the hardware, but other functionalities, like emotion and speech recognition, should be implemented as services from a server, possibly from SingularityNET. Currently, STT is implemented using pocketsphinx. It is not ideal, but it can be used for a very limited range of commands and simple conversation.

Act

For Hanson Robotics' Professor Einstein robot, the cmd_einstein.scm file contains the code necessary to command it. The robot should act as well as sense: it must speak and move around. Speech synthesis utilizes festival; the code is in act/audio. Movement was intended to use SPI communication with the hardware, but that has changed; however, the SPI interface is in comm/spi.
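
As a rough illustration of festival-driven synthesis (the actual code in act/audio is C++ and may invoke festival differently; say is a hypothetical helper):

    ; Pipe the text into festival's --tts mode, which reads from stdin.
    (define (say text)
      (system (string-append "echo " (object->string text) " | festival --tts")))

    ; (say "Hello, I am Dr. Roboto")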

Behavior

ToDo

  • Improve STT
  • Ghost rules
  • Stories for a specific identity we need the robot to have