Thursday, 16 May 2013

Download - Speech Recognition For The Kinect: Kinect Speech Media Controller



SPEECH RECOGNITION FOR THE KINECT: KINECT SPEECH MEDIA CONTROLLER

I'm pleased to say the full source for this demo is now available to download from my website. Check out a video of the app in action below. Notice that the Kinect reacts to each of your commands with a unique response, so you can be sure each command has been understood properly.



Have a go at using different combinations of words and phrases to control the media player. Download the source code and modify it to see what you can come up with. The code can be easily modified to allow you to control any number of Windows applications using voice recognition. You can easily add many more voice commands, or combine them with gestures to get more complex interactions.

Let me know what you think in the comments below.

Enjoy!

Download here


Contact: michaelpalmer.mp@gmail.com
Website: www.michaelpalmerwebdesign.com

Or leave a comment on my YouTube page

Tuesday, 14 May 2013

Kinect Speech Media Controller


A new demo showing how the Kinect can be used to control Windows Media Player by voice. You get to change and choose the words and phrases used to control it. Play / pause, rewind / fast-forward, open and close are just some of the 15+ commands you can change and edit.



Read more about the Kinect Speech Media Controller here. Keep an eye out for the source code, which will be available for download next week.

Saturday, 16 June 2012

Speech Recognition for the Kinect, the Easy Way...

Speech Recognition for the Kinect:

In the development of Kin-educate I found this to be one of the trickiest parts, largely because I couldn't find any complete tutorials out there other than the quick start series at Channel 9, which, if you have checked it out, you will know is helpful but not comprehensive.

I have been asked by quite a lot of people about how I did the speech recognition in the maths game for Kin-educate, so I thought I would do a quick tutorial that cuts out all the unnecessary bits, and just focuses on getting you set up and speech recognition working quickly and easily. This tutorial assumes you have a Kinect project set up already - if you do, you should be able to just copy and paste this code, in order, and you're all set!

*You decide what kind of outputs you would like for the speech recognition, but for this example I have used just three text boxes for feedback. One for the hypothesized result (good for debugging), one for the rejected speech, and one for the reply - when speech is recognized.

Add using statements and references:

//Make sure to add a reference to Microsoft.Kinect in the references
using Microsoft.Kinect;
//Make sure you have the speech SDK installed
//go to add reference, browse, navigate to program files, Microsoft SDKs
//speech, assemblies and select Microsoft.Speech.dll
using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;
//needed for Func and StringComparison
using System;
using System.IO;
//needed for Where and FirstOrDefault
using System.Linq;

Then, declare your variables and get the speech recognizer:

        //Create an instance of your kinect sensor
        public KinectSensor CurrentSensor;
        //and the speech recognition engine (SRE)
        private SpeechRecognitionEngine speechRecognizer;
        //Get the speech recognizer (SR)
        private static RecognizerInfo GetKinectRecognizer()
        {
            Func<RecognizerInfo, bool> matchingFunc = r =>
            {
                string value;
                r.AdditionalInfo.TryGetValue("Kinect", out value);
                return "True".Equals(value, StringComparison.InvariantCultureIgnoreCase) && "en-US".Equals(r.Culture.Name, StringComparison.InvariantCultureIgnoreCase);
            };
            return SpeechRecognitionEngine.InstalledRecognizers().Where(matchingFunc).FirstOrDefault();
        }

When the window loads, we need to initialize the Kinect sensor:

        //When the window loads, initialize the Kinect
        public MainWindow()
        {
            InitializeComponent();
            InitializeKinect();
        }
       
        //Initialize the Kinect
        private KinectSensor InitializeKinect()
        {
            //get the first connected sensor and set it to the current sensor variable
            CurrentSensor = KinectSensor.KinectSensors
                                  .FirstOrDefault(s => s.Status == KinectStatus.Connected);
            //FirstOrDefault returns null if no sensor is connected, so stop here
            if (CurrentSensor == null)
            {
                return null;
            }
            speechRecognizer = CreateSpeechRecognizer();
            //Start the sensor
            CurrentSensor.Start();
            //then run the Start method to start streaming audio
            Start();
            return CurrentSensor;
        }

Now we need to configure the audio stream:

        //Start streaming audio
        private void Start()
        {
            //set sensor audio source to variable
            var audioSource = CurrentSensor.AudioSource;
            //Set the beam angle mode - the direction the audio beam is pointing
            //we want it to be set to adaptive
            audioSource.BeamAngleMode = BeamAngleMode.Adaptive;
            //turn off echo cancellation and automatic gain control before the
            //stream starts, so the recognizer gets audio at a steady level
            audioSource.EchoCancellationMode = EchoCancellationMode.None;
            audioSource.AutomaticGainControlEnabled = false;
            //start the audio source
            var kinectStream = audioSource.Start();
            //configure incoming audio stream: 16 kHz, 16-bit, mono PCM
            speechRecognizer.SetInputToAudioStream(
                kinectStream, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
            //keep recognizing until told to stop, rather than stopping after one result
            speechRecognizer.RecognizeAsync(RecognizeMode.Multiple);
        }
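
The numbers passed to SpeechAudioFormatInfo are related to one another. As a rough sketch (PcmFormatMath is a made-up helper name for illustration, not part of the Kinect or Speech SDK), the last two numeric arguments can be worked out from the first three for uncompressed PCM audio:

```csharp
using System;

// Hypothetical helper for illustration only; not part of either SDK.
public static class PcmFormatMath
{
    // Bytes needed to hold one sample across all channels (the "block align" value).
    public static int BlockAlign(int bitsPerSample, int channels)
    {
        return (bitsPerSample / 8) * channels;
    }

    // Average bytes per second for uncompressed PCM audio.
    public static int AverageBytesPerSecond(int samplesPerSecond, int bitsPerSample, int channels)
    {
        return samplesPerSecond * BlockAlign(bitsPerSample, channels);
    }
}
```

For 16000 samples per second, 16 bits per sample and 1 channel, BlockAlign gives 2 and AverageBytesPerSecond gives 32000, which are the values used in the SetInputToAudioStream call.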

Here we set the culture, define the words we want our program to recognize, and set up the grammar builder:

        //here is the fun part: create the speech recognizer
        private SpeechRecognitionEngine CreateSpeechRecognizer()
        {
            //set recognizer info
            RecognizerInfo ri = GetKinectRecognizer();
            //GetKinectRecognizer returns null if no Kinect en-US recognizer is installed
            if (ri == null)
            {
                throw new InvalidOperationException("No Kinect speech recognizer found.");
            }
            //create instance of SRE
            SpeechRecognitionEngine sre;
            sre = new SpeechRecognitionEngine(ri.Id);

            //Now we need to add the words we want our program to recognise
            var grammar = new Choices();
            grammar.Add("hello");
            grammar.Add("goodbye");

            //set culture - language, country/region
            var gb = new GrammarBuilder { Culture = ri.Culture };
            gb.Append(grammar);

            //set up the grammar builder
            var g = new Grammar(gb);
            sre.LoadGrammar(g);

            //Set events for recognizing, hypothesising and rejecting speech
            sre.SpeechRecognized += SreSpeechRecognized;
            sre.SpeechHypothesized += SreSpeechHypothesized;
            sre.SpeechRecognitionRejected += SreSpeechRecognitionRejected;
            return sre;
        }

Now all we need to do is set up the methods for hypothesizing, recognizing and rejecting speech:

        //if speech is rejected
        private void RejectSpeech(RecognitionResult result)
        {
            textBox2.Text = "Pardon Moi?";
        }

        private void SreSpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
        {
            RejectSpeech(e.Result);
        }

I use the hypothesized result for debugging and changing the confidence level for managing accuracy:

        //hypothesized result
        private void SreSpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
        {
            textBox1.Text = "Hypothesized: " + e.Result.Text + " " + e.Result.Confidence;
        }

This is where we decide what happens when speech is recognized. The confidence level is set quite low here. Experiment with it to see what suits you best:

        //Speech is recognised
        private void SreSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            //Very important! - change this value to adjust accuracy - the higher the value
            //the more accurate it will have to be, lower it if it is not recognizing you
            if (e.Result.Confidence < .4)
            {
                RejectSpeech(e.Result);
                //stop here so a low-confidence result does not also trigger a reply
                return;
            }
            //and finally, here we set what we want to happen when 
            //the SRE recognizes a word
            switch (e.Result.Text.ToUpperInvariant())
            {
                case "HELLO":
                    textBox3.Text = "Hi there.";
                    break;
                case "GOODBYE":
                    textBox3.Text = "Goodbye then.";
                    break;
                default:
                    break;
            }
        }
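
Once you have more than a handful of words, the switch statement gets long. One alternative, shown here as a minimal sketch and not as the way Kin-educate does it, is to keep the phrases and replies in a dictionary. The class name CommandReplies and its members are invented for this example, and it uses only plain C# so it can be tried without a Kinect attached:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; these names are not from the original project.
public static class CommandReplies
{
    // Minimum confidence a result needs before we act on it (same value as the tutorial).
    public const float MinConfidence = 0.4f;

    // Phrase -> reply. The comparer makes lookups case-insensitive.
    private static readonly Dictionary<string, string> Replies =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "hello", "Hi there." },
            { "goodbye", "Goodbye then." }
        };

    // The phrases the table knows about, for building the grammar from one place.
    public static IEnumerable<string> Phrases
    {
        get { return Replies.Keys; }
    }

    // Returns the reply for a recognized phrase, or null if the result
    // should be treated as rejected (low confidence or unknown phrase).
    public static string ReplyFor(string text, float confidence)
    {
        if (text == null || confidence < MinConfidence)
        {
            return null;
        }
        string reply;
        return Replies.TryGetValue(text, out reply) ? reply : null;
    }
}
```

Inside SreSpeechRecognized you could then call CommandReplies.ReplyFor(e.Result.Text, e.Result.Confidence), show the reply if it is not null, and call RejectSpeech otherwise. The phrases still have to be added to the Choices object as well, for example by looping over CommandReplies.Phrases, or the recognizer will never hear them.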

And that is that. You should now have speech recognition working within your Kinect program. Check back for the next blog post, where I will be expanding upon this by making a speech-based application for controlling your media player!


Contact info:
mickpal_@hotmail.com
michaelpalmer.mp@gmail.com
www.michaelpalmerwebdesign.com

Or, leave a comment on my YouTube channel


Wednesday, 30 May 2012

Kinect Speech-recognition tutorial and code

Just a quick note to say I am working on a brief tutorial outlining all the steps I took to get the speech-recognition working for the maths game in Kin-educate. When researching for this I noted just how little information there actually is out there, and after a day of tackling the speech demo included with the SDK I managed to get it working. I thought it best to save others the bother I went through and post it up here.

So check back in the next week or so for a full lowdown on how I did it, and for all the code. I will make it generic too, so you can easily copy & paste it into your program and you should be all set.

I have also had a load of requests asking about how I did the picking up / dragging and dropping for the spelling game part of Kin-educate. This is actually a little more complicated than the speech recognition. I am working on another tutorial for that and aim to post it in the next week or so.

If there are any other questions about how I built Kin-educate, or suggestions for future developments, keep them coming in.

Check back soon for tutorials and code.



Tuesday, 22 May 2012

Kin-Educate Source Download

A lot of people have been asking me when or where they might be able to download the Kin-educate source.

At the moment it is still undergoing testing just to make sure all the little creases are ironed out. For example, the speech recognition still needs tweaking a little bit - it struggles in a room with quite a bit of background noise. Hopefully it will be ready to go in the next two weeks or so.

Keep an eye on my blog and YouTube channel for updates.

I am currently working on a new game to add into the Kin-educate mix. I am not quite ready to release details yet, but it fits nicely with the primary educational trend of the spelling and maths games, and also makes good use of the Kinect features.

I have had quite a few suggestions for different types of games or little tweaks and changes from people too which are really great and in time will be added into Kin-Educate. So if you think you have a good idea feel free to share it, or if you want to collaborate or help out then just let me know!



Wednesday, 9 May 2012

Kin-Educate: An educational game for the Kinect

Kin-educate is part of a research project exploring innovative ways to interact with computers, and what better tool to use than the Kinect?

But, what kind of Kinect application to interact with? Well, since its release the Kinect very quickly became synonymous with education. Developers could see the potential as could educators. So this seemed like the perfect place to start experimenting with the Kinect – by developing a fun, innovative educational game. The result is Kin-educate.


Kin-educate features two mini games. One, a spelling game, randomly chooses a word, shuffles the letters and then gives the player 30 seconds to grab, drag, and drop the letters, in order, to complete the word.
The maths game focuses on mental arithmetic. Randomized sums are generated and the player has to shout out the answer. A high score gets you your photo taken using the kinectColorViewer to capture an image of the player and display it.
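
The letter shuffling in the spelling game can be done in several ways. As a sketch under my own assumptions (the class name LetterShuffler is invented here, and a Fisher-Yates shuffle is swapped in as a standard technique, not necessarily what Kin-educate uses), it might look like this:

```csharp
using System;

// Hypothetical helper for illustration; not taken from the Kin-educate source.
public static class LetterShuffler
{
    // Returns the letters of the word in a random order (Fisher-Yates shuffle).
    public static char[] Shuffle(string word, Random rng)
    {
        char[] letters = word.ToCharArray();
        for (int i = letters.Length - 1; i > 0; i--)
        {
            // Pick a position from 0 to i inclusive and swap it with position i.
            int j = rng.Next(i + 1);
            char tmp = letters[i];
            letters[i] = letters[j];
            letters[j] = tmp;
        }
        return letters;
    }
}
```

A shuffle can occasionally return the letters in their original order, so a real game would probably want to reshuffle when that happens.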


The game is being tested at the moment and once any creases are ironed out I will make the project and source files available to all. In the meantime you can watch some videos...



  • Instead of using the Coding4Fun hoverButton control on the buttons and having the timer ring appear on them, I wanted the timer ring to appear on the hand. I did this by using the hoverButton function on the hand symbol to trigger the events on the selected button.
  • So when you hover over a letter in the spelling game (which is a button), the hoverButton triggers the letter button's methods. In this case it uses the Kinect colorImagePoint to map the letter to the active hand's location, and then I quickly switch the hand image to a closed fist to make it look like it is holding something.



  • At any one time only one hand is actually active. This is controlled by gestures, and the rule is simple: if the right hand is higher than the left, apply the hoverButton control to the right hand. This helped get around any left- or right-handed problems, and also makes it much easier to play the spelling game.
  • Speech recognition is embedded within the entire application. It is all part of making it an engaging and interactive way of learning. So why put your arms at weird angles or press a button on a keyboard to stop a game when you could simply say “Stop”? When you’re ready just say “Continue” or if you have had enough just say “Quit”.
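
The active-hand rule from the list above can be sketched in a few lines. This is an illustration only: the names ActiveHand and HandSelector are invented, the Y values are assumed to grow upward (as they do in Kinect skeleton space), and keeping the previous hand on a tie is my own assumption.

```csharp
using System;

// Hypothetical types for illustration; not taken from the Kin-educate source.
public enum ActiveHand { Left, Right }

public static class HandSelector
{
    // Picks the hand that is held higher. Assumes a larger Y means higher up.
    // If both hands are at exactly the same height, the previous choice is kept.
    public static ActiveHand Choose(float leftHandY, float rightHandY, ActiveHand previous)
    {
        if (rightHandY > leftHandY) return ActiveHand.Right;
        if (leftHandY > rightHandY) return ActiveHand.Left;
        return previous;
    }
}
```

In a real skeleton-frame handler you would pass in the Y positions of the two hand joints each frame. You might also want a small margin before switching, so the active hand does not flicker when both hands are at a similar height.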


The speech recognition for the maths game was one of the most challenging aspects of developing Kin-Educate. Accounting for accents, poor enunciation, background noise, and then the number of variables involved in randomized sums (the number of potential answers) was pretty tricky. I managed to get around this by limiting the potential answers of the randomized sums to between 0 and 10, and then, once the word is recognized, comparing it to the actual answer with a few checks. It still has some trouble with accents, but within a couple of tries most people get the hang of it.
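
To give a feel for the idea, here is a minimal sketch of limiting the answers to 0 to 10 and checking a recognized word against the real answer. It is not the actual Kin-educate code: the class MathsGameSketch and its methods are invented, and it assumes addition-only sums and English number words.

```csharp
using System;

// Hypothetical sketch for illustration; not taken from the Kin-educate source.
public static class MathsGameSketch
{
    // The only words the recognizer would need in its grammar; the index is the number.
    private static readonly string[] NumberWords =
    {
        "zero", "one", "two", "three", "four", "five",
        "six", "seven", "eight", "nine", "ten"
    };

    // Builds an addition a + b whose answer is between 0 and 10 inclusive.
    public static void NextSum(Random rng, out int a, out int b)
    {
        int answer = rng.Next(0, 11);   // the upper bound is exclusive
        a = rng.Next(0, answer + 1);
        b = answer - a;
    }

    // Converts a recognized word to its number, or -1 if it is not a known word.
    public static int WordToNumber(string word)
    {
        if (word == null) return -1;
        return Array.FindIndex(NumberWords,
            w => string.Equals(w, word.Trim(), StringComparison.OrdinalIgnoreCase));
    }

    // True when the recognized word matches the answer to a + b.
    public static bool IsCorrect(string recognizedWord, int a, int b)
    {
        return WordToNumber(recognizedWord) == a + b;
    }
}
```

Keeping the vocabulary down to eleven words means the recognizer has far fewer phrases to confuse with each other, which is the point of limiting the answers in the first place.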

Kin-educate is still under development and will be an ever-evolving project. Different games can be added over time utilising even more of the great Kinect capabilities and any suggestions are welcome. The next development is already in design. Any feedback is always appreciated, and keep an eye out for when the project will be available to download - soon!
