ChaLearn Multimodal Gesture Recognition

In 2013, ChaLearn organized a challenge and workshop on multimodal gesture recognition from 2D and 3D video data captured with Kinect, in conjunction with ICMI 2013, December 9-13, Sydney, Australia. Kinect is revolutionizing the field of gesture recognition thanks to the set of input modalities it provides, including RGB images, depth images (from an infrared sensor), and audio. Gesture recognition is important in many multimodal interaction and computer vision applications, including image/video indexing, video surveillance, computer interfaces, and gaming, and it also provides excellent benchmarks for algorithms. The recognition of continuous, natural signing is very challenging due to the multimodal nature of the visual cues (e.g., movements of the fingers and lips, facial expressions, body pose), as well as technical limitations such as limited spatial and temporal resolution and unreliable depth cues.

Reference

The data generated for the challenge is made available for research purposes. If you use this data in your work, please cite the following reference. A detailed description of the data and results is available here.

S. Escalera, J. Gonzàlez, X. Baró, M. Reyes, O. Lopes, I. Guyon, V. Athitsos, H.J. Escalante, "Multi-modal Gesture Recognition Challenge 2013: Dataset and Results", ICMI 2013.

Matlab Code

DataViewer

Basic Matlab GUI to visualize the data (RGB, depth, and audio) and export it for further use.

Download [25/05/2013]

Utils

Basic Matlab scripts for various purposes.

Download [26/05/2013]

Training Data

Training data (RGB+Depth+Audio) and labels for 393 sessions, corresponding to 7,754 Italian gestures.

Data sources example

To simplify the download, the data is split into 4 files. Download all of them and extract them into the same directory.

Choose a mirror:

  • UOC
  • CVC
  • INAOE

UOC

Universitat Oberta de Catalunya

Barcelona, Spain

CVC

Computer Vision Center

Bellaterra, Barcelona, Spain

INAOE

Coordinación de Ciencias Computacionales, INAOE

Puebla, México

Training labels in the predictions output format; they can be used to compute the error of your predictions against the ground truth.

Training groundtruth file
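Because the ground-truth file shares the predictions output format, you can score your own predictions locally before submitting. The challenge evaluation is based on the edit (Levenshtein) distance between predicted and ground-truth gesture sequences; the following is a minimal Python sketch of that metric, not the official Matlab scoring code, and the example label sequences are made up.

```python
def levenshtein(pred, truth):
    """Edit distance between two gesture-label sequences."""
    m, n = len(pred), len(truth)
    # prev[j] holds the distance between pred[:i-1] and truth[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == truth[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n]

# Hypothetical gesture-label sequences for one session.
truth = [3, 7, 12, 5]
pred = [3, 12, 5, 9]
# Normalized error: edit distance divided by ground-truth length.
score = levenshtein(pred, truth) / len(truth)  # 2 edits / 4 gestures = 0.5
```

Here `pred` misses gesture 7 and inserts a spurious gesture 9, costing one insertion and one deletion, so the normalized error is 0.5.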

Validation Data

Validation data (RGB+Depth+Audio) has the same format as the training data, but labels are not provided. There are 287 sessions, corresponding to 3,362 Italian gestures.

To simplify the download, the data is split into 3 files. Download all of them and extract them into the same directory.

Choose a mirror:

  • UOC
  • CVC
  • INAOE

To avoid downloading the validation data again, we provide a file with all the labels and a Matlab script to update your validation files. Download both files and unzip the labels file into a local folder. Then run the script, passing as parameters the path to the directory containing your unlabeled sample files and the path to the unzipped labeled files.

Some help information is included in the script.

Matlab Script
Files with validation labels

Test Data

Test data have exactly the same structure as the validation set. There are 276 files containing a total of 2,742 Italian gestures.

The password can be downloaded here.

Choose a mirror:

  • UOC
  • CVC
  • INAOE


Labels can be downloaded here.