2014: Chalearn LAP Human Pose Recovery

ChaLearn organizes three parallel challenge tracks in 2014: Human Pose Recovery on RGB data, action/interaction spotting on RGB data, and gesture spotting on RGB-Depth data.
The focus of this track is on multi-limb, user-independent pose recovery, which means learning to recognize limbs from several instances of each limb class belonging to different actors. For this particular track, more than 8,000 images were labelled at pixel precision with 14 limbs (more than 120,000 human limbs were manually labelled). Subjects appear portraying different poses and interacting with secondary actors in the same scene. In all cases, every actor taking part in the scene is manually labelled with the 14 different limbs, whenever they are visible.

References

The data generated for the challenge is made available for research purposes. In case you use this data on your work, please add the following reference.
 Escalera, Sergio; Baró, Xavier; Gonzàlez, Jordi; Bautista, Miguel Ángel; Madadi, Meysam; Reyes, Miguel; Ponce-López, Víctor; Escalante, Hugo Jair; Shotton, Jamie; Guyon, Isabelle: "ChaLearn Looking at People Challenge 2014: Dataset and Results". Proceedings of the European Conference on Computer Vision (ECCV 2014 Workshops), Part I, pp. 459–473, Springer International Publishing, Zurich, Switzerland, 2015, ISBN: 978-3-319-16178-5.

Organizers

Sergio Escalera: Dept. Applied Mathematics, Universitat de Barcelona & Computer Vision Center. (sergio(at)maia.ub.es)
Jordi Gonzàlez: Dept. Computer Science & Computer Vision Center (UAB), Barcelona. (poal(at)cvc.uab.es)
Xavier Baró: EIMT at the Open University of Catalonia & Computer Vision Center. (xbaro(at)uoc.edu)
Miguel A. Bautista: Dept. Applied Mathematics, Universitat de Barcelona & Computer Vision Center. (miguelangelbautistamartin(at)gmail.com )
Miguel Reyes: Dept. Applied Mathematics, Universitat de Barcelona & Computer Vision Center. (mreyes(at)gmail.com)
Víctor Ponce-López: EIMT at the Open University of Catalonia & Computer Vision Center. (vponcel(at)uoc.edu)
Hugo J. Escalante: INAOE, Puebla, Mexico. (hugojair(at)inaoep.mx)
Jamie Shotton: Microsoft Research, Cambridge, UK. (jamiesho(at)microsoft.com)
Isabelle Guyon: ChaLearn, Berkeley, California. (guyon(at)chalearn.org)

Winners

Position  Team Name  Score     Code
1         ZJU        0.194144  N/A
2         Seawolf    0.182097  N/A

ICMI2013: Winners with the organizers

These are the research works that use this dataset, together with their published scores. If you want to appear in this list, please send an email to xbaro(at)uoc.edu with a reference to your published work, the obtained score and, if your code is available, a link to it.


Lev. Distance  Reference                    Code
0.11802        Pavlakos et al. ICIP2014     N/A
0.12756        Wu et al. ICMI2013           N/A
0.17105        Bayer & Silbermann ICMI2013  N/A

Data description

Problem setting

The dataset is composed of 9 RGB sequences, containing more than 8,000 frames and more than 120,000 manually labelled limbs in total. For each frame we provide the RGB image and 14 binary masks, one per limb. In each binary mask, 1-valued pixels indicate the region in which the limb is contained.

Limbs annotated in the dataset

Data structure

The data is organized as a set of sequences, each uniquely identified by a string SeqXX, where XX is a two-digit number. Each sequence is provided as a single ZIP file named with its identifier (e.g., SeqXX.zip).
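The identifiers above are simple to generate programmatically; a minimal sketch (the count of 9 sequences comes from the dataset description above):

```python
# Build the expected ZIP file names for the 9 sequences: Seq01.zip .. Seq09.zip.
zip_names = ["Seq%02d.zip" % i for i in range(1, 10)]

print(zip_names[0])   # Seq01.zip
print(zip_names[-1])  # Seq09.zip
```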

Each sample ZIP file contains the following files:

• /imagesjpg: Set of RGB images composing the sequence. Each file name encodes the sequence and frame number of the image (XX_YYYY.jpg denotes frame YYYY of sequence XX).
• /maskspng: For each RGB image in the /imagesjpg folder there are 14 binary masks, each denoting the region in which a certain limb is positioned. Each binary mask file name follows the pattern XX_YYYY_W_Z.png, where XX denotes the sequence, YYYY the frame, W the actor in the sequence (1 if at the left part of the image, 2 if at the right part) and Z the limb number (following the ordering defined in the figure above).
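As a sketch of how these file names decompose, the following helper parses a mask file name back into its components (the helper name is ours for illustration, not part of the challenge scripts):

```python
def parse_mask_name(filename):
    """Split a mask file name XX_YYYY_W_Z.png into its components:
    sequence number, frame number, actor id and limb id."""
    stem = filename.rsplit(".", 1)[0]        # drop the .png extension
    seq, frame, actor, limb = stem.split("_")
    return int(seq), int(frame), int(actor), int(limb)

# Example: limb 3 of actor 2 in frame 120 of sequence 4.
print(parse_mask_name("04_0120_2_3.png"))  # (4, 120, 2, 3)
```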

Evaluation

The Jaccard index (overlap) will be used. For each of the n ≤ 14 limbs labelled for each subject in each frame i, the Jaccard index is defined as follows:
$J_{i,n}=\frac{A_{i,n}\cap B_{i,n}}{A_{i,n}\cup B_{i,n}}$
where $A_{i,n}$ is the ground truth of limb n, and $B_{i,n}$ is the prediction for the same limb at image i. For this challenge, both $A_{i,n}$ and $B_{i,n}$ are binary images in which '1' pixels denote the region of the n-th limb. In particular, since $A_{i,n}$ (the ground truth) is a binary image whose 1-valued pixels indicate the region of the n-th limb, this positive region does not necessarily need to be square; in all cases, however, it is a polygon defined by four points. Thus, the numerator is the number of '1' pixels common to both images $A_{i,n}$ and $B_{i,n}$, and the denominator is the number of '1' pixels in their union, after applying the logical OR operator.
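On binary masks, the Jaccard index above is straightforward to compute with NumPy; a minimal sketch:

```python
import numpy as np

def jaccard(gt, pred):
    """Jaccard index (overlap) between two binary masks.

    gt and pred are 2-D arrays where 1-valued pixels mark the limb region.
    """
    gt = gt.astype(bool)
    pred = pred.astype(bool)
    union = np.logical_or(gt, pred).sum()    # '1' pixels after the logical OR
    if union == 0:
        return 0.0
    inter = np.logical_and(gt, pred).sum()   # '1' pixels present in both masks
    return inter / float(union)

# Two 4x4 masks of 4 pixels each, overlapping in 2 pixels: J = 2/6.
a = np.zeros((4, 4)); a[0:2, 0:2] = 1
b = np.zeros((4, 4)); b[1:3, 0:2] = 1
print(jaccard(a, b))  # 2/6, roughly 0.333
```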

In the case of false positives (e.g., predicting a limb that is not in the ground truth because it is occluded), the prediction does not affect the mean hit rate calculation. In other words, n is computed as the intersection of the limb categories in the ground truth and the predictions.

Participant methods will be evaluated on the hit rate (HR) of limb detections. That is, for each limb n in each image i, a hit is counted if $J_{i,n}\ge0.5$. The mean hit rate over all limbs in all images is then computed (all limb detections having the same weight), and the participant with the highest mean hit rate wins.
$HR_{i,n}= \begin{cases} 1 & \text{if } J_{i,n}=\frac{A_{i,n}\cap B_{i,n}}{A_{i,n}\cup B_{i,n}}\ge0.5\\ 0 & \text{otherwise} \end{cases}$
In some images a limb may not be labelled in the ground truth because of occlusions. In such cases, where n<14, participants must not provide a prediction for that particular limb. An example of the mean hit rate calculation for n=3 limbs and i=1 image is shown in the next figure.
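Putting the two rules above together (the 0.5 threshold, and n taken as the intersection of the limb categories present in ground truth and predictions), the mean hit rate can be sketched as follows:

```python
import numpy as np

def mean_hit_rate(gt_masks, pred_masks, threshold=0.5):
    """Mean hit rate for one image.

    gt_masks and pred_masks map limb ids to binary masks. Only limbs present
    in both dictionaries are scored, so spurious predictions for occluded
    limbs do not enter the calculation.
    """
    hits = []
    for limb in set(gt_masks) & set(pred_masks):
        gt = gt_masks[limb].astype(bool)
        pred = pred_masks[limb].astype(bool)
        union = np.logical_or(gt, pred).sum()
        j = np.logical_and(gt, pred).sum() / float(union) if union else 0.0
        hits.append(1.0 if j >= threshold else 0.0)
    return sum(hits) / len(hits) if hits else 0.0

# Head predicted perfectly, torso predicted with no overlap, limb 5 spurious.
head = np.zeros((4, 4)); head[:2, :2] = 1
torso_gt = np.zeros((4, 4)); torso_gt[2:, 2:] = 1
torso_bad = np.zeros((4, 4)); torso_bad[:1, 2:] = 1
gt = {1: head, 2: torso_gt}
pred = {1: head, 2: torso_bad, 5: head}  # limb 5 is a false positive: ignored
print(mean_hit_rate(gt, pred))  # one hit out of two scored limbs -> 0.5
```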

Mean hit rate and Jaccard Index calculation for a sample with n=3 limbs and i=1 image. In the top part of the image the Jaccard Index for the head limb is computed; as it is greater than 0.5, it counts as a hit for image i and the head limb. Similarly, for the torso limb the Jaccard Index obtained is 0.72 (center part of the image), which also counts as a hit for the torso limb. In addition, the bottom of the image shows the Jaccard Index obtained for the left thigh limb, which does not count as a hit since 0.04 < 0.5.

Code

Source Code

The organizers encourage the use of Python. We provide scripts that facilitate access to the data and evaluation. The scripts can be downloaded from here.

Requirements

• OpenCV 2.4.8
• Python Imaging Library (PIL) 1.1.7
• NumPy 1.8.0

Data Access

In the file ChalearnLAPSample.py there is a class PoseSample that provides access to all the information of a sample. To open a sample file, call the constructor with the ZIP file you want to use:

>> from ChalearnLAPSample import PoseSample

>> poseSample = PoseSample("SeqXX.zip")

With the resulting object you can access the sample's general information, for instance the number of frames:

>> numFrames=poseSample.getNumFrames()

Additionally, we can access information from any frame. For instance, to access the RGB data of the 10th frame we use:

>> rgb=poseSample.getRGB(10)

To visualize the information of a frame, you can use the following code:

import cv2
from ChalearnLAPSample import PoseSample

poseSample = PoseSample("SeqXX.zip")
actorid = 1
limbid = 2

cv2.namedWindow("SeqXX", cv2.WINDOW_NORMAL)
cv2.namedWindow("Torso", cv2.WINDOW_NORMAL)
for x in range(1, poseSample.getNumFrames()):
    img = poseSample.getRGB(x)
    torso = poseSample.getLimb(x, actorid, limbid)
    cv2.imshow("SeqXX", img)
    cv2.imshow("Torso", torso)
    cv2.waitKey(1)
del poseSample
cv2.destroyAllWindows()

Evaluation

In the file ChalearnLAPEvaluation.py there are some methods for evaluation. The first important script exports the labels of a set of frames into a ground truth folder, to be used later to obtain the final overlap value. Let's assume you use sequences 1 to 3 for validation purposes, and have a folder valSamples with the files Seq01.zip to Seq03.zip as downloaded from the training data set. We can create a ground truth folder gtData using:

>> from ChalearnLAPEvaluation import exportGT_Pose

>> exportGT_Pose(valSamples,gtData)

This method exports the label files and data files for each sample in the valSamples folder to the gtData folder. This new ground truth folder will be used by the evaluation methods.

For each RGB image, we need to store the binary mask predictions as JPG files in the same format as the ground truth is provided, that is, one binary JPG file for each limb category, for each RGB image and for each actor. This file must be named XX_YYYY_W_Z_prediction.jpg, where XX denotes the sequence, YYYY the frame, W the actor in the sequence (1 if at the left part of the image, 2 if at the right part) and Z the limb number. To make this easy, the class PoseSample allows this information to be stored for a given sample. Following the example from the last section, we can store the predictions for a sample using:

>> from ChalearnLAPSample import PoseSample

>> poseSample = PoseSample("SeqXX.zip")

Now, suppose our prediction is that we have not detected the head (limbid = 1) for the first actor in the scene in a given frame of the sequence, and we want to store the predictions in a certain folder valPredict. We can represent that prediction as an all-zero mask:

>> import numpy

>> im1=numpy.zeros((360,480))
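Alternatively, if you prefer to write the prediction files yourself rather than going through PoseSample, here is a sketch of saving one mask under the required name (the helper and the folder argument are illustrative, not part of the challenge scripts):

```python
import os
import numpy as np
from PIL import Image

def save_prediction(folder, seq, frame, actor, limb, mask):
    """Save a binary mask as XX_YYYY_W_Z_prediction.jpg in the given folder."""
    name = "%02d_%04d_%d_%d_prediction.jpg" % (seq, frame, actor, limb)
    # Scale the 0/1 mask to 0/255 so it is stored as a visible grayscale image.
    img = Image.fromarray(mask.astype(np.uint8) * 255)
    img.save(os.path.join(folder, name))
    return name

# Torso (limb 2) of actor 1 in frame 10 of sequence 1, saved to the current folder.
print(save_prediction(".", 1, 10, 1, 2, np.zeros((360, 480))))  # 01_0010_1_2_prediction.jpg
```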

Assuming the previously defined paths and objects, to evaluate the overlap for a single labelled sample prediction (that is, a prediction for a sample from a set where labels are provided), we can use:

>> overlap=poseSample.evaluate(valPredict)

Finally, to obtain the final score for all the predictions, in the same way as computed in the CodaLab platform, we use:

>> from ChalearnLAPEvaluation import evalPose

>> score=evalPose(valPredict,gtData)

Once the form has been submitted you will be able to see the information needed to access the data; if not, just refresh the page.