
Youngwoo Yoon*, Pieter Wolfert*, Taras Kucherenko*, Carla Viegas, Teodor Nikolov, Mihail Tsakov, Gustav Eje Henter

[Full challenge paper (ACM TOG)] [Initial publication (ICMI’22)]


Summary

This webpage contains data, code, and results from the second GENEA Challenge, intended as a benchmark of data-driven automatic co-speech gesture generation. In the challenge, participating teams used a common speech and motion dataset to build gesture-generation systems. Motion generated by all these systems was then rendered to video using a standardised visualisation and evaluated in several large, crowdsourced user studies. This year’s dataset was based on 18 hours of full-body motion capture, including fingers, of different persons engaging in dyadic conversation, taken from the Talking With Hands 16.2M dataset. Ten teams participated in the evaluation across two tiers: full-body and upper-body gesticulation. For each tier we evaluated both the human-likeness of the gesture motion and its appropriateness for the specific speech.
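As a rough illustration of the paired data that participating systems consume, the sketch below reads the durations of a BVH motion-capture file and its corresponding speech audio using only the Python standard library. The file names are hypothetical placeholders, and the snippet is not part of the official challenge tooling.

import wave

def bvh_duration(path):
    """Return the duration (seconds) of a BVH file from its MOTION header."""
    frames, frame_time = None, None
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if line.startswith("Frames:"):
                frames = int(line.split()[1])
            elif line.startswith("Frame Time:"):
                frame_time = float(line.split()[2])
            if frames is not None and frame_time is not None:
                return frames * frame_time
    raise ValueError(f"No MOTION header found in {path}")

def wav_duration(path):
    """Return the duration (seconds) of a WAV file."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

# Hypothetical file names; the released dataset uses its own naming scheme.
print(f"motion: {bvh_duration('example_take.bvh'):.1f} s")
print(f"speech: {wav_duration('example_take.wav'):.1f} s")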

The evaluation results are a revolution, and a revelation: Some synthetic conditions are rated as significantly more human-like than human motion capture. At the same time, all synthetic motion is found to be vastly less appropriate for the speech than the original motion-capture recordings.

Please see our paper for more information, watch the challenge introduction video below, and follow the links below to the challenge data, code, and results.

Open-source materials

Citation

If you use materials from this challenge, please cite our paper about the challenge:

@article{kucherenko2024evaluating,
  author = {Kucherenko, Taras and Wolfert, Pieter and Yoon, Youngwoo and Viegas, Carla and Nikolov, Teodor and Tsakov, Mihail and Henter, Gustav Eje},
  title = {Evaluating Gesture Generation in a Large-scale Open Challenge: The GENEA Challenge 2022},
  year = {2024},
  issue_date = {June 2024},
  journal = {ACM Transactions on Graphics},
  publisher = {Association for Computing Machinery},
  volume = {43},
  number = {3},
  issn = {0730-0301},
  url = {https://doi.org/10.1145/3656374},
  doi = {10.1145/3656374},
  month = {jun},
  articleno = {32},
}

Also consider citing the original paper about the motion data from Meta Research:

@inproceedings{lee2019talking,
  title={{T}alking {W}ith {H}ands 16.2{M}: {A} large-scale dataset of synchronized body-finger motion and audio for conversational motion analysis and synthesis},
  author={Lee, Gilwoo and Deng, Zhiwei and Ma, Shugao and Shiratori, Takaaki and Srinivasa, Siddhartha S. and Sheikh, Yaser},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={763--772},
  doi={10.1109/ICCV.2019.00085},
  series={ICCV '19},
  publisher={IEEE},
  year={2019}
}

Contact
