Youngwoo Yoon*, Pieter Wolfert*, Taras Kucherenko*, Carla Viegas, Teodor Nikolov, Mihail Tsakov, Gustav Eje Henter

[Full challenge paper (ACM TOG)] [Initial publication (ICMI’22)]


This webpage contains data, code, and results from the second GENEA Challenge, intended as a benchmark of data-driven automatic co-speech gesture generation. In the challenge, participating teams used a common speech and motion dataset to build gesture-generation systems. Motion generated by all these systems was then rendered to video using a standardised visualisation and evaluated in several large, crowdsourced user studies. This year’s dataset was based on 18 hours of full-body motion capture, including fingers, of different persons engaging in dyadic conversation, taken from the Talking With Hands 16.2M dataset. Ten teams participated in the evaluation across two tiers: full-body and upper-body gesticulation. For each tier we evaluated both the human-likeness of the gesture motion and its appropriateness for the specific speech.

The evaluation results are a revolution, and a revelation: Some synthetic conditions are rated as significantly more human-like than human motion capture. At the same time, all synthetic motion is found to be vastly less appropriate for the speech than the original motion-capture recordings.

For more information, please see our paper, the challenge introduction video below, and the links below to the challenge data, code, and results.

Open-source materials


If you use materials from this challenge, please cite our latest paper about the challenge. Currently, that is our paper at ICMI 2022:

  author={Yoon, Youngwoo and Wolfert, Pieter and Kucherenko, Taras and Viegas, Carla and Nikolov, Teodor and Tsakov, Mihail and Henter, Gustav Eje},
  title={{T}he {GENEA} {C}hallenge 2022: {A} large evaluation of data-driven co-speech gesture generation},
  booktitle={Proceedings of the ACM International Conference on Multimodal Interaction},
  series={ICMI '22},

Also consider citing the original paper about the motion data from Meta Research:

  title={{T}alking {W}ith {H}ands 16.2{M}: {A} large-scale dataset of synchronized body-finger motion and audio for conversational motion analysis and synthesis},
  author={Lee, Gilwoo and Deng, Zhiwei and Ma, Shugao and Shiratori, Takaaki and Srinivasa, Siddhartha S. and Sheikh, Yaser},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  series={ICCV '19},