A collection of large-scale, high-quality datasets of URL links to up to 650,000 video clips covering 400, 600, or 700 human action classes, depending on the dataset version. The videos include human-object interactions, such as playing instruments, as well as human-human interactions, such as shaking hands and hugging. Depending on the version, each action class has at least 400, 600, or 700 video clips. Each clip is human-annotated with a single action class and lasts around 10 seconds.

Kinetics 700-2020
Countix
AVA Kinetics
Kinetics 700
Kinetics 600
Kinetics 400
TAP-Vid
Compressed Kinetics600