Recognizing kitchen tools and ingredients from shifting, shaky angles.
Understanding the logical sequence of steps required to complete a complex task.

Usage in AI Benchmarking
This video (g4_01136.mp4) is often cited in papers on models, including Transformers, designed for video understanding. It serves as a "real-world" challenge because of motion blur, hand occlusions, and the visual complexity of a cluttered kitchen.