
November 04, 2003

acm multimedia

Today I was at the 2003 ACM Multimedia conference presenting a demo of the Active Capture work I've been doing for the last year with the Garage Cinema Research group at Berkeley. Active Capture is a new approach to media acquisition that combines computer vision, audition, and human-computer interaction to enable automated film direction for automatically generated movies. The conference was held here in Berkeley, so we set up an entire mini-studio in a conference room of the Berkeley Marina Marriott, including a green screen, camera, computer, etc. We had a lot of great guests participate, including some prestigious folks known for inventing some useful things. Best of all, the system more or less behaved itself, successfully capturing and rendering the vast majority of participants and automatically turning them into the stars of an MCI commercial, a Godzilla movie scene, and a trailer for Terminator 2.

For more, check out our (admittedly cheesy) video.

For those interested, here's how it works at a high level... A participant stands in front of a green screen, where a computational unit consisting of a camera, microphone, display, and speakers directs the user through simple actions such as screaming into the camera or running in place. The system uses computer vision and audition to evaluate the performance and suggest corrections as necessary. The recorded result is then analyzed and placed into pre-existing templates to generate automatic movies. Pretty cool... and as this research progresses it should get even cooler and more useful. Applications of computer-led or computer-assisted direction include not only entertainment and media capture, but more pressing issues such as emergency evacuation services.
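To make the direct-evaluate-correct loop concrete, here's a minimal sketch in Python. All the names here are hypothetical, my own simplification rather than the project's actual code: the system prompts for an action, evaluates the take, and re-prompts with a correction until the take passes or it gives up.

```python
# Hypothetical sketch of an Active Capture-style direction loop.
# prompt() records one take; evaluate() returns None for a good take
# or a problem label; corrections maps problems to adjusted prompts.

def direct_action(prompt, evaluate, corrections, max_takes=3):
    """Run up to max_takes attempts; return (succeeded, takes_used)."""
    for take in range(1, max_takes + 1):
        recording = prompt()              # e.g. "Scream into the camera!"
        problem = evaluate(recording)     # None means the take passed
        if problem is None:
            return True, take
        # Switch to corrective direction, e.g. "Louder this time!"
        prompt = corrections.get(problem, prompt)
    return False, max_takes

# Toy example: a performer who is too quiet on the first take.
takes = iter([{"volume": 0.2}, {"volume": 0.9}])

def capture():
    return next(takes)

def loud_enough(rec):
    return None if rec["volume"] > 0.5 else "too_quiet"

ok, n = direct_action(capture, loud_enough, {"too_quiet": capture})
print(ok, n)  # → True 2
```

In the real system the evaluation step is, of course, vision and audio analysis rather than a dictionary lookup, but the control flow is the interesting part: the correction strategy is itself part of the interaction design.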

My particular interests lie in the HCI issues surrounding the mediation of human action by a computational system: how to design the directorial experience, and what strategies to apply not only to guide human action, but to avoid and reduce error and misunderstanding. In the face of limited AI techniques (we use fairly simple ones such as eye detection, motion detection, timing, and audio volume), well-designed interaction is essential to providing the necessary context for the recognition systems as well as to creating an enjoyable, engaging experience for the user.
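As an illustration of just how simple these checks can be, here's my own toy version of an audio-volume test (not the project's actual code; the thresholds and function names are made up): classify a mono sample buffer as a usable "scream" take by RMS loudness and minimum duration.

```python
# Hypothetical audio-volume check in the spirit of the simple techniques
# mentioned above. Samples are floats in [-1, 1].
import math

def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_scream(samples, sample_rate, min_rms=0.3, min_seconds=0.5):
    """Loud enough and long enough to count as a scream?"""
    duration = len(samples) / sample_rate
    return duration >= min_seconds and rms(samples) >= min_rms

# One second of a loud square-ish wave vs. near-silence, at 8 kHz.
loud = [0.8 if i % 2 else -0.8 for i in range(8000)]
quiet = [0.01] * 8000
print(is_scream(loud, 8000), is_scream(quiet, 8000))  # → True False
```

A check this crude fails in all sorts of ways (a cough passes, a distant scream doesn't), which is exactly why the surrounding interaction design has to set up the context that makes such weak recognizers reliable enough to use.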

Posted by jheer at November 4, 2003 08:02 PM


    jheer@acm.org