
(Joint work with Jeffrey Mark Siskind and Andrei Barbu)

My research divides broadly into three categories: hardware, game learning, and grounding vision.
Hardware

Robotic platform

Our custom robotic platform is built around a novel two-surface wooden housing whose upper surface serves as a workspace. A 5-DOF manipulator arm is mounted on the upper housing surface, while a 1-DOF pendulum arm is mounted on the lower housing surface. A head containing two cameras with individually controllable pan and tilt is mounted on the pendulum arm, which allows the cameras to rotate 180 degrees around the workspace and image it from a variety of viewpoints.

Endogenous sensors for fine motor control

A hand with two independently controllable fingers is mounted on the manipulator arm. In addition to the exogenous head-mounted cameras, the hand carries a number of endogenous sensors: a palm-mounted camera, an ultrasonic range sensor, a computer-controllable laser pointer to assist depth estimation, and two independent force sensors on each finger, one on the inside surface and one on the fingertip. These endogenous sensors allow fine motor control through servoing.
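As a rough illustration of what servoing on a finger force sensor can look like, here is a minimal proportional control loop run against a simulated finger. Everything in it is an assumption made for illustration: the function and class names, the gain, the force units, and the spring model are hypothetical and are not taken from the actual platform.

```python
def servo_grip(read_force, step_finger, target_force=1.0, gain=0.5,
               tolerance=0.05, max_steps=200):
    """Move a finger until the measured force is close to target_force.

    read_force()       -> current force reading (hypothetical units)
    step_finger(delta) -> move the finger by delta (positive closes the grip)
    Returns the number of control steps taken, or None if it did not converge.
    """
    for step in range(max_steps):
        error = target_force - read_force()
        if abs(error) <= tolerance:
            return step
        step_finger(gain * error)  # proportional correction
    return None


class SimulatedFinger:
    """Toy stand-in for a finger pressing on a springy object (assumed model)."""

    def __init__(self, contact_at=2.0, stiffness=1.0):
        self.position = 0.0
        self.contact_at = contact_at
        self.stiffness = stiffness

    def read_force(self):
        # No force until contact, then force grows linearly with penetration.
        return max(0.0, self.position - self.contact_at) * self.stiffness

    def step(self, delta):
        self.position += delta
```

With the simulated finger, `servo_grip(finger.read_force, finger.step)` closes the finger in fixed-size steps until contact and then in shrinking steps as the force error falls. A real controller would also need to handle sensor noise and actuator limits.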


Game learning

Learning to play Tic Tac Toe

This system learns to play board games. Three independent and uncoupled agents, the protagonist, the antagonist, and the wannabe, timeshare the arm and cameras. The protagonist and antagonist play several games of Tic Tac Toe while the wannabe watches and learns the rules. The wannabe then plays against the protagonist using the learned rules.
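To make "the rules" concrete, the sketch below writes the rules of Tic Tac Toe as three small predicates: which moves are legal, who has won, and whether the game is over. It is a hand-written illustration of the kind of knowledge the wannabe has to acquire, not the representation or the learning procedure the system uses. The board encoding and function names are assumptions.

```python
# Cell indices: 0 1 2 / 3 4 5 / 6 7 8 (assumed encoding)
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
    (0, 4, 8), (2, 4, 6),              # diagonals
]


def legal_moves(board):
    """Indices of empty cells; board is a 9-element list of 'X', 'O', or None."""
    return [i for i, cell in enumerate(board) if cell is None]


def winner(board):
    """Return 'X' or 'O' if that player has three in a line, else None."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def is_over(board):
    """The game ends when someone has won or no empty cell remains."""
    return winner(board) is not None or not legal_moves(board)
```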

Learning to play Hexapawn

The unmodified system can learn other games, such as Hexapawn. Variants of Hexapawn that allow arbitrarily long games, with different capture rules and backward moves, can also be learned.
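For reference, the standard Hexapawn movement rules (a pawn advances one square onto an empty square, or captures one square diagonally forward) can be sketched as a move generator. This is an illustrative hand-coded version under an assumed board encoding, not the rule description the system learns. Variants such as backward or sideways moves would change the candidate squares considered in the loop.

```python
def hexapawn_moves(board, player):
    """Return legal moves as ((row, col), (row, col)) pairs.

    board: 3x3 list of lists holding 'W', 'B', or None (assumed encoding).
    'W' advances toward row 0, 'B' advances toward row 2.
    """
    step = -1 if player == 'W' else 1
    opponent = 'B' if player == 'W' else 'W'
    moves = []
    for r in range(3):
        for c in range(3):
            if board[r][c] != player:
                continue
            nr = r + step
            if not 0 <= nr < 3:
                continue
            if board[nr][c] is None:            # straight advance onto an empty square
                moves.append(((r, c), (nr, c)))
            for nc in (c - 1, c + 1):           # diagonal capture of an opposing pawn
                if 0 <= nc < 3 and board[nr][nc] == opponent:
                    moves.append(((r, c), (nr, nc)))
    return moves


START = [['B', 'B', 'B'], [None, None, None], ['W', 'W', 'W']]
```

From `START`, each side has exactly three straight advances available. Captures only become possible once a pawn has moved into the middle row.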


Learning to play Hexapawn variant D

The system learns to play an extension of Hexapawn augmented with sideways and backward (vertical) non-capturing moves.

Learning Tic Tac Toe from English rules

Starting with rules described in English, the antagonist and protagonist play several games of Tic Tac Toe while the wannabe watches. Each views the board from a different position. The wannabe learns a logical description of the game, and then plays a game against the protagonist.


Grounding vision

Disassembly of a Lincoln Log structure

After the arm is calibrated, the pose and structure of the Lincoln Log assembly are determined solely from visual input, and the assembly is then disassembled.

Disassembly of a Lincoln Log structure from multiple views and linguistic input

After the arm is calibrated, the system estimates the pose and structure of the Lincoln Log assembly. One view is insufficient, so a second view is used to resolve the ambiguity. The second view is then forgotten and a linguistic constraint is applied instead. Once the structure has been determined, it is disassembled.


Lincoln Log structure estimation from a single image

Once we have determined the pose of a Lincoln Log assembly (left) using techniques from Web Figure 10, we can correctly determine the types and positions of the logs (shown in green) that constitute the assembly (right).

Structure estimation from spatially distinct views

Due to occlusion, a single view (top) may provide insufficient information to support correct structure estimation (the false negative shown in orange).
Integrating information from a second view (bottom) of the same structure, prior to disassembly, can correct the error.
(Correctly determined absence of logs is shown in blue.)
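One simple way to picture the integration of two views is as a per-position merge in which a definite observation from either view fills in a position the other view could not see. The sketch below assumes each view reports "present", "absent", or "unknown" for every candidate log position. That three-valued encoding, the position labels, and the function name are assumptions made for illustration. The actual estimator may well combine evidence differently, for example probabilistically.

```python
PRESENT, ABSENT, UNKNOWN = "present", "absent", "unknown"


def merge_views(view_a, view_b):
    """Combine per-position log observations from two views.

    Each view maps a candidate log position to PRESENT, ABSENT, or UNKNOWN
    (UNKNOWN standing in for an occluded position). A definite observation
    overrides UNKNOWN; positions where the views disagree are reported.
    """
    merged, conflicts = {}, []
    for pos in sorted(set(view_a) | set(view_b)):
        a = view_a.get(pos, UNKNOWN)
        b = view_b.get(pos, UNKNOWN)
        if a == UNKNOWN:
            merged[pos] = b
        elif b == UNKNOWN or a == b:
            merged[pos] = a
        else:
            merged[pos] = UNKNOWN   # contradictory evidence, left unresolved
            conflicts.append(pos)
    return merged, conflicts
```

For example, if the first view cannot see position `(1, 0)` and the second view sees a log there, the merged model records that log as present.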

Web Figure 13: Structure estimation from temporally distinct views

Another way to recover occluded information is to begin the task of disassembly with partial information (top) and then reimage the structure from the same camera pose part-way through disassembly after the occlusion has been eliminated.
The information from two temporally distinct views of distinct assembly states can be integrated to yield a correct model of the initial structure (bottom).

Web Figure 14: Structure estimation given constraints

Occluded information can be recovered from a single image by constraining the space of possible structures (in this case, by specifying the piece inventory).
Our goal is for multiple agents to communicate such constraints linguistically and infer such constraints through high-level reasoning.
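The effect of an inventory constraint can be sketched as a filter over candidate structures: enumerate the ways the occluded positions could be filled, and keep only those whose piece counts match the stated inventory. The data layout, log type names, and function name below are hypothetical, and brute-force enumeration is used only to keep the example short.

```python
from collections import Counter
from itertools import product


def consistent_structures(observed, occluded, inventory):
    """Enumerate fillings of occluded positions that use exactly the inventory.

    observed:  {position: log_type} for positions seen clearly
    occluded:  {position: [candidate log types, or None for 'no log']}
    inventory: {log_type: count} of pieces known to be in the structure
    """
    target = +Counter(inventory)          # unary plus drops zero counts
    positions = sorted(occluded)
    results = []
    for choice in product(*(occluded[p] for p in positions)):
        structure = dict(observed)
        structure.update({p: t for p, t in zip(positions, choice) if t is not None})
        if Counter(structure.values()) == target:
            results.append(structure)
    return results
```

With two clearly visible logs, one occluded position that might hold a long log, a short log, or nothing, and an inventory of two long logs and one short log, only one filling survives the filter.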
