Alessandro Sambini & Gilberto Decaro

An unfinished conversation on robot vision

Gilberto Decaro, portrait by Alessandro Sambini & Giorgina, 2016

Gilberto Decaro is a robot programmer. I decided to ask him some questions about “vision” and the idea of “looking”, trying to figure out how these terms might apply to the world of robots. Instead of offering you a finished text, I am proposing a conversation under construction. I will certainly have other questions to ask him, and I am not going to settle the matter in these few lines.

A – “To look” (guardare, in Italian) derives from the Germanic word wardōn, which means “to look” while also carrying the sense of “being on the alert”. Seeing is an innate human ability, intrinsic to the optical apparatus: we open our eyes and we just see. Conversely, when we observe or scrutinise, we switch on a higher level of attention toward objects, people or scenarios (or fragments of them) that somehow seem to deserve our gaze. How do these characteristics apply to the world of robots?

G – These characteristics presuppose an advanced form of vision, one that is partly over-exploited. Our visual apparatus provides images through a powerful tool, the eyes, but the information these images generate far exceeds the data they contain, thanks both to our a priori knowledge, which enriches the data with infinite additional associations, and to the imprecise yet very effective calculations of our brain; balance, for example, depends heavily on sight. All of these features are present, at various levels, in the act of viewing. The robot has no such visual skills and processes the images it receives with the sole purpose of solving the problem for which it was created; consequently, it has a highly specialised vision that in some cases may be superior to human vision (e.g. detecting the slightest movement, recognising infinitesimal imperfections, or seeing in the dark), but it suffers from a deficit in perception.

A – Let’s think about the images that are embedded in our memory. How does a robot keep, record, omit, discard or forget the visual material it is currently looking at?

G – It depends heavily on the type of vision implemented. Beyond the recording capability of some surveillance robots – a capability which, although highly evolved, is comparable to that of a video recorder – some techniques, such as neural networks applied to vision, learn the relevant information directly from images. They are trained on large quantities of images to recognise certain things, such as objects or faces, and although they keep no memory of the actual figures, they learn the characteristics to look for in order to recognise them. It is not a real memory, but a logical structure that represents, in a sense, the experiences that have taken place.

A – Who decides this, the programmer or the robot?

G – It is the programmer who builds the perception capabilities of the robot and its behaviour based on the stimuli received, also through self-learning techniques that make the robot more independent from the programmer’s hand.

A – When we see something that suddenly changes its trajectory, we feel alarmed: a falling person, an accelerating object, an unexpected movement. Sudden changes in direction, accompanied by acceleration, can frighten us and force us to look away immediately: we are afraid of seeing discrepancies with what we are used to seeing. How can we associate the idea of fear with robots? When do they feel the need to look away?

G – Fear is a primordial, instinctive feeling, closely linked to the survival of a living being; I do not think there are robots that implement such a feeling, perhaps because of our view of robots as “machines for doing a job”: it’s not very useful for a car to be afraid. Paradoxically, in a state of fear, for instance when detecting a fire or an intruder, the robot would be pushed to observe the scene insistently instead of looking away. Incongruence in vision is the result of contextualising the scene. Present-day robots contextualise very little, as this is a particularly complex task, and therefore they rarely find inconsistencies in what they observe. A robot might detect an inconsistency in the image, or it might fail to detect anything abnormal even though something is, because of its inability to contextualise the scene; in both cases it would probably tend to keep observing the object, and its gaze would not be diverted.