This is a follow-up to yesterday’s post The cat and the musician.
(Image from Openclipart)
By pointing and saying “This is a cat”, or by having a computer vision system do this labeling for us, we are merely asserting a name for one thing. Even if we continue doing this for multiple objects in a scene, we are still missing the relations between them (and between their names); we don’t know their story. Is the cat on a mat? Is there another cat? Does it also want a spot on the mat? What are they going to do about it? These and other important questions are still left unanswered by the way we, the Schaubies, are using computer systems at this point in time.
So how do the relations come in? For now, we have been using the creativity of our own pre-trained minds to generate descriptions and stories from images (and new images from these descriptions, and so on). The results from the Google Vision API provided good inspiration. But the question came up whether it could be more interesting to utilise neural network systems for more than just generating writing prompts.
Originally we planned to use a text generation system to produce a narrative from the labels that were returned from the object recognition system. We might still try that at some point. However, by now I am thinking that this would be a mere replication or automation of the inspiration/writing-prompt approach. Even more than in our human-generated writings, I would expect such generated narrations to bear no more than a random resemblance to the relations that the recognised entities actually have in the original picture.
In preparing for this stay at Schaubude, we had some discussions about an intermediate step between recognition and narration, which we tentatively called “abstraction”, but also came to refer to as “association”. Somehow, we imagined, the computer system would take a step away from each recognition label to another, somehow related, word: a more general term, a synonym, an association of some kind.
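This association step could be sketched as follows. The lookup table here is purely illustrative (a real version might draw hypernyms or synonyms from a lexical resource like WordNet), and all names are hypothetical:

```python
import random

# Toy association table: each recognition label maps to candidate
# "steps away" — a more general term, a synonym, an emotive association.
# These entries are invented for illustration only.
associations = {
    "cat": ["animal", "feline", "independence"],
    "mat": ["rug", "floor covering", "domesticity"],
}

def associate(label, rng=random):
    """Return one associated term for a label, or the label itself if none is known."""
    return rng.choice(associations.get(label, [label]))
```

Calling `associate("cat")` might return “feline” on one run and “independence” on the next, which is exactly the deliberate drift away from the original label that we had in mind.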
At first glance, this would seem to just make the generated narration deviate further from the image put into the recognition system, and make their relation even more arbitrary. Yet I think we were on the brink of something crucial back then. For according to my reading of Eco’s A Theory of Semiotics1, associative relations between terms, disregarding any actual entities they may refer to, are what makes up a semiotic code: Every sign has a so-called2 interpretant which clarifies its meaning, “which guarantees the validity of the sign” (68) by referring to the same content. Interpretants can be equivalents (like a pictogram of a cat as interpretant for the word “cat”), indexes (pointing at “this cat”), definitions, “emotive association[s]” (think of cats as symbols for independence, elegance or laziness) or translations into other languages.3
Of course it doesn’t stop there. Every interpretant can again be subjected to further clarification by another interpretant, creating a chain or a “string of signs” (71). (This is what Eco, following Peirce, calls “unlimited semiosis” (69ff), beautifully heralding it as the self-foundation of a semiotics of code.) And every expression can be joined with not only one, but countless different interpretants, thus creating not only a chain, but a network of signs, which secure one another’s meaning purely by their relationality.
I don’t know what this would mean for automated text generation, and my guess is that my thinking is about 50 years behind the state of the art. (Note to self: Look at WordNet.) I do feel, however, inspired to explore performative ways to make explicit the semiotic processes which follow that first step of recognition and labeling. To play with and gamify different types of interpretants, maybe. But also to have another look at computer vision systems and find out how difficult it is to extract first relations between entities already at the step of recognising them in the image. (If “the cat is on the mat”, a system should, after all, be able to infer this relation from the location of the entities “cat” and “mat” in a picture.)
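The inference hinted at in that last parenthesis could look something like this. A minimal sketch, assuming a detector has already returned labeled bounding boxes as (x, y, width, height) with the image origin at the top left; the rule of thumb and all names are my own assumptions, not any particular vision API:

```python
def is_on(top_box, bottom_box, tolerance=10):
    """Guess whether top_box rests "on" bottom_box: the boxes overlap
    horizontally, and the top box's bottom edge sits near or within
    the bottom box (y grows downward in image coordinates)."""
    tx, ty, tw, th = top_box
    bx, by, bw, bh = bottom_box
    horizontal_overlap = tx < bx + bw and bx < tx + tw
    vertically_adjacent = abs((ty + th) - by) <= tolerance or by <= ty + th <= by + bh
    return horizontal_overlap and vertically_adjacent

# Hypothetical detections for a single image.
detections = {"cat": (120, 80, 100, 60), "mat": (100, 135, 200, 30)}
if is_on(detections["cat"], detections["mat"]):
    print("the cat is on the mat")  # → the cat is on the mat
```

This is obviously crude (it ignores depth, occlusion and perspective), but it shows that a first relation between two recognised entities is recoverable from box geometry alone, before any narration happens.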
Carlos, July 29, 2020
Umberto Eco, A Theory of Semiotics, 1977↩
The term “interpretant” originates from Peirce.↩
It is never a material object or a raw perception. Eco is not quite clear on the latter (164ff), I think, and I would like to follow up by refreshing my memory on the “myth of the given” discussed in Wilfrid Sellars’ Empiricism and the Philosophy of Mind, and also look into Rebecca Kukla’s Myth, Memory and Misrecognition in Sellars’ “Empiricism and the Philosophy of Mind”, which I haven’t read yet.↩