The linguistic description of a physical phenomenon is a summary of the available information in which certain relevant aspects are highlighted while other, irrelevant aspects remain hidden. This paper deals with the development of computational systems capable of generating linguistic descriptions from images captured by a video camera. The problem of linguistically labeling images in a database is a challenge where much work still remains to be done. In this paper, we contribute to this field by using a model of the observed phenomenon that allows us to interpret the content of images. We build the model by combining techniques from Computer Vision with ideas from Zadeh's Computational Theory of Perceptions. We include a practical application consisting of a computational system capable of providing a linguistic description of the behavior of traffic in a roundabout.