Observational studies of human infants show that they can successfully acquire a lexicon, and that the mapping between a meaning and an uttered word can be learned from a single teaching session with a caregiver, even though many other mappings are possible. This paper proposes a lexical acquisition model that uses curiosity to associate the visual features of observed objects with the labels uttered by a caregiver. The robot shifts its attention and adjusts its learning rate according to its curiosity. In experiments with a humanoid robot, the visual features are represented by self-organizing maps that adaptively capture the shapes of the observed objects independently of viewpoint.
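The idea of a curiosity-modulated learning rate can be illustrated with a minimal sketch. The snippet below is a hypothetical simplification, not the paper's actual model: it assumes curiosity is quantified as novelty (the distance from a stimulus to its best-matching unit in a toy self-organizing map), and scales both the map update and the strength of a label association by that signal, so a novel object paired with a label is learned strongly in one exposure while familiar objects are updated only weakly.

```python
import numpy as np

rng = np.random.default_rng(0)

class CuriositySOM:
    """Toy self-organizing map (no neighborhood function, for brevity)
    whose learning rate is scaled by a curiosity signal.
    Hypothetical sketch; names and formulas are assumptions."""

    def __init__(self, n_units=16, dim=4, base_lr=0.5):
        self.weights = rng.random((n_units, dim))  # unit prototype vectors
        self.base_lr = base_lr
        self.assoc = {}  # (unit index, label) -> association strength

    def curiosity(self, x):
        # Novelty: distance to the best-matching unit, squashed into (0, 1).
        d = np.linalg.norm(self.weights - x, axis=1)
        return 1.0 - np.exp(-d.min())

    def update(self, x, label=None):
        d = np.linalg.norm(self.weights - x, axis=1)
        bmu = int(d.argmin())                      # best-matching unit
        lr = self.base_lr * self.curiosity(x)      # curiosity modulates learning rate
        self.weights[bmu] += lr * (x - self.weights[bmu])
        if label is not None:
            # A novel object heard with a label forms a strong association
            # immediately; repeated familiar pairings add little.
            key = (bmu, label)
            self.assoc[key] = self.assoc.get(key, 0.0) + lr
        return bmu, lr
```

Presenting the same object repeatedly drives its novelty, and hence the learning rate, toward zero, which is one simple way to realize the attention shift described above: the robot's learning resources concentrate on whatever is currently unfamiliar.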