In this chapter, we introduce the Flickr distance (FD), which
measures the visual correlation between concepts. The relationship between
concepts reflects human perception, which is formed mainly from
visual information. Mining conceptual correlation from visual
information therefore makes sense.
Flickr distance is calculated in two steps: concept modeling and concept distance
estimation. In the first step, each concept is assumed to have multiple states, such
as front views, side views, and multiple semantics, each of which is treated as a
latent topic. For each concept, a collection of related images is obtained from the
web, and a latent topic visual language model (LTVLM) is then built to capture
these states. In the second step, the distance between two concepts is estimated by
the Jensen-Shannon (JS) divergence between their LTVLMs.
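
The second step can be illustrated with a minimal sketch, assuming each latent topic of a concept has already been reduced to a discrete probability distribution over a shared vocabulary of visual words. The function names, the toy distributions, and in particular the averaging over all topic pairs are illustrative assumptions, not the chapter's exact formulation.

```python
import math
from itertools import product


def kl_divergence(p, q):
    """KL(p || q) in bits for two discrete distributions on the same support."""
    # Terms with p_i == 0 contribute nothing, so they are skipped.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and bounded by 1 with log base 2."""
    # m is the midpoint distribution; it is positive wherever p or q is.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def concept_distance(topics_a, topics_b):
    """Hypothetical aggregation: mean JS divergence over all topic pairs."""
    pairs = list(product(topics_a, topics_b))
    return sum(js_divergence(p, q) for p, q in pairs) / len(pairs)


# Toy example: two concepts, each with two latent topics over three visual words.
concept_a = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
concept_b = [[0.1, 0.2, 0.7], [0.2, 0.2, 0.6]]
distance = concept_distance(concept_a, concept_b)
```

Because the JS divergence is symmetric and bounded, the resulting distance does not depend on the order in which the two concepts are given, and it is zero when every pair of topic distributions is identical.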
Unlike traditional conceptual distance measurements, which are based
on Web text documents, FD is based on visual information. Compared with
the WordNet distance, FD scales easily with the growing size of the conceptual
corpus. Compared with the Normalized Google Distance (NGD) and the Tag Concurrence Distance
(TCD), FD uses visual information and can properly measure more kinds of conceptual
relations. We apply FD to multimedia-related tasks and find that FD is more
helpful than NGD. Based on FD, we also construct a large-scale visual conceptual
network (VCNet) to store knowledge of conceptual relationships. Experiments
show that FD is more consistent with human perception and can help boost the
performance of several applications over existing methods.
Keywords: Flickr distance, conceptual similarity, visual language model, distance measurement,
multimedia search, data mining, similarity search, image analysis, visual
similarity, image annotation, image tagging, image retrieval