In this chapter, we briefly review online learning algorithms for content-based
multimedia annotation that scale to large-scale multimedia data as well as the
associated semantic concepts. Multimedia search uses annotated semantic
concepts to achieve efficient content-based indexing.
This is a promising direction toward truly content-based multimedia
search. However, because of the sheer number of multimedia samples and semantic
concepts, existing techniques for automatic multimedia annotation cannot handle
large-scale multimedia corpora and concept sets, in terms of either annotation
accuracy or computation cost. To enable large-scale semantic concept annotation,
a practical multimedia annotation method must scale along both the multimedia
sample dimension and the concept label dimension. In real-world settings, large
volumes of unlabeled multimedia samples arrive consecutively in batches, and an
initial pre-labeled training set is available, from which a preliminary
multi-label classifier is built. For
each arriving batch, a multi-label active learning engine selects a set of
unlabeled samples, together with a selected set of labels, and requests label
confirmation from human labelers. An online learner then updates the current
classifier by taking the newly labeled sample-label pairs into consideration.
This process repeats until all data have arrived. During the process, new
labels, even those without any pre-labeled training samples, can be
incorporated at any time. In this chapter,
we review large-scale online active annotation for Internet multimedia through
the two basic techniques introduced above: active learning and online computing.
By combining these two techniques in a unified framework, scalable multimedia
annotation can be achieved in an online manner, so that both annotation accuracy
and efficiency can be significantly improved.
Keywords: large-scale multimedia search and mining, online learning, multi-label annotation,
multimedia sample dimension, concept label dimension, sample-label pair,
multi-label active learning, large-scale online active annotation, Correlative Multi-
Label, 2D Active Learning
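
The batch-wise workflow summarized in the abstract can be illustrated with a
minimal Python sketch. Everything in it is an assumption made for illustration
and is not the method reviewed in this chapter: the class name
OnlineMultiLabelAnnotator, the function annotate_stream, the per-label logistic
models updated by stochastic gradient steps (a stand-in for the online learner),
and the simple uncertainty criterion for choosing sample-label pairs (a stand-in
for the multi-label active learning engine). The oracle callback plays the role
of the data labelers.

```python
import numpy as np


class OnlineMultiLabelAnnotator:
    """Hypothetical stand-in: one logistic model per label, updated online."""

    def __init__(self, n_features, n_labels, lr=0.5):
        self.W = np.zeros((n_labels, n_features))  # one weight row per label
        self.lr = lr

    def add_label(self):
        """Add a new concept that has no pre-labeled samples (zero weights)."""
        self.W = np.vstack([self.W, np.zeros((1, self.W.shape[1]))])
        return self.W.shape[0] - 1

    def predict_proba(self, X):
        """Return an (n_samples, n_labels) matrix of label probabilities."""
        return 1.0 / (1.0 + np.exp(-X @ self.W.T))

    def select_pairs(self, X, budget):
        """Pick the `budget` sample-label pairs whose probability is nearest 0.5."""
        uncertainty = -np.abs(self.predict_proba(X) - 0.5)
        flat = np.argsort(uncertainty, axis=None)[::-1][:budget]
        rows, cols = np.unravel_index(flat, uncertainty.shape)
        return list(zip(rows.tolist(), cols.tolist()))

    def update(self, x, label, y):
        """One gradient step for a confirmed sample-label pair, y in {0, 1}."""
        p = 1.0 / (1.0 + np.exp(-self.W[label] @ x))
        self.W[label] += self.lr * (y - p) * x


def annotate_stream(model, batches, oracle, budget):
    """For each arriving batch: select pairs, query the labeler, update online."""
    n_queries = 0
    for X in batches:
        for i, k in model.select_pairs(X, budget):
            model.update(X[i], k, oracle(X[i], k))
            n_queries += 1
    return n_queries
```

In this sketch, the preliminary classifier could be obtained by calling update
on each pair in the initial pre-labeled training set before the first batch
arrives, and a concept that appears mid-stream is handled by calling add_label
between batches.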