Recent Developments in Artificial Intelligence and Communication Technologies

Increasing Performance of Boolean Retrieval Model by Data Parallelism Technique

Author(s): Mukesh Rawat*, Preksha Pratap, Manan Gupta and Hardik Sharma

Pp: 185-206 (22)

DOI: 10.2174/9781681089676122010012


Abstract

Information retrieval (IR) is the task of identifying documents of an unstructured nature that satisfy an information need from within a large repository (usually maintained on computer systems). Several models have been defined for retrieving information: the Boolean model; the statistical models, which include the vector space and probabilistic retrieval models; and the linguistic and knowledge-based retrieval models. The Boolean model is described as the “perfect match” model: a document is retrieved only if it satisfies the query expression exactly. If a query is not formulated accurately, it retrieves some irrelevant documents, which lowers the precision (p), defined as the proportion of retrieved documents that are relevant. The Boolean method provides good techniques for broadening or narrowing a query, and it works well for search because the concepts in a query are clearly delineated. The Boolean retrieval model processes queries whose terms are combined into Boolean expressions with the AND (&), OR (||), and NOT (!) operators. The model views documents in the form of an inverted index. The key idea of an inverted index is to maintain a dictionary of terms and, for every term, a list of the documents in which the term occurs. Each entry in that list is called a posting, the list itself is known as the postings list (or inverted list), and all the postings lists taken together are called the postings.
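To make the inverted index and the Boolean operators concrete, the following is a minimal Python sketch. It is not taken from the chapter: the function names (`build_inverted_index`, `intersect`, `union`, `complement`), the whitespace tokenizer, and the four toy documents are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted postings list of document IDs.

    `docs` is a hypothetical {doc_id: text} mapping; terms are obtained by
    lowercasing and splitting on whitespace (an assumed tokenizer).
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1, p2):
    """AND: linear merge of two sorted postings lists."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def union(p1, p2):
    """OR: every document ID that appears in either postings list."""
    return sorted(set(p1) | set(p2))


def complement(p, all_ids):
    """NOT: every document ID in the collection that is absent from `p`."""
    present = set(p)
    return [doc_id for doc_id in all_ids if doc_id not in present]


# Toy collection, invented for this example.
docs = {
    1: "boolean retrieval model",
    2: "vector space model",
    3: "boolean query with inverted index",
    4: "probabilistic retrieval",
}
index = build_inverted_index(docs)
all_ids = sorted(docs)

# Query: boolean AND retrieval
both = intersect(index["boolean"], index["retrieval"])
# Query: boolean OR retrieval
either = union(index["boolean"], index["retrieval"])
# Query: boolean AND NOT model
boolean_not_model = intersect(index["boolean"], complement(index["model"], all_ids))
print(both, either, boolean_not_model)
```

Keeping each postings list sorted lets AND run as a single linear merge over the two lists.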

However, as the number of documents grows, the postings lists grow as well, and processing them becomes time-consuming. To resolve this problem, a multithreaded model is proposed in which each postings list is broken into chunks that are processed by separate threads, so that the Boolean operations between postings lists required by a Boolean query complete faster. With this data parallelism technique, the performance of the Boolean retrieval model is increased.
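The chunking idea can be sketched as follows. The abstract does not specify how chunks are formed or how partial results are combined, so the scheme here is an assumption: one sorted postings list is split into contiguous chunks, each chunk is intersected with the matching range of the other list, and the partial results are concatenated in order. The names `split_chunks`, `intersect_chunk`, and `parallel_intersect` are hypothetical. In standard CPython the global interpreter lock prevents threads from running CPU-bound Python code in parallel, so this sketch illustrates the decomposition rather than the speedup.

```python
from bisect import bisect_left, bisect_right
from concurrent.futures import ThreadPoolExecutor


def intersect(p1, p2):
    """AND: linear merge of two sorted postings lists."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def split_chunks(postings, n_chunks):
    """Split a postings list into at most `n_chunks` contiguous, non-empty chunks."""
    size = max(1, -(-len(postings) // n_chunks))  # ceiling division
    return [postings[i:i + size] for i in range(0, len(postings), size)]


def intersect_chunk(chunk, other):
    """Intersect one chunk with only the slice of `other` covering its ID range."""
    lo = bisect_left(other, chunk[0])
    hi = bisect_right(other, chunk[-1])
    return intersect(chunk, other[lo:hi])


def parallel_intersect(p1, p2, n_workers=4):
    """Data-parallel AND: one task per chunk of `p1`, results joined in order."""
    if not p1 or not p2:
        return []
    chunks = split_chunks(p1, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map yields results in the order of the input chunks, so
        # concatenating them keeps the final postings list sorted.
        parts = list(pool.map(lambda chunk: intersect_chunk(chunk, p2), chunks))
    return [doc_id for part in parts for doc_id in part]


# Synthetic postings lists, invented for this example.
evens = list(range(0, 100, 2))
threes = list(range(0, 100, 3))
result = parallel_intersect(evens, threes)
print(result == intersect(evens, threes))
```

Because the chunks cover disjoint ranges of document IDs, no task needs to coordinate with another, which is what makes the problem suitable for data parallelism.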


Keywords: Boolean retrieval, Inverted index, Postings, Posting list.

© 2024 Bentham Science Publishers