Splunk Inc.

03/27/2024 | News release | Distributed by Public on 03/28/2024 03:55

Information Retrieval & Intelligence: How It Works for AI

Information Retrieval (IR) is the process of accessing information systems to satisfy an information need.

In the context of machine learning, the term "information needs" refers to the requirements of:

  • Explaining an observed phenomenon.

  • Understanding how information systems are being used.

  • Controlling, improving, and manipulating the utilization of information systems.

In practice, Information Retrieval involves identifying and retrieving information resources from a storage system. (Information systems, of course, can refer to any way of collecting and transmitting data, or digital information. Here, we're mostly talking in terms of databases and AI.)

Using machines for information retrieval

The idea of using machines to find relevant information to satisfy an information need was first proposed by Vannevar Bush in 1945, in his influential essay As We May Think. Bush proposed a mechanized system that could store all kinds of information and access it with exceeding speed and flexibility.

Such a hypothetical system could extend our mental capacity and, while not necessarily duplicating the mental process itself, enable a process that he referred to as "selection by association, rather than by indexing".

This idea serves as a basis for modern Information Retrieval systems, considering that retrieving information is not limited to indexing and querying a stored object in the database.

Use cases for information retrieval

Information Retrieval can be categorized into four key use cases for satisfying an information need.

Reference retrieval

If "reference retrieval" reminds you of university, you're not alone. Here, reference retrieval refers to the search or retrieval of something - a document, abstract or reference - that may contain information relevant to a search query.

The information resource may supplement the search process by guiding a user to a resource that most accurately satisfies the search question.

Fact retrieval

Here, it is the retrieval of the information itself that satisfies the intended search query. The fact may be:

  • Text embedded in a document

  • A media file in a database

  • Raw data in a dataset collection


The retrieval may completely or partially satisfy the search query.

Question-Answering

Question-answering is the process of inferring knowledge from an information resource. The retrieved information may not itself be a fact that answers the question, but it supports inferring the answer from the material presented.

Data retrieval

Here, "data retrieval" refers to unstructured information about an individual or several related items extracted from an information resource. Data may be either:


Challenges with AI and ML

In the context of AI and machine learning, these distinctions suggest that varying levels of intelligence are required to identify knowledge dependencies and relevance in information, extract data from information systems, and relate that data to the search intent of a user.

AI is particularly well suited to IR queries that involve question-answering. Traditional index-based search mechanisms may suffice for the retrieval of:

  • References

  • Facts

  • Data-related queries


However, techniques such as a structured index-based search mechanism that extracts metadata or keywords from information systems may be inefficient for Information Retrieval in Big Data assets.
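As a rough illustration of index-based keyword search, the following Python sketch builds a small inverted index and answers a query with a boolean AND over its terms. The function names, the sample documents and the whitespace tokenizer are assumptions made for this example; real search systems use far more elaborate tokenization and storage.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def keyword_search(index, query):
    """Return ids of documents that contain every query token (boolean AND)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    # Start from the postings of the first token, then intersect with the rest.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample collection, used only for illustration.
docs = {
    "d1": "error in login service",
    "d2": "login succeeded for user",
    "d3": "disk error on storage node",
}
index = build_inverted_index(docs)
hits = keyword_search(index, "login error")
```

The lookup itself is fast, but it only matches exact keywords: a query phrased differently from the stored text returns nothing, which is one reason this approach struggles with inference-style questions.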

AI techniques that reduce the search time and computation needed to accurately satisfy inference-based information retrieval, such as question-answering, are widely adopted today. The same is true for retrieving static information from large volumes of data, documents, media, logs and other unstructured and semi-structured information systems.

AI methods for information retrieval

So what are some of the recent AI methods for Information Retrieval?

Algebraic models

These are mathematical frameworks that define structured relationships between queries and the text they are matched against in the context of Information Retrieval.

A popular example is the Vector Space Model, which represents documents and queries as vectors in a high-dimensional space and ranks documents based on a notion of similarity. The relevance of a document is determined by a simple algebraic calculation: the cosine similarity between the vector of its text and the vector of the search query.
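The cosine-similarity ranking described above can be sketched in a few lines of Python. This is a minimal illustration that assumes raw term counts as vector weights and whitespace tokenization; the function names and sample documents are hypothetical, and practical systems usually apply weighting schemes such as TF-IDF.

```python
import math
from collections import Counter


def to_vector(text):
    """Represent text as a sparse term-count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    q = to_vector(query)
    scores = {doc_id: cosine_similarity(q, to_vector(text))
              for doc_id, text in docs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample collection, used only for illustration.
docs = {
    "a": "information retrieval systems",
    "b": "cooking pasta recipes",
}
ranking = rank("information retrieval", docs)
```

Here document "a" shares two terms with the query and ranks first, while "b" shares none and scores zero.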

Probabilistic models

These are mathematical models that view search and retrieval as a probabilistic decision-making process. These models typically evaluate the statistical properties of the information resource and the search query. Some common examples include:

  • Bayesian Inference to rank dependencies between variables

  • Models that estimate relevance from how often search query terms are found in a document


For example, a document may contain several instances of the search query. The model infers the probability of relevance of the document to the query based on the observed evidence.
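One way to make this concrete is a query-likelihood score with additive smoothing, sketched below in Python. This is only one of several probabilistic scoring schemes and is chosen here for brevity; the function name, the `alpha` smoothing parameter and the sample documents are assumptions made for the example.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, vocab_size, alpha=1.0):
    """Log-probability of the query terms under a smoothed unigram model of doc.

    alpha is an additive smoothing constant (an assumption of this sketch) that
    keeps the probability of unseen terms above zero.
    """
    doc_tokens = doc.lower().split()
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    score = 0.0
    for term in query.lower().split():
        p = (counts[term] + alpha) / (total + alpha * vocab_size)
        score += math.log(p)
    return score


# Hypothetical sample collection, used only for illustration.
docs = {
    "d1": "error error error timeout",
    "d2": "timeout in service call",
}
vocab = {token for text in docs.values() for token in text.lower().split()}
scores = {doc_id: query_log_likelihood("error", text, len(vocab))
          for doc_id, text in docs.items()}
best = max(scores, key=scores.get)
```

Because "d1" contains the query term several times, the observed evidence gives it a higher score than "d2", which never mentions it.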

Neural network models

Most modern AI models for Information Retrieval represent complex data patterns and relationships in the text using Neural Networks.

In machine learning, a neural network is a set of interconnected nodes represented by a set of equations. The parameters of these equations are updated to minimize a cost function such as:

  • Mean Square Error (MSE)

  • Mean Absolute Error (MAE)

  • Another error-based objective function that can accurately map relationships between the input data and the output data (labels or classes)
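The parameter update described above can be sketched with the simplest possible case: a single linear node trained by gradient descent on the Mean Square Error. The learning rate, epoch count, toy data and function names are assumptions chosen for illustration; real networks stack many such nodes with non-linear activations and rely on automatic differentiation.

```python
def mse(w, b, xs, ys):
    """Mean Square Error of the predictions w * x + b against the targets."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_linear_neuron(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w * x + b by gradient descent on the MSE cost."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the cost.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train_linear_neuron(xs, ys)
final_cost = mse(w, b, xs, ys)
```

After training, `w` and `b` should be close to 2 and 1, and the cost close to zero, showing how minimizing the error drives the parameters toward the underlying input-output relationship.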


This simple concept underpins major advances in Information Retrieval, and Artificial Intelligence in general, including probabilistic generative models, reinforcement learning, LLMs, diffusion models and more!

AI for information retrieval

Modern AI tools for Information Retrieval certainly supplement the human capacity for memory and search. These tools also enable cognitive abilities that broaden the scope of search and retrieval: while a user simply searches for a few query phrases, Information Retrieval systems can infer search context and use intelligence to guide search.

Retrieval is improved by using AI algorithms to efficiently search across large information assets. Intelligent search and efficient retrieval therefore form the basis of modern Information Retrieval systems in AI and ML.