Oracle Corporation

04/09/2024 | Press release

How to build a RAG solution using OCI Generative AI

Current search solutions often don't answer user questions directly. Instead, they return relevant links. This approach works well for popular search engines, but not all data is accessible to them. Enterprises, for example, have a lot of internal data and often only a poor search mechanism over it.

We believe that search can be improved. Imagine a world where, instead of using search hacks, users can frame their questions naturally, as if they're talking to a person, and get precise answers. Imagine a world where a newly hired employee doesn't have to navigate multiple internal documents to get answers to their basic questions. They can just ask a bot and get answers!

With the explosion of information around us and less free time on our hands, enterprises need a solution that users can interact with using natural language, and the solution needs to respond with precise answers to their questions. They can implement this solution with the recent advancements in natural language processing (NLP) and cloud technologies. Even better, Oracle Cloud Infrastructure (OCI) has the tools and managed services that can help you develop such solutions.

The core of these impressive NLP solutions is a mechanism called retrieval-augmented generation (RAG). With RAG, for a user's question, we retrieve the pieces of information that contain an answer to that question, then supply both the question and the retrieved text to a large language model (LLM) to augment the LLM's response and reduce hallucination.
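Before we get to the OCI-specific code, the pattern itself fits in a few lines. The following is a purely illustrative sketch: the toy retrieve function and the llm callable are stand-ins for the vector search and OCI Generative AI calls we build later in this post.

# Minimal sketch of the RAG pattern (illustration only)
def retrieve(question, knowledge_base):
    # toy retrieval: keep chunks that share at least one word with the question
    words = set(question.lower().split())
    return [chunk for chunk in knowledge_base if words & set(chunk.lower().split())]

def answer_with_rag(question, knowledge_base, llm):
    # augment the prompt with the retrieved chunks, then let the model answer
    context = "\n".join(retrieve(question, knowledge_base))
    prompt = "Text: {0}\nQuestion: {1}\nAnswer based only on the text.".format(context, question)
    return llm(prompt)  # llm is any callable that sends a prompt to a model and returns text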

While OCI offers RAG as a managed service with OCI Generative AI Agents, the service is still in beta and only supports OpenSearch as the knowledge base repository. In this blog post, we show how to build a custom RAG solution with an Oracle database with vector support.

To build our RAG solution, we divide the code into two parts: creating the knowledge base and querying it.

Prerequisites

You need access to an Oracle database with support for the vector data type, and you need to be able to connect to it using the Python oracledb library. We use Oracle's vector database offering, which is available in Oracle Database 23c.
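As a quick sanity check, you can confirm that the oracledb driver can reach your database before writing any RAG code. The user, password, and DSN below are placeholders; replace them with your own connection details.

# Connectivity check (placeholder credentials and DSN)
import oracledb

with oracledb.connect(user="vector", password="vector", dsn="localhost/freepdb1") as connection:
    with connection.cursor() as cursor:
        cursor.execute("select 'connected' from dual")
        print(cursor.fetchone()[0])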

Creating a knowledge base

Let's start by creating a knowledge base. Create a Python file called create_knowledge_base.py.

Setting up OCI config

To use OCI services, you need to get access to OCI and its Generative AI service. You need the following information to use OCI Generative AI:

  • Compartment ID
  • Config profile on your machine (usually stored at ~/.oci)

When you have these details, you can use them in the code. You must also pip install the Python packages that provide the following imports:

from unstructured.partition.html import partition_html
from unstructured.chunking.title import chunk_by_title
from unstructured.cleaners.core import clean
import oci
import oracledb

# Constants
org = "oracle"
compartment_id = ""
CONFIG_PROFILE = ""
config = oci.config.from_file('~/.oci/config', CONFIG_PROFILE)

# Service endpoint
endpoint = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"
generative_ai_inference_client = oci.generative_ai_inference.GenerativeAiInferenceClient(config=config, service_endpoint=endpoint, retry_strategy=oci.retry.NoneRetryStrategy(), timeout=(10,240))

# HTTP headers used when crawling web pages for the knowledge base
headers = {"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36 Edg/101.0.1210.47"}


Next, write the functions to interact with the database to create a table and prepare the data.

# Functions to interact with the DB
def create_db_connection():
    connection = oracledb.connect(user="", password="", dsn="")
    return connection


def create_table(cursor, org):
    # Drop the table if it already exists (ignore ORA-00942: table does not exist)
    try:
        cursor.execute("""
        begin
            execute immediate 'drop table {org}_web_embeddings';
            exception when others then if sqlcode <> -942 then raise; end if;
        end;""".format(org=org))
    except Exception as e:
        print("Error dropping table for {org}: {err}".format(org=org, err=e))

    # Create the table that stores each chunk, its embedding vector, and its source URL
    try:
        cursor.execute("""
            create table {org}_web_embeddings (
                id number,
                content varchar2(4000),
                vec vector(1024, float32),
                source_url varchar2(4000),
                primary key (id)
            )""".format(org=org))
    except Exception as e:
        print("Error while creating table for {org}: {err}".format(org=org, err=e))


def insert_data(cursor, id, chunk, vec, source_url, org):
    cursor.setinputsizes(None, 4000, oracledb.DB_TYPE_VECTOR, 4000)
    try:
        cursor.execute("insert into {org}_web_embeddings values (:1, :2, :3, :4)".format(org=org), [id, chunk, vec, source_url])
    except Exception as e:
        print("Error while inserting in DB:", e)

Data preparation

To build a question answering system, create a knowledge base. Just like we might refer to books to find an answer, the solution refers to the knowledge base.

The knowledge base can be anything: a website, a PDF, or even a Word doc. But any knowledge base must be chunked, because we don't want to pass everything at once; we want to pass meaningful chunks of text to the LLM during question answering. We can use the following code for chunking:

# Data Preparation - Chunking
def parse_and_chunk_url_text(source_url):
    formatted_url = source_url.strip()
    chunks = []
    try:
        # Fetch and parse the HTML page into elements
        elements = partition_html(url=formatted_url, headers=headers, skip_headers_and_footers=True)
    except Exception as e:
        print("Error while attempting to crawl {site}: {err}".format(site=formatted_url, err=e))
    else:
        # Group the parsed elements into chunks delimited by titles
        chunks = chunk_by_title(elements)
    finally:
        return chunks

When you have chunked data, use it to create text embeddings and store them in the database. We have a new term here: embeddings.

Introducing text embeddings

Think about a textbook: Every chapter has some questions at the end. When answering the questions at the end of a chapter, we refer to that chapter and not others because we know that's where we will find the context to answer the question. Similarly, if we want LLMs to generate precise answers, we need to provide them with the right context.

We provide relevant context to LLMs by working with the question's embedding. Embeddings are numerical representations of textual data that preserve its semantic meaning.

We generate text embeddings with OCI and Cohere's text embedding model. We can use the following code to generate and store embeddings for the data:

# Data Preparation - Embedding
def create_knowledge_base_from_client_content(org, contents):
    connection = create_db_connection()
    cursor = connection.cursor()

    create_table(cursor=cursor, org=org)

    print("creating embeddings for {org}".format(org=org))
    len_of_contents = len(contents)
    print("len of contents is ", len_of_contents)

    start = 0
    cursor_index = 0
    while start < len_of_contents:
        # The embedding endpoint accepts at most 96 inputs per call, so batch the chunks
        embed_text_detail = oci.generative_ai_inference.models.EmbedTextDetails()
        content_subsets = contents[start:start+96]
        inputs = []
        for subset in content_subsets:
            if subset:
                inputs.append(subset)
        embed_text_detail.inputs = inputs
        embed_text_detail.truncate = embed_text_detail.TRUNCATE_END
        embed_text_detail.serving_mode = oci.generative_ai_inference.models.OnDemandServingMode(model_id="cohere.embed-english-v3.0")
        embed_text_detail.compartment_id = compartment_id
        embed_text_detail.input_type = embed_text_detail.INPUT_TYPE_SEARCH_DOCUMENT
        try:
            embed_text_response = generative_ai_inference_client.embed_text(embed_text_detail)
        except Exception as e:
            print("Error while creating embeddings ", e)
            embeddings = []
        else:
            embeddings = embed_text_response.data.embeddings

        # Store each chunk with its embedding; nothing is inserted if the batch failed
        for i in range(len(embeddings)):
            insert_data(cursor, cursor_index, inputs[i], list(embeddings[i]), "https://en.wikipedia.org/wiki/Oracle_Corporation", org)
            cursor_index = cursor_index + 1

        start = start + 96

    connection.commit()
    cursor.close()
    connection.close()

We break the text contents into batches of 96 because that's the maximum number of inputs the OCI embedding model supports in one call.
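If you prefer, the same slicing logic can be factored into a small helper. This is just a sketch, with 96 reflecting the per-call input limit mentioned above.

# Sketch: yield the contents in batches no larger than the per-call limit
def batched(items, batch_size=96):
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Usage: for batch in batched(contents): embed the batch, then insert each chunk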

We also create a table per organization so that queries for one organization don't search data in a different organization's table. This is handled by the create_table and insert_data functions in the previous code, which take the organization-specific table to create and insert data into.

We can complete this file by adding a main function and creating a knowledge base using Oracle's wiki entry.

# Main Function
if __name__ == '__main__':
    # get chunked text
    organized_content = parse_and_chunk_url_text('https://en.wikipedia.org/wiki/Oracle_Corporation')

    # clean data
    contents = []
    for chunk in organized_content:
        text = chunk.text
        text = clean(text, extra_whitespace=True)
        contents.append(text)

    # prepare knowledge base
    create_knowledge_base_from_client_content(org, contents)

We split the code into two separate files because we want to create the knowledge base once but query it many times; a new query shouldn't result in the knowledge base being created again.

For the next step, we query the knowledge base.

Querying the knowledge base

In this section, we query the knowledge base we created and implement the RAG mechanism.

Because we're asking the LLM questions about our knowledge base using RAG, we need to provide it with text that might contain an answer to the question.

How do we find such text? Here, embeddings help. We find relevant text by comparing embedding similarities using the dot product metric. The higher the dot product between the user's question embedding and a text embedding in Oracle Database, the more similar they are, and the more likely the LLM can use that text to answer the user's question.

The core concept of a RAG solution is using the dot product of two vectors to calculate vector similarity.

Theoretically, you can view the dot product of two vectors as the product of their magnitudes and the cosine of the angle between them. This is equivalent to the sum of the products of the vectors' corresponding coordinates, which is the form we actually compute, because we usually don't have the angle between the two vectors on hand.
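As a small numeric illustration of the coordinate form:

# Dot product as the sum of element-wise products
a = [1.0, 2.0, 3.0]
b = [0.5, 0.0, 2.0]
dot = sum(x * y for x, y in zip(a, b))
print(dot)  # 1*0.5 + 2*0.0 + 3*2.0 = 6.5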

Let's start by creating a Python file called query_knowledge_base.py. Now, define some constants and import the required modules.

import cohere
import oci
import oracledb

# Constants
org = "oracle"
compartment_id = ""
CONFIG_PROFILE = ""
config = oci.config.from_file('~/.oci/config', CONFIG_PROFILE)

# Service endpoint
endpoint = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"
generative_ai_inference_client = oci.generative_ai_inference.GenerativeAiInferenceClient(config=config, service_endpoint=endpoint, retry_strategy=oci.retry.NoneRetryStrategy(), timeout=(10,240))

# Cohere client used for reranking (supply your API key)
cohere_client = cohere.Client('')

Next, define how to perform dot product similarity between two vectors: in our example, the user's query vector and the text vectors stored in the database.

# Used to format response and return references
class Document():
    doc_id: int
    doc_text: str
    url: str

    def __init__(self, id, text, url) -> None:
        self.doc_id = id
        self.doc_text = text
        self.url = url


# Find relevant records from Oracle Vector DB using dot product similarity.
def search_data(cursor, query_vec, list_dict_docs, org):
    relevant_docs = []
    cursor.setinputsizes(oracledb.DB_TYPE_VECTOR)
    cursor.execute("""
        select *
        from {org}_web_embeddings
        order by vector_distance(vec, :1, DOT)
        fetch first 10 rows only
    """.format(org=org), query_vec)

    for row in cursor:
        id = row[0]
        text = row[1]
        url = row[3]
        temp_dict = {id: text}
        list_dict_docs.append(temp_dict)
        doc = Document(id, text, url)
        relevant_docs.append(doc)

    return relevant_docs

In this function, we calculate the dot product between the user's query vector and the text embeddings stored in Oracle Database. This comparison helps us find the relevant documents, which might contain an answer to the user's query. Oracle Database supports this functionality natively, which means you don't need to install any third-party tools to perform the mathematical calculations needed for the dot product.

Now, we can define the function to interact with the OCI LLM.

# OCI-LLM: Used to generate embeddings for question(s)
def generate_embeddings_for_question(question_list):
    embed_text_detail = oci.generative_ai_inference.models.EmbedTextDetails()
    embed_text_detail.inputs = question_list
    embed_text_detail.input_type = embed_text_detail.INPUT_TYPE_SEARCH_QUERY
    embed_text_detail.serving_mode = oci.generative_ai_inference.models.OnDemandServingMode(model_id="cohere.embed-english-v3.0")
    embed_text_detail.compartment_id = compartment_id
    embed_text_response = generative_ai_inference_client.embed_text(embed_text_detail)
    return embed_text_response


# OCI-LLM: Used to prompt the LLM
def query_llm_with_prompt(prompt):
    cohere_generate_text_request = oci.generative_ai_inference.models.CohereLlmInferenceRequest()
    cohere_generate_text_request.prompt = prompt
    cohere_generate_text_request.is_stream = False
    cohere_generate_text_request.max_tokens = 1000
    cohere_generate_text_request.temperature = 0
    cohere_generate_text_request.top_k = 0
    cohere_generate_text_request.top_p = 0

    generate_text_detail = oci.generative_ai_inference.models.GenerateTextDetails()
    generate_text_detail.serving_mode = oci.generative_ai_inference.models.OnDemandServingMode(model_id="cohere.command")
    generate_text_detail.compartment_id = compartment_id
    generate_text_detail.inference_request = cohere_generate_text_request

    generate_text_response = generative_ai_inference_client.generate_text(generate_text_detail)
    llm_response_result = generate_text_response.data.inference_response.generated_texts[0].text

    return llm_response_result

Now, we're ready to implement RAG:

# Perform RAG
def answer_user_question(org, query):
    question_list = []
    question_list.append(query)

    embed_text_response = generate_embeddings_for_question(question_list)
    question_vector = embed_text_response.data.embeddings[0]

    with oracledb.connect(user="vector", password="vector", dsn="freepdb1") as db:
        cursor = db.cursor()
        list_dict_docs = []
        # query vector db to search relevant records
        similar_docs = search_data(cursor, [question_vector], list_dict_docs, org=org)

        rerank_docs = []
        for docs in similar_docs:
            content = str(docs.doc_id) + ": " + docs.doc_text
            rerank_docs.append(content)

        # use cohere reranker to fetch top documents.
        rerank_results = cohere_client.rerank(query=query, documents=rerank_docs, top_n=5, model='rerank-english-v2.0')
        #print(rerank_results)

        # prepare documents for the prompt
        context_documents = []
        relevant_doc_ids = []
        similar_docs_subset = []

        for rerank_result in rerank_results:
            doc_data = rerank_result.document['text']
            context_documents.append(doc_data)
            relevant_doc_ids.append(doc_data.split(":")[0])

        for docs in similar_docs:
            current_id = str(docs.doc_id)
            if current_id in relevant_doc_ids:
                similar_docs_subset.append(docs)

        context_document = "\n".join(context_documents)

        prompt_template = '''
        Text: {documents} \n
        Question: {question} \n
        Answer the question based on the text provided and also return the relevant document numbers where you found the answer. If the text doesn't contain the answer, reply that the answer is not available.
        '''

        prompt = prompt_template.format(question=query, documents=context_document)
        print(prompt)

        llm_response_result = query_llm_with_prompt(prompt)
        response = {}
        response['message'] = query
        response['text'] = llm_response_result
        response['documents'] = [{'id': doc.doc_id, 'snippet': doc.doc_text, 'url': doc.url} for doc in similar_docs_subset]

        return response

The code performs the following tasks:

  • Generates embeddings for the question.
  • Queries the Oracle vector database to find the top 10 relevant documents using dot product similarity.
  • Reranks the documents using the Cohere Reranker and filters out the top five most relevant documents that can contain an answer to the question.
  • Performs RAG: We augment the LLM's answer generation by providing appropriate context in the form of the relevant documents queried and filtered earlier.
  • Prompting: We prompt the LLM with instructions on what we expect from it.
  • Returns the LLM response.

Let's wrap up this file by adding a main function and asking our question!

if __name__ == '__main__':
    # Ask your question
    query = "Who founded Oracle?"
    print(answer_user_question(org, query)['text'])

If you have followed all the steps correctly, the script prints the LLM's answer, generated from the context retrieved from your knowledge base.

Conclusion

There you have it. That's how you can talk to your data using Oracle tools. We used a single URL as a data source, but you can extend this approach to any source containing textual information, such as PDFs, Word docs, and company internal wikis.

Hopefully, this tutorial gave you enough information about the new frontier of AI. We can't wait to hear about the amazing solutions you build using Oracle Cloud Infrastructure's AI tools!

Resources

Get started with Oracle Cloud Infrastructure basics

What Is Retrieval-Augmented Generation (RAG)?

Press Release: Oracle Introduces Integrated Vector Database