Current search solutions often don't answer user questions; instead, they return relevant links. This works well for popular search engines, but not all data is accessible to them. For example, enterprises have a lot of internal data and sometimes only a poor mechanism for searching it.
We believe that search can be improved. Imagine a world where, instead of using search hacks, users can frame their questions naturally, like they're talking to a system, and get precise answers to their questions. Imagine a world where a newly hired employee doesn't have to navigate their way through multiple internal documents to get answers to their basic questions. They can just ask a bot and get answers!
With the explosion of information around us and less free time on our hands, enterprises need a solution that users can interact with using natural language, and the solution needs to respond with precise answers to their questions. They can implement this solution with the recent advancements in natural language processing (NLP) and cloud technologies. Even better, Oracle Cloud Infrastructure (OCI) has the tools and managed services that can help you develop such solutions.
The core of these impressive NLP solutions is a mechanism called retrieval-augmented generation (RAG). With RAG, for a user's question, we retrieve the pieces of information that might contain an answer to that question, then supply both the question and the retrieved text to a large language model (LLM) to augment the LLM's response and reduce hallucination.
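To make the flow concrete before we build it with OCI services, here's a minimal, self-contained sketch of the RAG idea. The keyword-overlap scoring and the tiny in-memory knowledge base are stand-ins purely for illustration; the real solution below uses OCI embedding models and Oracle Database for retrieval.

# Toy sketch of the RAG flow: retrieve relevant text, then augment the prompt.
# The scoring here is naive keyword overlap, for illustration only.
knowledge_base = [
    "Oracle Corporation is an American multinational computer technology company.",
    "Oracle Database 23c adds a native vector data type.",
]

def retrieve_relevant_texts(question, texts, top_n=1):
    # Score each text by how many question words it shares with the text.
    q_words = set(question.lower().split())
    scored = sorted(texts, key=lambda t: len(q_words & set(t.lower().split())), reverse=True)
    return scored[:top_n]

def build_prompt(question, retrieved_texts):
    # Supply both the retrieved context and the question to the LLM.
    return "Text: {0}\nQuestion: {1}\nAnswer based only on the text.".format(
        "\n".join(retrieved_texts), question)

question = "What is Oracle Database 23c?"
print(build_prompt(question, retrieve_relevant_texts(question, knowledge_base)))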
While OCI offers RAG as a managed service with OCI Generative AI Agents, the service is still in beta and only supports OpenSearch as the knowledge base repository. In this blog post, we show how to build a custom RAG solution with an Oracle database with vector support.
To build our RAG solution, we divide the code into two parts: creating the knowledge base and querying it.
You need access to an Oracle database that supports the vector data type, and you connect to it using the Python oracledb library. We use Oracle's vector database offering, which is available in Oracle Database 23c.
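For example, a minimal connectivity check with the python-oracledb driver might look like the following sketch; the user, password, and DSN are placeholders for your own Oracle Database 23c instance.

import oracledb

# Placeholder credentials and DSN; replace them with your own database details.
with oracledb.connect(user="vector", password="vector", dsn="freepdb1") as connection:
    with connection.cursor() as cursor:
        cursor.execute("select 1 from dual")
        print(cursor.fetchone())  # (1,) confirms that the connection works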
Let's start by creating the knowledge base. Create a Python file called "create_knowledge_base.py".
To use OCI services, you need to get access to OCI and its Generative AI service. You need the following details to use OCI Generative AI:
- An OCI configuration file (~/.oci/config) and the name of the profile to use from it
- The OCID of the compartment in which you use the Generative AI service
- The Generative AI service endpoint for your region
When you have these details, you can use them in the code. You must also pip install the following Python modules:
- oci
- oracledb
- unstructured
- cohere
from unstructured.partition.html import partition_html
from unstructured.chunking.title import chunk_by_title
from unstructured.cleaners.core import clean
import oci
import oracledb

# Constants
org = "oracle"
compartment_id = ""
CONFIG_PROFILE = ""
config = oci.config.from_file('~/.oci/config', CONFIG_PROFILE)

# Service endpoint
endpoint = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"
generative_ai_inference_client = oci.generative_ai_inference.GenerativeAiInferenceClient(
    config=config,
    service_endpoint=endpoint,
    retry_strategy=oci.retry.NoneRetryStrategy(),
    timeout=(10, 240))

headers = {"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36 Edg/101.0.1210.47"}
Next, write the functions to interact with the database to create a table and prepare the data.
# Functions to interact with DB
def create_db_connection():
    connection = oracledb.connect(user="", password="", dsn="")
    return connection

def create_table(cursor, org):
    try:
        cursor.execute("""
            begin
                execute immediate 'drop table {org}_web_embeddings';
            exception
                when others then
                    if sqlcode <> -942 then
                        raise;
                    end if;
            end;""".format(org=org))
    except:
        print("Error dropping table for {org}".format(org=org))
    try:
        cursor.execute("""
            create table {org}_web_embeddings (
                id number,
                content varchar2(4000),
                vec vector(1024, float32),
                source_url varchar2(4000),
                primary key (id)
            )""".format(org=org))
    except:
        print("Error while creating table for {org}".format(org=org))

def insert_data(cursor, id, chunk, vec, source_url, org):
    cursor.setinputsizes(None, 4000, oracledb.DB_TYPE_VECTOR, 4000)
    try:
        cursor.execute("insert into {org}_web_embeddings values (:1, :2, :3, :4)".format(org=org),
                       [id, chunk, vec, source_url])
    except:
        print("Error while inserting in DB")
To build a question answering system, create a knowledge base. Just like we might refer to books to find an answer, the solution refers to the knowledge base.
The knowledge base can be anything: a website, a PDF, or even a Word doc. But any knowledge base must be chunked, because we don't want to pass everything at once; we want to pass meaningful chunks of text to the LLM during question answering. We can use the following code for chunking:
# Data Preparation - Chunking
def parse_and_chunk_url_text(source_url):
    formatted_url = source_url.strip()
    chunks = []
    try:
        elements = partition_html(url=formatted_url, headers=headers, skip_headers_and_footers=True)
    except:
        print("Error while attempting to crawl {site}".format(site=formatted_url))
    else:
        chunks = chunk_by_title(elements)
    finally:
        return chunks
When you have the chunked data, use it to create text embeddings and store them in the database. We have a new term here: embeddings.
Think about a textbook: Every chapter has some questions at the end. When answering the questions at the end of a chapter, we refer to that chapter and not others because we know that's where we will find the context to answer the question. Similarly, if we want LLMs to generate precise answers, we need to provide them with the right context.
We provide relevant context to LLMs by finding text whose embedding is similar to the question's embedding. Embeddings are numerical representations of textual data that preserve its semantic meaning.
We generate text embeddings with OCI and Cohere's text embedding model. We can use the following code to generate and store embeddings for the data:
# Data Preparation - Embedding
def create_knowledge_base_from_client_content(org, contents):
    connection = create_db_connection()
    cursor = connection.cursor()
    create_table(cursor=cursor, org=org)
    print("creating embeddings for {org}".format(org=org))
    len_of_contents = len(contents)
    print("len of contents is ", len_of_contents)
    start = 0
    cursor_index = 0
    while start < len_of_contents:
        embed_text_detail = oci.generative_ai_inference.models.EmbedTextDetails()
        content_subsets = contents[start:start+96]
        inputs = []
        for subset in content_subsets:
            if subset:
                inputs.append(subset)
        embed_text_detail.inputs = inputs
        embed_text_detail.truncate = embed_text_detail.TRUNCATE_END
        embed_text_detail.serving_mode = oci.generative_ai_inference.models.OnDemandServingMode(model_id="cohere.embed-english-v3.0")
        embed_text_detail.compartment_id = compartment_id
        embed_text_detail.input_type = embed_text_detail.INPUT_TYPE_SEARCH_DOCUMENT
        try:
            embed_text_response = generative_ai_inference_client.embed_text(embed_text_detail)
        except Exception as e:
            print("Error while creating embeddings ", e)
            embeddings = []
        else:
            embeddings = embed_text_response.data.embeddings
        # Iterate over the embeddings so a failed batch doesn't cause an index error.
        for i in range(len(embeddings)):
            insert_data(cursor, cursor_index, inputs[i], list(embeddings[i]),
                        "https://en.wikipedia.org/wiki/Oracle_Corporation", org)
            cursor_index = cursor_index + 1
        start = start + 96
    connection.commit()
    cursor.close()
    connection.close()
We break the text contents into batches of 96 because that's the maximum number of inputs that the OCI embedding model accepts in one call.
We also create an organization-specific table to store each organization's data, so queries for one organization don't search another organization's data. This is handled by the create_table and insert_data functions in the previous code, which specify the organization-specific table to create and insert data into.
We can complete this file by adding a main function and creating a knowledge base from Oracle's Wikipedia entry.
# Main Function
if __name__ == '__main__':
    # get chunked text
    organized_content = parse_and_chunk_url_text('https://en.wikipedia.org/wiki/Oracle_Corporation')
    # clean data
    contents = []
    for chunk in organized_content:
        text = chunk.text
        text = clean(text, extra_whitespace=True)
        contents.append(text)
    # prepare knowledge base
    create_knowledge_base_from_client_content(org, contents)
We use two separate files because we want to create the knowledge base once but query it many times; a new query shouldn't rebuild the knowledge base.
In the next step, we query the knowledge base we created and implement the RAG mechanism.
Because we're asking questions about our knowledge base to the LLM using RAG, we need to provide the LLM with text that might contain an answer to the question.
How do we find such texts? This is where embeddings help. We find relevant text by comparing embedding similarity using the dot product. The higher the dot product between the user's question embedding and a text embedding stored in Oracle Database, the more similar they are and the more likely the LLM can use that text to answer the user's question.
The following diagram shows the core concept of a RAG solution and how a dot product of two vectors is used to calculate vector similarity.
Mathematically, the dot product of two vectors equals the product of their magnitudes multiplied by the cosine of the angle between them. Conveniently, it's also equal to the sum of the products of the vectors' corresponding coordinates, which is the easier form to compute because we usually don't know the angle between the vectors.
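As a quick sanity check, here's a small Python example showing that the coordinate-wise dot product matches the magnitude-times-cosine formulation; the vectors are made up purely for illustration.

import math

# Two toy vectors
a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

# Coordinate-wise dot product: sum of element-wise products
dot = sum(x * y for x, y in zip(a, b))          # 32.0

# Magnitude-times-cosine formulation
norm_a = math.sqrt(sum(x * x for x in a))
norm_b = math.sqrt(sum(y * y for y in b))
cos_theta = dot / (norm_a * norm_b)

print(dot)                          # 32.0
print(norm_a * norm_b * cos_theta)  # 32.0, the same value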
Let's start by creating a Python file called "query_knowledge_base.py". Now, define some constants and import the required modules.
import cohere
import oci
import oracledb

# Constants
org = "oracle"
compartment_id = ""
CONFIG_PROFILE = ""
config = oci.config.from_file('~/.oci/config', CONFIG_PROFILE)

# Service endpoint
endpoint = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"
generative_ai_inference_client = oci.generative_ai_inference.GenerativeAiInferenceClient(
    config=config,
    service_endpoint=endpoint,
    retry_strategy=oci.retry.NoneRetryStrategy(),
    timeout=(10, 240))

cohere_client = cohere.Client('')
Next, define how to perform dot product similarity between two vectors: in our example, the user's query vector and the text vectors stored in the database.
# Used to format response and return references
class Document():
    doc_id: int
    doc_text: str
    url: str

    def __init__(self, id, text, url) -> None:
        self.doc_id = id
        self.doc_text = text
        self.url = url

# Find relevant records from Oracle Vector DB using Dot Product similarity.
def search_data(cursor, query_vec, list_dict_docs, org):
    relevant_docs = []
    cursor.setinputsizes(oracledb.DB_TYPE_VECTOR)
    cursor.execute(
        """
        select * from {org}_web_embeddings
        order by vector_distance(vec, :1, DOT)
        fetch first 10 rows only
        """.format(org=org), query_vec)
    for row in cursor:
        id = row[0]
        text = row[1]
        url = row[3]
        temp_dict = {id: text}
        list_dict_docs.append(temp_dict)
        doc = Document(id, text, url)
        relevant_docs.append(doc)
    return relevant_docs
In this function, we calculate the dot product between the user's query vector and the text embeddings stored in Oracle Database. This comparison helps us find the relevant documents, which might contain an answer to the user's query. Oracle Database supports this functionality natively, which means you don't need to install any third-party tools to perform the mathematical calculations needed for the dot product.
Now, we can define the function to interact with the OCI LLM.
# OCI-LLM: Used to generate embeddings for question(s)
def generate_embeddings_for_question(question_list):
    embed_text_detail = oci.generative_ai_inference.models.EmbedTextDetails()
    embed_text_detail.inputs = question_list
    embed_text_detail.input_type = embed_text_detail.INPUT_TYPE_SEARCH_QUERY
    embed_text_detail.serving_mode = oci.generative_ai_inference.models.OnDemandServingMode(model_id="cohere.embed-english-v3.0")
    embed_text_detail.compartment_id = compartment_id
    embed_text_response = generative_ai_inference_client.embed_text(embed_text_detail)
    return embed_text_response

# OCI-LLM: Used to prompt the LLM
def query_llm_with_prompt(prompt):
    cohere_generate_text_request = oci.generative_ai_inference.models.CohereLlmInferenceRequest()
    cohere_generate_text_request.prompt = prompt
    cohere_generate_text_request.is_stream = False
    cohere_generate_text_request.max_tokens = 1000
    cohere_generate_text_request.temperature = 0
    cohere_generate_text_request.top_k = 0
    cohere_generate_text_request.top_p = 0
    generate_text_detail = oci.generative_ai_inference.models.GenerateTextDetails()
    generate_text_detail.serving_mode = oci.generative_ai_inference.models.OnDemandServingMode(model_id="cohere.command")
    generate_text_detail.compartment_id = compartment_id
    generate_text_detail.inference_request = cohere_generate_text_request
    generate_text_response = generative_ai_inference_client.generate_text(generate_text_detail)
    llm_response_result = generate_text_response.data.inference_response.generated_texts[0].text
    return llm_response_result
Now, we're ready to perform and implement RAG:
# Perform RAG
def answer_user_question(org, query):
    question_list = []
    question_list.append(query)
    embed_text_response = generate_embeddings_for_question(question_list)
    question_vector = embed_text_response.data.embeddings[0]
    with oracledb.connect(user="vector", password="vector", dsn="freepdb1") as db:
        cursor = db.cursor()
        list_dict_docs = []
        # query vector db to search relevant records
        similar_docs = search_data(cursor, [question_vector], list_dict_docs, org=org)
        rerank_docs = []
        for docs in similar_docs:
            content = str(docs.doc_id) + ": " + docs.doc_text
            rerank_docs.append(content)
        # use cohere reranker to fetch top documents.
        rerank_results = cohere_client.rerank(query=query, documents=rerank_docs, top_n=5, model='rerank-english-v2.0')
        # print(rerank_results)
        # prepare documents for the prompt
        context_documents = []
        relevant_doc_ids = []
        similar_docs_subset = []
        for rerank_result in rerank_results:
            doc_data = rerank_result.document['text']
            context_documents.append(doc_data)
            relevant_doc_ids.append(doc_data.split(":")[0])
        for docs in similar_docs:
            current_id = str(docs.doc_id)
            if current_id in relevant_doc_ids:
                similar_docs_subset.append(docs)
        context_document = "\n".join(context_documents)
        prompt_template = '''
        Text: {documents} \n Question: {question} \n
        Answer the question based on the text provided and also return the relevant document numbers where you found the answer.
        If the text doesn't contain the answer, reply that the answer is not available.
        '''
        prompt = prompt_template.format(question=query, documents=context_document)
        print(prompt)
        llm_response_result = query_llm_with_prompt(prompt)
        response = {}
        response['message'] = query
        response['text'] = llm_response_result
        response['documents'] = [{'id': doc.doc_id, 'snippet': doc.doc_text, 'url': doc.url} for doc in similar_docs_subset]
        return response
The code performs the following tasks:
- Generates an embedding for the user's question with the OCI embedding model
- Queries the organization's table in Oracle Database for the ten most similar chunks using dot product distance
- Reranks those chunks with Cohere's reranker and keeps the top five
- Builds a prompt that combines the question with the selected chunks and sends it to the LLM through OCI Generative AI
- Returns the LLM's answer along with the chunks used as references
Let's wrap up this file by adding a main function and asking our question!
if __name__ == '__main__':
    # Ask your question
    query = "Who founded Oracle?"
    print(answer_user_question(org, query)['text'])
If you have followed all the steps correctly, you get a response like the following output:
There you have it. That's how you can talk to your data using Oracle tools. We used a single URL as a data source, but you can extend the solution to any source containing textual information, such as PDFs, Word docs, and internal company wikis.
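For instance, a chunking function for local PDFs might look like the following sketch, which assumes the unstructured library's PDF extras are installed; the file name is hypothetical, and the rest of the pipeline (embedding the chunks and inserting them into the vector table) stays the same as for the URL case.

from unstructured.partition.pdf import partition_pdf
from unstructured.chunking.title import chunk_by_title

# Hypothetical example: chunk a local PDF instead of a web page.
def parse_and_chunk_pdf_text(file_path):
    chunks = []
    try:
        elements = partition_pdf(filename=file_path)
    except Exception as e:
        print("Error while parsing {file}: {err}".format(file=file_path, err=e))
    else:
        chunks = chunk_by_title(elements)
    return chunks

# "internal_handbook.pdf" is a placeholder file name.
pdf_chunks = parse_and_chunk_pdf_text("internal_handbook.pdf")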
Hopefully, this tutorial gave you enough information about the new frontier of AI. We can't wait to hear about the amazing solutions you build using Oracle Cloud Infrastructure's AI tools!
Get started with Oracle Cloud Infrastructure basics