03/28/2024 | News release | Distributed by Public on 03/28/2024 08:43
Videos are full of valuable information, but tools are often needed to help find it. From educational institutions seeking to analyze lectures and tutorials to businesses aiming to understand customer sentiment in video reviews, transcribing and understanding video content is crucial for informed decision-making and innovation. Recently, advancements in AI/ML technologies have made this task more accessible than ever.
Building GenAI applications with Docker makes it practical to extract insights from video content. By combining transcription, embeddings, and large language models (LLMs), organizations can query raw, unstructured data such as videos and use the answers to make informed decisions.
In this article, we'll dive into a video transcription and chat project that leverages the GenAI Stack, along with seamless integration provided by Docker, to streamline video content processing and understanding.
The application's architecture is designed to facilitate efficient processing and analysis of video content, leveraging cutting-edge AI technologies and containerization for scalability and flexibility. Figure 1 shows an overview of the architecture, which uses Pinecone to store and retrieve the embeddings of video transcriptions.
Figure 1: Schematic diagram outlining a two-component system for processing and interacting with video data.
The application's high-level service architecture includes the following:
To get started, complete the following steps:
The application is a chatbot that can answer questions about a video. It also provides timestamps from the video so that you can find the passages used to answer your question.
The next step is to clone the repository:
git clone https://github.com/dockersamples/docker-genai.git
The project contains the following directories and files:
├── docker-genai/
│   ├── docker-bot/
│   ├── yt-whisper/
│   ├── .env.example
│   ├── .gitignore
│   ├── LICENSE
│   ├── README.md
│   └── docker-compose.yaml
In the /docker-genai directory, create a text file called .env, and specify your API keys inside. The following snippet shows the contents of the .env.example file that you can refer to as an example.
#-------------------------------------------------------------
# OpenAI
#-------------------------------------------------------------
OPENAI_TOKEN=your-api-key # Replace your-api-key with your personal API key

#-------------------------------------------------------------
# Pinecone
#-------------------------------------------------------------
PINECONE_TOKEN=your-api-key # Replace your-api-key with your personal API key
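Before starting the stack, it can help to confirm that both keys are filled in. The following is a minimal sketch, not part of the project: the helper names `parse_env` and `missing_keys` are hypothetical, and the parser only handles the simple `KEY=value # comment` form shown in the example file.

```python
from pathlib import Path

# Key names and placeholder value are taken from the .env.example shown above.
REQUIRED_KEYS = ("OPENAI_TOKEN", "PINECONE_TOKEN")
PLACEHOLDER = "your-api-key"


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comment lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "# Replace your-api-key ..."
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still the placeholder."""
    return [k for k in REQUIRED_KEYS if values.get(k, "") in ("", PLACEHOLDER)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env file found in the current directory")
    else:
        problems = missing_keys(parse_env(env_path.read_text()))
        print("Missing or placeholder keys:", problems or "none")
```

Run it from the docker-genai directory. It only reports key names and never prints the key values.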
In a terminal, change directory to your docker-genai directory and run the following command:
docker compose up --build
Next, Docker Compose builds and runs the application based on the services defined in the docker-compose.yaml file. When the application is running, you'll see the logs of two services in the terminal.
In the logs, you'll see that the services are exposed on ports 8503 and 8504. The two services complement each other.
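If you prefer to check from a script rather than read the logs, a plain TCP connection test is enough to see whether each port is accepting connections. This is a minimal sketch using only the Python standard library. The helper name `is_port_open` is hypothetical, and the port-to-service mapping (yt-whisper on 8503, Dockerbot on 8504) follows this article.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts
        return False


if __name__ == "__main__":
    for name, port in (("yt-whisper", 8503), ("Dockerbot", 8504)):
        state = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

A successful connection only shows that something is listening on the port. It does not confirm that the web app has finished loading.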
The yt-whisper service is running on port 8503. This service feeds the Pinecone database with videos that you want to archive in your knowledge database. The next section explores the yt-whisper service.
The yt-whisper service is a YouTube video processing service that uses the OpenAI Whisper model to generate transcriptions of videos and stores them in a Pinecone database. The following steps outline how to use the service.
Open a browser and access the yt-whisper service at http://localhost:8503. Once the application appears, specify a YouTube video URL in the URL field and select Submit. The example shown in Figure 2 uses a video from David Cardozo.
Figure 2: A web interface showcasing processed video content with a feature to download transcriptions.
The yt-whisper service downloads the audio of the video, then uses Whisper to transcribe it into WebVTT (*.vtt) format, which you can download. Next, it uses the "text-embedding-3-small" model to create embeddings, and finally it uploads those embeddings to the Pinecone database.
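To work with a downloaded transcript yourself, you can parse the WebVTT file into timestamped text segments. The sketch below is an illustration under stated assumptions, not the yt-whisper implementation: the names `Cue`, `to_seconds`, and `parse_vtt` are hypothetical, it handles only basic cues (timing line followed by text), and the article does not say how the service splits transcripts before embedding them.

```python
import re
from dataclasses import dataclass

# Matches "hh:mm:ss.ttt --> hh:mm:ss.ttt" (the hours part is optional in WebVTT)
TIMING = re.compile(
    r"(?P<start>(?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+"
    r"(?P<end>(?:\d+:)?\d{2}:\d{2}\.\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video
    text: str


def to_seconds(stamp: str) -> float:
    """Convert 'hh:mm:ss.ttt' or 'mm:ss.ttt' to seconds."""
    parts = stamp.split(":")
    seconds = float(parts[-1])
    minutes = int(parts[-2])
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + minutes * 60 + seconds


def parse_vtt(text: str) -> list[Cue]:
    """Return one Cue per timed block; blocks without a timing line are skipped."""
    cues = []
    # Blocks (header, cues) are separated by blank lines
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                # Everything after the timing line is the cue text
                body = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append(Cue(to_seconds(match["start"]), to_seconds(match["end"]), body))
                break
    return cues


if __name__ == "__main__":
    sample = (
        "WEBVTT\n\n"
        "00:00:00.000 --> 00:00:04.000\nHello and welcome.\n\n"
        "00:01:04.500 --> 00:01:08.000\nDocker makes\nthis easy.\n"
    )
    for cue in parse_vtt(sample):
        print(cue.start, cue.end, cue.text)
```

Keeping the start time with each segment is what later allows an answer to point back to a specific moment in the video.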
After the video is processed, a video list appears in the web app that informs you which videos have been indexed in Pinecone. It also provides a button to download the transcript.
You can now access the Dockerbot chat service on port 8504 and ask questions about the videos as shown in Figure 3.
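Conceptually, answering a question with timestamps means finding the transcript segments whose embeddings are closest to the question's embedding, then linking to where each segment starts. The sketch below illustrates that idea with an in-memory list standing in for Pinecone and toy three-dimensional vectors standing in for real embeddings. The function names, the segment fields, the video ID, and the choice of cosine similarity are all assumptions for illustration, not Dockerbot's actual code.

```python
import math
from urllib.parse import urlencode


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def timestamp_link(video_id: str, start_seconds: float) -> str:
    """Build a YouTube watch URL that starts playback at start_seconds."""
    query = urlencode({"v": video_id, "t": f"{int(start_seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"


def top_matches(query_vec: list[float], segments: list[dict], k: int = 2) -> list[dict]:
    """Rank segments by similarity to the query vector and keep the best k."""
    ranked = sorted(
        segments,
        key=lambda s: cosine(query_vec, s["embedding"]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Toy data: a real system would store model-generated embeddings instead
    segments = [
        {"video_id": "abc123", "start": 0.0, "text": "intro", "embedding": [1.0, 0.0, 0.0]},
        {"video_id": "abc123", "start": 64.5, "text": "gpu setup", "embedding": [0.0, 1.0, 0.0]},
    ]
    question_vec = [0.1, 0.9, 0.0]  # hypothetical embedding of the user's question
    for seg in top_matches(question_vec, segments, k=1):
        print(seg["text"], timestamp_link(seg["video_id"], seg["start"]))
```

In the real application, the vector database performs the similarity search, and an LLM composes the answer from the retrieved segments.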
Figure 3: Example of a user asking Dockerbot about NVIDIA containers and the application giving a response with links to specific timestamps in the video.
In this article, we explored how GenAI technologies combined with Docker can unlock valuable insights from video content. The project shows how AI models like Whisper, coupled with a vector database like Pinecone, help organizations turn raw video data into actionable knowledge.
Whether you're an experienced developer or just starting to explore the world of AI, the provided resources and code make it simple to embark on your own video-understanding projects.