Creating the Knowledge Base
Setting up the Knowledge Base and configuring the vector store.
Knowledge Base
We will use AWS Bedrock Knowledge Bases to supply additional context to the LLM, enabling it to deliver more accurate, contextual responses tailored to the specific needs of our application.
Enable Models
To work with the Knowledge Base, it's essential to enable the foundation models. I've already covered this in a previous tutorial, so you can refer to that for guidance.
What are we going to create?
We have a restaurant food menu that we are going to feed into the knowledge base, and then query the LLM directly to retrieve relevant information.
Below is a sample menu of what the restaurant normally serves its customers. Once this information is ingested, we won't need to check the menu ourselves—we can simply ask the AI, “What’s special on the menu today?”
Create a file named food.txt and store the information below. You can see that there are many varieties, and you can come up with your own innovative ideas.
**Breakfast**
- **Vegetarian Options:**
  - Paneer Paratha with Yogurt
  - Masala Dosa with Sambar and Coconut Chutney
  - Aloo Tikki with Toast and Fresh Juice
  - Poha with Pomegranate and Coriander
  - Avocado Toast with Eggs
- **Non-Vegetarian Options:**
  - Scrambled Eggs with Bacon and Toast
  - Chicken Sausage with Eggs and Hash Browns
  - Smoked Salmon on Bagel with Cream Cheese
  - Omelette (Cheese, Mushroom, and Bell Peppers)
  - English Breakfast: Sausages, Bacon, Grilled Tomato, Mushrooms, Beans, Toast
---
**Lunch**
- **Vegetarian Options:**
  - Vegetable Biryani with Raita
  - Paneer Tikka Masala with Naan or Rice
  - Spaghetti Aglio e Olio (Garlic & Olive Oil) with Parmesan
  - Grilled Vegetable Wrap with Hummus and Tabbouleh
  - Chickpea Salad with Feta, Olives, and Lemon Dressing
- **Non-Vegetarian Options:**
  - Chicken Shawarma with Garlic Sauce and Pita Bread
  - Grilled Fish with Couscous and Steamed Vegetables
  - Lamb Rogan Josh with Rice
  - BBQ Chicken Wrap with Coleslaw
  - Prawn Alfredo Pasta
---
**Dinner**
- **Vegetarian Options:**
  - Vegetable Lasagna
  - Tofu Stir-Fry with Bell Peppers and Broccoli
  - Dal Tadka with Jeera Rice
  - Vegetable Stuffed Bell Peppers with Quinoa
  - Spinach and Ricotta Stuffed Ravioli
- **Non-Vegetarian Options:**
  - Grilled Chicken Breast with Garlic Mashed Potatoes and Veggies
  - Butter Chicken with Naan or Rice
  - Seafood Paella with Mussels, Prawns, and Clams
  - Grilled Salmon with Asparagus and Lemon Butter Sauce
  - Roast Duck with Orange Glaze and Roasted Vegetables
---
**Pastries & Cakes**
- **Pastries:**
  - Chocolate Croissant
  - Apple Cinnamon Danish
  - Almond and Raspberry Danish
  - Blueberry Muffins
  - Pistachio Eclair
- **Cakes:**
  - Classic Vanilla Sponge Cake
  - Chocolate Fudge Cake
  - Red Velvet Cake with Cream Cheese Frosting
  - Lemon Drizzle Cake
  - Carrot Cake with Walnuts
  - Tiramisu
---
Data Store
Now, we need to upload this data to S3, which will act as the data source.
Navigate to S3.
Click Create bucket
Provide a unique bucket name.
After the bucket is created, make sure to upload the food.txt file we created earlier.
Console Access
Go back to IAM and ensure console access is enabled. Note that you cannot create a Knowledge Base from the root account.
We will enable console access for the user ("bedrock_user") we created at the beginning of this series.
You can provide your own custom password or autogenerate one.
Knowledge Bases
Now, let's return to the Bedrock console.
Click Knowledge Bases.
Click Create.
Now, choose Knowledge Base with vector store.
A vector database is a collection of data stored as mathematical representations. Vector databases make it easier for machine learning models to remember previous inputs, allowing machine learning to be used to power search, recommendations, and text generation use-cases.
Image Source: Pinecone
Provide the following details:
- The name of the Knowledge Base.
- The name of the Service Role.
The service role will attach policies for actions such as listing S3 buckets, invoking the Bedrock model, and accessing OpenSearch.
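As a rough sketch, the generated service role's policy grants permissions along these lines. The bucket name is a placeholder and the exact statements are created by the console for you; in practice the `Resource` entries are scoped more tightly than shown here.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetObject"],
      "Resource": [
        "arn:aws:s3:::your-menu-bucket",
        "arn:aws:s3:::your-menu-bucket/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "bedrock:InvokeModel",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "aoss:APIAccessAll",
      "Resource": "*"
    }
  ]
}
```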
Next, the data source is going to be Amazon S3, where we uploaded the food.txt file.
Select the bucket where you have stored the file.
Next, we need to configure the chunking and parsing strategy. I'll stick with the default, but customization is an option.
Default chunking
: Splits content into chunks of approximately 300 tokens, preserving sentence boundaries.
If you are interested in learning more about chunking and parsing strategies, then definitely check out A Developer’s Guide to Advanced Chunking and Parsing with Amazon Bedrock
Next, we will select the embedding model and the vector store. For the vector store, I'll use Amazon OpenSearch Serverless; since I prefer not to manage a vector database myself, the serverless option is the best choice.
Next, I will choose Titan Text Embeddings v2 as our embedding model. There are multiple options available; pick whichever suits your needs.
Titan models are highly effective in enhancing productivity and efficiency across a wide range of text-related tasks, including creating copy for blog posts and web pages, categorizing articles, open-ended Q&A, conversational chat, information extraction, and more.
Embedding models are a type of machine learning model designed to represent data (such as text, images, or other forms of information) in a continuous, low-dimensional vector space. These embeddings capture semantic or contextual similarities between pieces of data, enabling machines to perform tasks like comparison, clustering, or classification more effectively.
Source: Couchbase
After selecting the model, be sure to review all the settings and click Create Knowledge Base.
It will take a few minutes to initialize.
Once the Knowledge Base is created, you need to sync the data. Syncing can take anywhere from a few minutes to a couple of hours, depending on the number of documents. Once the sync is complete, you're ready to test the Knowledge Base.
Select the model. I will choose Claude 3 Sonnet.
Feel free to start interacting with the newly created Knowledge Base.
Wow! This is pretty awesome! We’ve received some answers from the LLM. In the next step, we’ll integrate with the Go SDK and interact through backend APIs.