feat: [SKU modularization] remove sku_config from v1alpha1 and implement skuHandler interface #601
Closed
Commits on Sep 19, 2024
- 1a0269e
- 39cd273
feat: Add RAGEngine CRD (#597)
This PR adds the initial draft of the RAGEngine CRD in Kaito. A RAGEngine CRD defines all resources needed to run RAG on top of an LLM inference service. When a RAGEngine CR is created, a new controller creates a deployment that runs a RAG engine instance. The instance provides HTTP endpoints for both `index` and `query` services. It can optionally use a public model embedding service, or run a local embedding model on GPU, to convert the input index data into vectors. It can also connect to a vector DB instance to persist the vectors; by default it uses an in-memory vector DB. The instance uses the `llamaIndex` library to orchestrate the workflow. Once the RAGEngine instance is up and running, users should send questions to the `query` endpoint of the RAG instance instead of the normal `chat` endpoint of the inference service. The RAGEngine is intended to be "standalone": it can use any public inference service or an inference service hosted by a Kaito workspace. The RAG engine instance is designed to help retrieve prompts from unstructured data (arbitrary index data provided by the users); retrieving from structured data or a search engine is out of scope for now.
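As a rough sketch of the client-side flow the description implies, the snippet below posts a question to a RAG engine instance's `query` endpoint rather than the inference service's `chat` endpoint. The endpoint path, JSON field name, and base URL are illustrative assumptions, not taken from the PR; check the actual RAGEngine API for the real schema.

```python
import json
import urllib.request

def build_query_payload(question: str) -> bytes:
    # The field name "query" is hypothetical; the real RAGEngine
    # request schema may differ.
    return json.dumps({"query": question}).encode("utf-8")

def ask_rag(base_url: str, question: str) -> str:
    # POST the question to the engine's query endpoint instead of the
    # inference service's chat endpoint, per the PR description.
    # The "/query" path is an assumption.
    req = urllib.request.Request(
        url=f"{base_url}/query",
        data=build_query_payload(question),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

A caller would then do something like `ask_rag("http://ragengine.default.svc:80", "What is Kaito?")`, where the service address is again hypothetical.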
- f960215
- 2629234
feat: Add RAGEngine CRD (#597)
- 4366c36
- b75126a
feat: Add RAGEngine CRD (#597)
- b420f09