feat: [SKU modularization] remove sku_config from v1alpha1 and implement skuHandler interface #601
Closed
Commits on Sep 19, 2024
- 1a0269e
- 39cd273
feat: Add RAGEngine CRD (#597)
This PR adds the initial draft of the RAGEngine CRD in Kaito. A RAGEngine CRD defines all resources needed to run RAG on top of an LLM inference service. When a RAGEngine CR is created, a new controller creates a deployment that runs a RAG engine instance. The instance provides HTTP endpoints for both `index` and `query` services. It can optionally use a public model embedding service, or run a local embedding model on GPU, to convert the input index data into vectors. It can also connect to a vector DB instance to persist the vectors; by default it uses an in-memory vector DB. The instance uses the `llamaIndex` library to orchestrate the workflow. Once the RAGEngine instance is up and running, users should send questions to the `query` endpoint of the RAG instance instead of the normal `chat` endpoint of the inference service. The RAGEngine is intended to be "standalone": it can use any public inference service or an inference service hosted by a Kaito workspace. The RAG engine instance is designed to help retrieve prompts from unstructured data (arbitrary index data provided by the users); retrieving from structured data or a search engine is out of scope for now.
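As a rough sketch of the client-side flow the description implies, the snippet below posts a question to a RAG engine instance's `query` endpoint rather than the inference service's `chat` endpoint. The endpoint path, JSON field name, and base URL are illustrative assumptions, not taken from the PR; check the actual RAGEngine API for the real schema.

```python
import json
import urllib.request

def build_query_payload(question: str) -> bytes:
    # The field name "query" is hypothetical; the real RAGEngine
    # request schema may differ.
    return json.dumps({"query": question}).encode("utf-8")

def ask_rag(base_url: str, question: str) -> str:
    # POST the question to the engine's query endpoint instead of the
    # inference service's chat endpoint, per the PR description.
    # The "/query" path is an assumption.
    req = urllib.request.Request(
        url=f"{base_url}/query",
        data=build_query_payload(question),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

A caller would then do something like `ask_rag("http://ragengine.default.svc:80", "What is Kaito?")`, where the service address is again hypothetical.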
- f960215
- 2629234
feat: Add RAGEngine CRD (#597)
- 4366c36
- b75126a
feat: Add RAGEngine CRD (#597)
- b420f09