Catalog Details
CATEGORY
deployment
CREATED BY
UPDATED AT
May 28, 2024
VERSION
1.0
What this pattern does:
Serves a large language model (LLM) with GPUs in Google Kubernetes Engine (GKE). The design creates a GKE Standard cluster that uses multiple L4 GPUs and prepares the GKE infrastructure to serve either of the following models:
1. Falcon 40b
2. Llama 2 70b
Caveats and Considerations:
Depending on the data format of the model, the number of GPUs varies. In this design, each model uses two L4 GPUs.
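To illustrate why the data format changes the GPU count, here is a rough back-of-the-envelope sketch. It is not part of the design itself: the function name, the 20% overhead allowance, and the 24 GB per-L4 capacity figure used for the calculation are assumptions made for illustration, and the estimate covers model weights only, so treat the result as a lower bound rather than a sizing guarantee.

```python
# Rough lower-bound estimate of how many L4 GPUs are needed to hold a model's weights.
# Assumptions (not from the catalog entry): 24 GB usable memory per L4 and a
# 20% overhead allowance on top of the raw weight size.

L4_MEMORY_BYTES = 24 * 10**9  # assumed capacity of one L4 GPU


def l4_gpus_needed(num_params: int, bits_per_param: int, overhead_percent: int = 20) -> int:
    """Return the minimum number of L4 GPUs whose combined memory fits the weights."""
    weight_bytes = num_params * bits_per_param // 8
    # Integer arithmetic avoids floating-point rounding at exact boundaries.
    required = weight_bytes * (100 + overhead_percent)
    capacity = L4_MEMORY_BYTES * 100
    return -(-required // capacity)  # ceiling division


if __name__ == "__main__":
    params_70b = 70_000_000_000
    for bits in (4, 8, 16):
        print(f"70b parameters at {bits}-bit: {l4_gpus_needed(params_70b, bits)} L4 GPU(s)")
```

Under these assumptions, a 70-billion-parameter model needs more GPUs as the per-parameter width grows, which matches the caveat above. The estimate ignores runtime memory such as the key-value cache, so a real deployment may need more headroom than the weight-only figure suggests.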
Compatibility: