
What this pattern does:

Serve a large language model (LLM) with GPUs in Google Kubernetes Engine (GKE). This pattern creates a GKE Standard cluster that uses multiple L4 GPUs and prepares the GKE infrastructure to serve either of the following models:

1. Falcon 40b
2. Llama 2 70b

Caveats and Considerations:

Depending on the data format of the model, the number of GPUs varies. In this design, each model uses two L4 GPUs.
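
As a rough illustration of why the data format affects the GPU count, the sketch below estimates a lower bound on the number of GPUs needed just to hold a model's weights. It is not taken from the pattern itself. The function name `min_gpus_for_weights` is hypothetical, the parameter counts are the nominal 40 billion and 70 billion from the model names, and the 24 GB per L4 GPU figure is an assumption. The estimate ignores activations, the KV cache, and runtime overhead, so real deployments may need more.

```python
# Assumption: each NVIDIA L4 GPU offers 24 GB of memory (decimal gigabytes).
L4_MEMORY_BYTES = 24 * 10**9


def min_gpus_for_weights(num_params: int, bits_per_param: int,
                         gpu_memory_bytes: int = L4_MEMORY_BYTES) -> int:
    """Lower bound on GPUs needed to hold the model weights alone.

    Ignores activations, KV cache, and framework overhead.
    """
    weight_bytes = num_params * bits_per_param // 8
    # Integer ceiling division avoids floating-point rounding surprises.
    return max(1, -(-weight_bytes // gpu_memory_bytes))


if __name__ == "__main__":
    # Nominal parameter counts taken from the model names.
    models = {"Falcon 40b": 40 * 10**9, "Llama 2 70b": 70 * 10**9}
    for name, params in models.items():
        for bits in (16, 8, 4):
            gpus = min_gpus_for_weights(params, bits)
            print(f"{name} at {bits}-bit weights: at least {gpus} L4 GPU(s)")
```

Under these assumptions, a 4-bit format brings the weights-only lower bound for Llama 2 70b down to two L4 GPUs, while 16-bit weights would need several more. This is consistent with the two-GPU design described above, though the pattern's actual data format is not stated here.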

Serve an LLM with multiple GPUs in GKE
