# LLama Herder
- manages multiple llama.cpp instances in the background
- keeps track of used & available video & CPU memory
- starts/stops llama.cpp instances as needed, so the memory limit is never exceeded (see the sketch below)
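
A minimal sketch of that loop in Python, under loud assumptions: the `Herder`/`Instance` names, the `vram_mb` estimates, and the memory budget are all hypothetical, and only the `llama-server` binary with its `-m`/`--port` flags comes from llama.cpp itself.

```python
from dataclasses import dataclass, field
import subprocess
import time

@dataclass
class Instance:
    model_path: str
    port: int
    vram_mb: int                              # estimated VRAM footprint (assumed known)
    process: subprocess.Popen | None = None
    last_used: float = field(default_factory=time.monotonic)

class Herder:
    """Tracks a memory budget and starts/stops llama.cpp servers to stay under it."""

    def __init__(self, budget_mb: int):
        self.budget_mb = budget_mb
        self.running: list[Instance] = []

    def used_mb(self) -> int:
        return sum(i.vram_mb for i in self.running)

    def ensure_running(self, inst: Instance) -> None:
        if inst in self.running:
            inst.last_used = time.monotonic()
            return
        # Naive policy: stop instances in start order until the new model fits.
        while self.running and self.used_mb() + inst.vram_mb > self.budget_mb:
            victim = self.running.pop(0)
            victim.process.terminate()
            victim.process.wait()
        # llama.cpp's server binary and flags; the rest of this class is a sketch.
        inst.process = subprocess.Popen(
            ["llama-server", "-m", inst.model_path, "--port", str(inst.port)]
        )
        inst.last_used = time.monotonic()
        self.running.append(inst)
```
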
## Ideas
- smarter logic to decide what to stop (see the eviction sketch after this list)
- unified API, proxying by a `model_name` param to standardized `/v1/chat/completions`- and `/completion`-like endpoints (see the proxy sketch below)
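
One plausible take on "smarter": free just enough memory, preferring the least-recently-used instances, instead of stopping in start order. A hypothetical sketch reusing the `Instance` fields from the sketch above:

```python
def pick_victims(running: list[Instance], needed_mb: int, budget_mb: int) -> list[Instance]:
    """Pick the least-recently-used instances to stop, but only as many as
    are needed to fit a model of `needed_mb` under `budget_mb`."""
    used = sum(i.vram_mb for i in running)
    victims: list[Instance] = []
    for inst in sorted(running, key=lambda i: i.last_used):  # coldest first
        if used + needed_mb <= budget_mb:
            break                             # it already fits; stop evicting
        victims.append(inst)
        used -= inst.vram_mb
    return victims
```
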
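For the unified API, a bare-bones sketch of proxying by `model_name`: read the param from the request body and forward the call unchanged to whichever llama.cpp instance serves that model. The routing table, ports, and error handling are assumptions, and streaming responses are not handled here.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical routing table: model_name -> port of a running llama.cpp server.
BACKENDS = {"llama-3-8b": 8081, "mistral-7b": 8082}

class ProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        model = json.loads(body or b"{}").get("model_name", "")
        port = BACKENDS.get(model)
        if port is None:
            self.send_error(404, f"unknown model_name: {model!r}")
            return
        # Forward unchanged; the path (/v1/chat/completions, /completion, ...)
        # is passed straight through to the selected instance.
        req = urllib.request.Request(
            f"http://127.0.0.1:{port}{self.path}", data=body,
            headers={"Content-Type": "application/json"}, method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            self.send_response(resp.status)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(resp.read())

HTTPServer(("0.0.0.0", 8080), ProxyHandler).serve_forever()
```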