# LLama Herder
- manages multiple llama.cpp instances in the background
- keeps track of used & available video & CPU memory
- starts/stops llama.cpp instances as needed so the memory limit is never exceeded
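The start/stop bookkeeping above can be sketched as a small memory-budgeted scheduler. This is a minimal illustration, not the actual implementation: the class, field names, and the least-recently-used eviction policy are all assumptions for the sake of the example.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """A running llama.cpp instance (illustrative fields only)."""
    model: str
    mem_mb: int       # memory this instance occupies
    last_used: float  # timestamp of the last request served


class Herder:
    """Tracks running instances against a fixed memory budget (sketch)."""

    def __init__(self, limit_mb: int):
        self.limit_mb = limit_mb
        self.running: list[Instance] = []

    def used_mb(self) -> int:
        return sum(i.mem_mb for i in self.running)

    def ensure(self, model: str, mem_mb: int, now: float) -> list[str]:
        """Make sure `model` is running; stop LRU instances if it won't fit.

        Returns the list of models that were stopped to free memory.
        """
        stopped: list[str] = []
        for inst in self.running:
            if inst.model == model:
                inst.last_used = now  # already running, just touch it
                return stopped
        # Evict least-recently-used instances until the new model fits.
        by_lru = sorted(self.running, key=lambda i: i.last_used)
        while self.used_mb() + mem_mb > self.limit_mb and by_lru:
            victim = by_lru.pop(0)
            self.running.remove(victim)
            stopped.append(victim.model)
        self.running.append(Instance(model, mem_mb, now))
        return stopped
```

For example, with a 16 GB budget, starting a second large model forces the idle one out before the new one is launched.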
## Ideas
- smarter eviction logic to decide which instance to stop
- unified API, proxying by a `model_name` param for standardized `/v1/chat/completions` and `/completion`-style endpoints
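The unified-API idea above could be sketched as a routing table keyed on the request's model field. Everything here is hypothetical — the model names, ports, and the exact request field the proxy would key on are assumptions, since this feature is only proposed.

```python
# Hypothetical routing table: model name -> upstream llama.cpp instance.
BACKENDS = {
    "llama-8b": "http://127.0.0.1:8081",
    "mistral-7b": "http://127.0.0.1:8082",
}


def route(path: str, body: dict) -> str:
    """Pick the upstream URL for a /v1/chat/completions or /completion
    request based on the `model` field of the JSON body (illustrative)."""
    model = body.get("model")
    if model not in BACKENDS:
        raise KeyError(f"unknown model: {model}")
    return BACKENDS[model] + path
```

A client would then hit one proxy port with a standard OpenAI-style payload, and the proxy would forward to (and, per the feature above, start) the matching instance.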