Update llama_proxy_man/README.md

Tristan D. 2024-10-08 15:37:58 +00:00
parent 66cf52e2ce
commit 734a6300a1

@@ -3,3 +3,8 @@
- manages multiple llama.cpp instances in the background
- keeps track of used & available video & CPU memory
- starts/stops llama.cpp instances as needed, to ensure the memory limit is never exceeded (see the sketch below)
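
A minimal sketch of what the memory accounting and stop decision might look like, in Rust. The `Instance` and `Manager` types, the LRU stop policy, and all field names are illustrative assumptions, not taken from this repo:

```rust
// Hypothetical sketch: track VRAM per instance and stop the
// least-recently-used one until a new model fits under the limit.
struct Instance {
    model_name: String,
    vram_mb: u64,   // video memory this instance occupies
    last_used: u64, // tick of the last request served
    running: bool,
}

struct Manager {
    vram_limit_mb: u64,
    instances: Vec<Instance>,
}

impl Manager {
    fn vram_used_mb(&self) -> u64 {
        self.instances.iter().filter(|i| i.running).map(|i| i.vram_mb).sum()
    }

    /// Stop running instances (LRU first) until `needed_mb` fits under the limit.
    fn ensure_capacity(&mut self, needed_mb: u64) {
        while self.vram_used_mb() + needed_mb > self.vram_limit_mb {
            let Some(victim) = self
                .instances
                .iter_mut()
                .filter(|i| i.running)
                .min_by_key(|i| i.last_used)
            else {
                break; // nothing left to stop
            };
            println!("stopping {} to free {} MB", victim.model_name, victim.vram_mb);
            victim.running = false; // real code would kill the llama.cpp process
        }
    }
}

fn main() {
    let mut mgr = Manager {
        vram_limit_mb: 24_000,
        instances: vec![
            Instance { model_name: "llama-8b".into(), vram_mb: 9_000, last_used: 1, running: true },
            Instance { model_name: "llama-13b".into(), vram_mb: 14_000, last_used: 2, running: true },
        ],
    };
    mgr.ensure_capacity(10_000); // stops "llama-8b" (least recently used)
    println!("vram in use: {} MB", mgr.vram_used_mb());
}
```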
## Ideas
- smarter logic for deciding which instance to stop
- unified API, proxying by a `model_name` param to standardized `/v1/chat/completions`- and `/completion`-like endpoints (see the routing sketch below)
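
One way the unified-API idea could route requests, sketched in Rust under the assumption that each llama.cpp instance listens on its own local port. The `route` helper, port numbers, and model names are all hypothetical:

```rust
// Hypothetical sketch: map the `model` field of an incoming
// /v1/chat/completions request to the upstream instance serving it.
use std::collections::HashMap;

fn route(model_name: &str, upstreams: &HashMap<&str, &str>) -> Option<String> {
    upstreams
        .get(model_name)
        .map(|addr| format!("http://{addr}/v1/chat/completions"))
}

fn main() {
    let upstreams = HashMap::from([
        ("llama-8b", "127.0.0.1:8081"),
        ("llama-13b", "127.0.0.1:8082"),
    ]);
    // A real proxy would parse the JSON body, extract "model",
    // then forward the request (starting the instance if needed).
    assert_eq!(
        route("llama-8b", &upstreams).as_deref(),
        Some("http://127.0.0.1:8081/v1/chat/completions")
    );
    println!("{:?}", route("llama-13b", &upstreams));
}
```

Routing on the existing `model` field of the OpenAI-style request body would let clients switch models without changing endpoints, which is presumably the point of the standardized-endpoint idea above.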