Update llama_proxy_man/README.md
parent 66cf52e2ce
commit 734a6300a1
1 changed file with 5 additions and 0 deletions
@@ -3,3 +3,8 @@
 - manages multiple llama.cpp instances in the background
 - keeps track of used & available video & cpu memory
 - starts/stops llama.cpp instances as needed, to ensure memory limit is never reached
+
+## Ideas
+
+- smarter logic to decide what to stop
+- unified api, with proxying by model_name param for standardized `/v1/chat/completions` and `/completion`-like endpoints
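The behavior the README bullets describe (tracking memory, stopping instances to stay under a limit, and routing requests by their model name) could be sketched roughly as below. Everything here is hypothetical illustration, not the project's actual code: the class names, ports, memory figures, and the naive first-come eviction policy are all assumptions.

```python
# Hypothetical sketch of a memory-aware llama.cpp instance manager.
# Names and numbers are illustrative, not taken from llama_proxy_man.

class Instance:
    """One managed llama.cpp process: its model, memory cost, and port."""

    def __init__(self, model_name, memory_mb, port):
        self.model_name = model_name
        self.memory_mb = memory_mb
        self.port = port
        self.running = False


class ProxyManager:
    """Keeps total memory of running instances under a fixed budget."""

    def __init__(self, instances, limit_mb):
        self.instances = {i.model_name: i for i in instances}
        self.limit_mb = limit_mb

    def used_mb(self):
        # "keeps track of used & available ... memory"
        return sum(i.memory_mb for i in self.instances.values() if i.running)

    def ensure_running(self, model_name):
        inst = self.instances[model_name]
        if inst.running:
            return inst
        # Stop other instances (naive: first found) until the new one fits,
        # so the memory limit is never exceeded. A real manager would
        # terminate the child process here.
        for other in self.instances.values():
            if self.used_mb() + inst.memory_mb <= self.limit_mb:
                break
            if other.running and other is not inst:
                other.running = False
        inst.running = True  # a real manager would spawn llama.cpp here
        return inst

    def route(self, request):
        # "unified api, with proxying by model_name param": pick the
        # backend from the request body's model field.
        inst = self.ensure_running(request["model"])
        return f"http://127.0.0.1:{inst.port}"
```

Usage: with a 20 000 MB budget, routing to a 16 000 MB model while an 8 000 MB model is running forces the smaller one to be stopped first, since both together would exceed the limit.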