diff --git a/llama_proxy_man/README.md b/llama_proxy_man/README.md
index a3dd2f2..344e70b 100644
--- a/llama_proxy_man/README.md
+++ b/llama_proxy_man/README.md
@@ -3,3 +3,8 @@
 - manages multiple llama.cpp instances in the background
 - keeps track of used & available video & cpu memory
 - starts/stops llama.cpp instances as needed, to ensure memory limit is never reached
+
+## Ideas
+
+- smarter logic to decide which instance to stop
+- unified API, with proxying by a `model_name` param for standardized `/v1/chat/completions` and `/completion`-like endpoints
\ No newline at end of file