From 734a6300a1cd167965bf449cc7a4db504ab4cd36 Mon Sep 17 00:00:00 2001
From: tristan
Date: Tue, 8 Oct 2024 15:37:58 +0000
Subject: [PATCH] Update llama_proxy_man/README.md

---
 llama_proxy_man/README.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/llama_proxy_man/README.md b/llama_proxy_man/README.md
index a3dd2f2..344e70b 100644
--- a/llama_proxy_man/README.md
+++ b/llama_proxy_man/README.md
@@ -3,3 +3,8 @@
 - manages multiple llama.cpp instances in the background
 - keeps track of used & available video & cpu memory
 - starts/stops llama.cpp instances as needed, to ensure memory limit is never reached
+
+## Ideas
+
+- smarter logic to decide what to stop
+- unified api, with proxying by model_name param for stamdartized `/v1/chat/completions` and `/completion` like endpoints
\ No newline at end of file