LLama Herder
- manages multiple llama.cpp instances in the background
- keeps track of used & available video & CPU memory
- starts/stops llama.cpp instances as needed, so the memory limit is never exceeded (see the sketch below)
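
A minimal sketch of how such a herder could work, assuming llama.cpp's `llama-server` binary is on PATH and each model has a rough memory estimate supplied by the caller. `LlamaHerder`, `Instance`, and `ensure_running` are invented names, not the actual implementation, and readiness checks / error handling are omitted:

```python
import subprocess
import time

class Instance:
    """One running llama-server process plus its bookkeeping."""
    def __init__(self, model_path, port, mem_bytes, proc):
        self.model_path = model_path
        self.port = port
        self.mem_bytes = mem_bytes      # estimated VRAM/RAM this model needs
        self.proc = proc                # subprocess.Popen handle
        self.last_used = time.time()

class LlamaHerder:
    """Keeps the total estimated memory of running instances under a limit."""
    def __init__(self, mem_limit_bytes):
        self.mem_limit = mem_limit_bytes
        self.instances = {}             # model_path -> Instance
        self.next_port = 8081

    def _used(self):
        return sum(i.mem_bytes for i in self.instances.values())

    def ensure_running(self, model_path, mem_bytes):
        """Start an instance for model_path, stopping others if needed."""
        if model_path in self.instances:
            self.instances[model_path].last_used = time.time()
            return self.instances[model_path]
        # Stop least-recently-used instances until the new model fits.
        while self.instances and self._used() + mem_bytes > self.mem_limit:
            victim = min(self.instances.values(), key=lambda i: i.last_used)
            self.stop(victim.model_path)
        port = self.next_port
        self.next_port += 1
        # The sketch does not wait for the server to finish loading the model.
        proc = subprocess.Popen(
            ["llama-server", "-m", model_path, "--port", str(port)])
        inst = Instance(model_path, port, mem_bytes, proc)
        self.instances[model_path] = inst
        return inst

    def stop(self, model_path):
        inst = self.instances.pop(model_path)
        inst.proc.terminate()
        inst.proc.wait()
```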
Ideas
- smarter logic for deciding which instance to stop (see the scoring sketch after this list)
- a unified API that proxies requests by a model_name param to the right instance's standardized /v1/chat/completions and /completion-like endpoints (see the proxy sketch after this list)
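
One possible direction for the "smarter stop" idea: rank running instances with a small scoring heuristic instead of plain least-recently-used. The fields `last_used`, `mem_bytes`, and `startup_seconds` are hypothetical bookkeeping attributes, and the weights are arbitrary; this only illustrates the shape of such a policy:

```python
import time

def eviction_score(inst, now=None):
    """Higher score = better candidate to stop.

    Favors instances that have been idle for a while, free a lot of
    memory, and are cheap to restart later.
    """
    now = now or time.time()
    idle_seconds = now - inst.last_used
    freed_gib = inst.mem_bytes / 2**30
    restart_penalty = inst.startup_seconds   # hypothetical measured load time
    return idle_seconds + 10.0 * freed_gib - 5.0 * restart_penalty

def pick_victim(instances):
    """Choose which running instance to stop next."""
    return max(instances, key=eviction_score)
```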
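
A minimal sketch of such a unified proxy, building on the hypothetical `LlamaHerder` above. It assumes the routing key arrives in the JSON body's `model` field, the `MODELS` registry is invented, and streaming responses are not handled:

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical registry: client-facing model name -> (gguf path, est. memory).
MODELS = {
    "llama-3-8b": ("/models/llama-3-8b.Q4_K_M.gguf", 6 * 2**30),
}
herder = LlamaHerder(mem_limit_bytes=16 * 2**30)   # from the earlier sketch

class ProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        name = json.loads(body).get("model")        # routing key
        path, mem = MODELS[name]
        inst = herder.ensure_running(path, mem)     # start/evict as needed
        # Forward the same path, so /v1/chat/completions and /completion
        # both reach the chosen llama.cpp instance unchanged.
        upstream = f"http://127.0.0.1:{inst.port}{self.path}"
        req = urllib.request.Request(
            upstream, data=body,
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            payload = resp.read()
        self.send_response(resp.status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), ProxyHandler).serve_forever()
```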