# LLama Herder

- manages multiple llama.cpp instances in the background
- keeps track of used and available GPU (VRAM) and CPU memory
- starts and stops llama.cpp instances as needed, so the configured memory limit is never exceeded (a rough sketch of this policy follows below)
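
A minimal sketch of how such a start/stop policy could look, assuming a fixed VRAM budget and least-recently-used eviction; the names (`Herder`, `Instance`, `VRAM_BUDGET_MB`) and the `llama-server` invocation are illustrative, not the actual implementation:

```rust
// Hypothetical sketch: keep the total VRAM use of running llama.cpp instances
// under a fixed budget by stopping the least-recently-used ones first.

use std::collections::HashMap;
use std::process::{Child, Command};
use std::time::Instant;

const VRAM_BUDGET_MB: u64 = 24_000; // assumed total VRAM budget

struct Instance {
    child: Child,
    vram_mb: u64,
    last_used: Instant,
}

struct Herder {
    running: HashMap<String, Instance>, // model name -> running llama.cpp process
}

impl Herder {
    /// Ensure `model` is running; stop least-recently-used instances if needed.
    fn ensure_running(&mut self, model: &str, model_path: &str, vram_mb: u64) -> std::io::Result<()> {
        if let Some(inst) = self.running.get_mut(model) {
            inst.last_used = Instant::now();
            return Ok(());
        }
        // Free memory by stopping the least recently used instances first.
        while self.used_vram() + vram_mb > VRAM_BUDGET_MB {
            let lru = self
                .running
                .iter()
                .min_by_key(|(_, i)| i.last_used)
                .map(|(name, _)| name.clone());
            match lru {
                Some(name) => {
                    if let Some(mut inst) = self.running.remove(&name) {
                        let _ = inst.child.kill();
                        let _ = inst.child.wait();
                    }
                }
                None => break, // nothing left to stop
            }
        }
        // Assumes llama.cpp's server binary is on PATH; flags are illustrative.
        let child = Command::new("llama-server")
            .args(["-m", model_path])
            .spawn()?;
        self.running.insert(
            model.to_string(),
            Instance { child, vram_mb, last_used: Instant::now() },
        );
        Ok(())
    }

    fn used_vram(&self) -> u64 {
        self.running.values().map(|i| i.vram_mb).sum()
    }
}
```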

## Ideas

- smarter logic for deciding which instances to stop
- unified API, proxying by a `model_name` parameter to standardized `/v1/chat/completions`- and `/completion`-like endpoints (see the routing sketch after this list)
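
A minimal sketch of the model-name routing idea, assuming OpenAI-style request bodies and a hypothetical map from model name to upstream llama.cpp instance (uses the `serde_json` crate; model names, ports, and the `route` helper are illustrative):

```rust
// Sketch: pick an upstream llama.cpp instance based on the `model` field of
// an OpenAI-style request body; the proxy would then forward the request.

use std::collections::HashMap;

use serde_json::Value; // external crate, not part of std

/// Map the request's model name to the base URL of the instance serving it.
fn route(body: &str, upstreams: &HashMap<String, String>) -> Option<String> {
    let json: Value = serde_json::from_str(body).ok()?;
    let model = json.get("model")?.as_str()?;
    upstreams.get(model).cloned()
}

fn main() {
    let mut upstreams = HashMap::new();
    upstreams.insert("llama3-8b".to_string(), "http://127.0.0.1:8081".to_string());
    upstreams.insert("mistral-7b".to_string(), "http://127.0.0.1:8082".to_string());

    let body = r#"{"model": "llama3-8b", "messages": [{"role": "user", "content": "hi"}]}"#;
    // The chosen upstream would receive the proxied /v1/chat/completions request.
    println!("{:?}", route(body, &upstreams)); // Some("http://127.0.0.1:8081")
}
```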