# LLama Herder
- manages multiple llama.cpp instances in the background
- keeps track of used & available video & CPU memory
- starts/stops llama.cpp instances as needed, so the memory limit is never exceeded (see the sketch below)
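
A minimal sketch of that loop in Python, under loud assumptions: the `Herder`/`Instance` names, the `vram_mb` estimates, and the memory budget are all hypothetical, and only the `llama-server` binary with its `-m`/`--port` flags comes from llama.cpp itself.

```python
from dataclasses import dataclass, field
import subprocess
import time

@dataclass
class Instance:
    model_path: str
    port: int
    vram_mb: int                              # estimated VRAM footprint (assumed known)
    process: subprocess.Popen | None = None
    last_used: float = field(default_factory=time.monotonic)

class Herder:
    """Tracks a memory budget and starts/stops llama.cpp servers to stay under it."""

    def __init__(self, budget_mb: int):
        self.budget_mb = budget_mb
        self.running: list[Instance] = []

    def used_mb(self) -> int:
        return sum(i.vram_mb for i in self.running)

    def ensure_running(self, inst: Instance) -> None:
        if inst in self.running:
            inst.last_used = time.monotonic()
            return
        # Naive policy: stop instances in start order until the new model fits.
        while self.running and self.used_mb() + inst.vram_mb > self.budget_mb:
            victim = self.running.pop(0)
            victim.process.terminate()
            victim.process.wait()
        # llama.cpp's server binary and flags; the rest of this class is a sketch.
        inst.process = subprocess.Popen(
            ["llama-server", "-m", inst.model_path, "--port", str(inst.port)]
        )
        inst.last_used = time.monotonic()
        self.running.append(inst)
```
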
## Ideas
- smarter logic to decide what to stop (see the eviction sketch after this list)
- unified API, proxying by a `model_name` param to standardized `/v1/chat/completions`- and `/completion`-like endpoints (see the proxy sketch below)
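
One plausible take on "smarter": free just enough memory, preferring the least-recently-used instances, instead of stopping in start order. A hypothetical sketch reusing the `Instance` fields from the sketch above:

```python
def pick_victims(running: list[Instance], needed_mb: int, budget_mb: int) -> list[Instance]:
    """Pick the least-recently-used instances to stop, but only as many as
    are needed to fit a model of `needed_mb` under `budget_mb`."""
    used = sum(i.vram_mb for i in running)
    victims: list[Instance] = []
    for inst in sorted(running, key=lambda i: i.last_used):  # coldest first
        if used + needed_mb <= budget_mb:
            break                             # it already fits; stop evicting
        victims.append(inst)
        used -= inst.vram_mb
    return victims
```
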
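For the unified API, a bare-bones sketch of proxying by `model_name`: read the param from the request body and forward the call unchanged to whichever llama.cpp instance serves that model. The routing table, ports, and error handling are assumptions, and streaming responses are not handled here.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical routing table: model_name -> port of a running llama.cpp server.
BACKENDS = {"llama-3-8b": 8081, "mistral-7b": 8082}

class ProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        model = json.loads(body or b"{}").get("model_name", "")
        port = BACKENDS.get(model)
        if port is None:
            self.send_error(404, f"unknown model_name: {model!r}")
            return
        # Forward unchanged; the path (/v1/chat/completions, /completion, ...)
        # is passed straight through to the selected instance.
        req = urllib.request.Request(
            f"http://127.0.0.1:{port}{self.path}", data=body,
            headers={"Content-Type": "application/json"}, method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            self.send_response(resp.status)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(resp.read())

HTTPServer(("0.0.0.0", 8080), ProxyHandler).serve_forever()
```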