# LLama Herder
- manages multiple llama.cpp instances in the background
- keeps track of used & available video & CPU memory
- starts/stops llama.cpp instances as needed, so the memory limit is never exceeded
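The bookkeeping described above could be sketched roughly as below. This is a hypothetical illustration, not the project's actual code: all names (`Herder`, `Instance`, the eviction order) are invented for the example, and a real implementation would spawn and signal llama.cpp processes instead of just mutating a map.

```rust
use std::collections::HashMap;

/// One running llama.cpp instance and the memory it occupies (illustrative).
#[derive(Debug, Clone)]
struct Instance {
    model_name: String,
    vram_mb: u64, // video memory used by this instance
    ram_mb: u64,  // CPU memory used by this instance
}

/// Tracks running instances against fixed memory limits (illustrative).
struct Herder {
    vram_limit_mb: u64,
    ram_limit_mb: u64,
    running: HashMap<u32, Instance>,
    next_id: u32,
}

impl Herder {
    fn new(vram_limit_mb: u64, ram_limit_mb: u64) -> Self {
        Herder { vram_limit_mb, ram_limit_mb, running: HashMap::new(), next_id: 0 }
    }

    fn vram_used(&self) -> u64 {
        self.running.values().map(|i| i.vram_mb).sum()
    }

    fn ram_used(&self) -> u64 {
        self.running.values().map(|i| i.ram_mb).sum()
    }

    /// Stop instances (here: in arbitrary order) until the new one fits,
    /// then record it as running. Returns None if it can never fit.
    /// Smarter victim selection is exactly the "Ideas" item below.
    fn start(&mut self, inst: Instance) -> Option<u32> {
        if inst.vram_mb > self.vram_limit_mb || inst.ram_mb > self.ram_limit_mb {
            return None;
        }
        while self.vram_used() + inst.vram_mb > self.vram_limit_mb
            || self.ram_used() + inst.ram_mb > self.ram_limit_mb
        {
            // A real herder would terminate the chosen process here.
            let victim = *self.running.keys().next().unwrap();
            self.running.remove(&victim);
        }
        let id = self.next_id;
        self.next_id += 1;
        self.running.insert(id, inst);
        Some(id)
    }
}
```

For example, with a 24 GB VRAM limit, starting a second 16 GB model would first evict the running one so the limit is never exceeded.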
## Ideas
- smarter logic to decide what to stop
- unified API, with proxying by `model_name` param for standardized `/v1/chat/completions`- and `/completion`-like endpoints
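The proxying idea could look something like the following: read the model name from an incoming OpenAI-style request and forward it to whichever backend instance serves that model. This is only a sketch of the routing decision; the table, port numbers, and `route` function are assumptions, and a real proxy would also forward the request body and stream the response.

```rust
use std::collections::HashMap;

/// Map a requested model name to the upstream llama.cpp endpoint that
/// serves it (illustrative: the table would come from config.yaml and
/// the herder's running-instance state).
fn route(model_name: &str, table: &HashMap<&str, u16>) -> Option<String> {
    table
        .get(model_name)
        .map(|port| format!("http://127.0.0.1:{}/v1/chat/completions", port))
}

fn main() {
    let mut table = HashMap::new();
    table.insert("llama-3-8b", 8081u16);
    table.insert("mistral-7b", 8082u16);

    // Requests for a known model get an upstream URL; unknown models get None
    // (the proxy could then ask the herder to start a matching instance).
    println!("{:?}", route("llama-3-8b", &table));
    println!("{:?}", route("unknown-model", &table));
}
```

A useful property of this design is that clients only ever see one address; which llama.cpp process actually answers is an internal detail the herder can change as it starts and stops instances.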