LLama Herder
- manages multiple llama.cpp instances in the background
- keeps track of used & available video & CPU memory
- starts/stops llama.cpp instances as needed, so the memory limit is never exceeded
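The start/stop-on-memory-pressure behaviour above could be sketched roughly as follows. This is an illustrative Rust sketch, not the crate's actual code: the names (`MemoryPool`, `Instance`, `ensure`) and the least-recently-used eviction policy are assumptions.

```rust
/// A running llama.cpp instance and the memory it was declared to need.
struct Instance {
    model: String,
    vram_mb: u64,
}

/// Tracks the shared memory budget; `running` is ordered
/// least-recently-used first (an assumed eviction policy).
struct MemoryPool {
    limit_mb: u64,
    running: Vec<Instance>,
}

impl MemoryPool {
    fn new(limit_mb: u64) -> Self {
        Self { limit_mb, running: Vec::new() }
    }

    fn used_mb(&self) -> u64 {
        self.running.iter().map(|i| i.vram_mb).sum()
    }

    /// Stop LRU instances until `wanted` fits, then register it.
    /// Returns the models that were stopped to make room.
    fn ensure(&mut self, wanted: Instance) -> Vec<String> {
        let mut stopped = Vec::new();
        while self.used_mb() + wanted.vram_mb > self.limit_mb {
            if self.running.is_empty() {
                break; // the model alone exceeds the limit; caller must refuse it
            }
            stopped.push(self.running.remove(0).model);
        }
        self.running.push(wanted);
        stopped
    }
}
```

For example, with a 24 GB budget, loading a 10 GB model while a 16 GB model is running would first stop the 16 GB one.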
Ideas
- smarter logic for deciding which instance to stop
- unified API that proxies by the model_name param, exposing standardized /v1/chat/completions- and /completion-like endpoints
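The unified-API idea above amounts to a routing table from model name to upstream instance. A minimal sketch, assuming a plain map of configured upstreams (the function name `route` and the URL layout are illustrative, not the real implementation):

```rust
use std::collections::HashMap;

/// Resolve an incoming request to the upstream llama.cpp base URL
/// serving `model`, if one is configured. Only forwards the
/// OpenAI-compatible /v1/* endpoints and llama.cpp's native /completion.
fn route(upstreams: &HashMap<String, String>, model: &str, path: &str) -> Option<String> {
    if !(path.starts_with("/v1/") || path == "/completion") {
        return None;
    }
    upstreams.get(model).map(|base| format!("{base}{path}"))
}
```

Unknown models or non-API paths fall through to `None`, which the proxy would turn into a 404 or similar.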