# LLama Herder
- manages multiple llama.cpp instances in the background
- keeps track of used & available video & CPU memory
- starts/stops llama.cpp instances as needed, so the memory limit is never exceeded
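The bookkeeping described above could be sketched roughly as below. This is a hypothetical illustration, not the project's actual code: all names (`Herder`, `Instance`, the eviction order) are invented for the example, and a real implementation would spawn and signal llama.cpp processes instead of just mutating a map.

```rust
use std::collections::HashMap;

/// One running llama.cpp instance and the memory it occupies (illustrative).
#[derive(Debug, Clone)]
struct Instance {
    model_name: String,
    vram_mb: u64, // video memory used by this instance
    ram_mb: u64,  // CPU memory used by this instance
}

/// Tracks running instances against fixed memory limits (illustrative).
struct Herder {
    vram_limit_mb: u64,
    ram_limit_mb: u64,
    running: HashMap<u32, Instance>,
    next_id: u32,
}

impl Herder {
    fn new(vram_limit_mb: u64, ram_limit_mb: u64) -> Self {
        Herder { vram_limit_mb, ram_limit_mb, running: HashMap::new(), next_id: 0 }
    }

    fn vram_used(&self) -> u64 {
        self.running.values().map(|i| i.vram_mb).sum()
    }

    fn ram_used(&self) -> u64 {
        self.running.values().map(|i| i.ram_mb).sum()
    }

    /// Stop instances (here: in arbitrary order) until the new one fits,
    /// then record it as running. Returns None if it can never fit.
    /// Smarter victim selection is exactly the "Ideas" item below.
    fn start(&mut self, inst: Instance) -> Option<u32> {
        if inst.vram_mb > self.vram_limit_mb || inst.ram_mb > self.ram_limit_mb {
            return None;
        }
        while self.vram_used() + inst.vram_mb > self.vram_limit_mb
            || self.ram_used() + inst.ram_mb > self.ram_limit_mb
        {
            // A real herder would terminate the chosen process here.
            let victim = *self.running.keys().next().unwrap();
            self.running.remove(&victim);
        }
        let id = self.next_id;
        self.next_id += 1;
        self.running.insert(id, inst);
        Some(id)
    }
}
```

For example, with a 24 GB VRAM limit, starting a second 16 GB model would first evict the running one so the limit is never exceeded.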
## Ideas
- smarter logic to decide what to stop
- unified API, with proxying by `model_name` param for standardized `/v1/chat/completions`- and `/completion`-like endpoints
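The proxying idea could look something like the following: read the model name from an incoming OpenAI-style request and forward it to whichever backend instance serves that model. This is only a sketch of the routing decision; the table, port numbers, and `route` function are assumptions, and a real proxy would also forward the request body and stream the response.

```rust
use std::collections::HashMap;

/// Map a requested model name to the upstream llama.cpp endpoint that
/// serves it (illustrative: the table would come from config.yaml and
/// the herder's running-instance state).
fn route(model_name: &str, table: &HashMap<&str, u16>) -> Option<String> {
    table
        .get(model_name)
        .map(|port| format!("http://127.0.0.1:{}/v1/chat/completions", port))
}

fn main() {
    let mut table = HashMap::new();
    table.insert("llama-3-8b", 8081u16);
    table.insert("mistral-7b", 8082u16);

    // Requests for a known model get an upstream URL; unknown models get None
    // (the proxy could then ask the herder to start a matching instance).
    println!("{:?}", route("llama-3-8b", &table));
    println!("{:?}", route("unknown-model", &table));
}
```

A useful property of this design is that clients only ever see one address; which llama.cpp process actually answers is an internal detail the herder can change as it starts and stops instances.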