#+title: Todo
* Name ideas
- llama herder
- llama herdsman/woman/boy ??
- llama shepherd ?
* MVP
- [X] fix stopping (doesn't work correctly at all)
  - seems done
* Future Features
- [ ] support for model selection by name on a unified port for /api & /completions (see the routing sketch at the end)
- [ ] separate the proxy/selection logic ? config for unmanaged instances so they can take part in model selection by name
- [ ] automatic internal port management (search for free ports; see the port sketch at the end)
- [ ] Diagnostic Overview UI/API
- [ ] Config UI/API ?
- [ ] better book-keeping about in-flight requests ? (needed ?)
- [ ] multi-node support
  - how exactly ?
  - clustering ? (one manager per node ?)
  - SSH support ???
- [ ] automatic RAM usage calculation ?
- [ ] other runners
  - e.g. Docker, run from PATH, etc.
- [ ] other backends ?
- [ ] more advanced start/stop behavior
  - more config ? e.g. pinning/priorities/prefer-to-kill/start-initially
  - LRU / most-used prioritized to keep running (see the LRU sketch at the end)
  - speculative relaunch
  - scheduling: order in-flight requests and restarts to handle them optimally
- [ ] advanced high-level features
  - automatic context-size selection per request / start with a bigger context if the current instance's context is too small (see the context sketch at the end)
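* Sketches
Rough, non-authoritative sketches for a few of the items above. Go is assumed as the implementation language; every name below (registries, types, ports, sizes) is a hypothetical placeholder, not existing code.

Routing sketch (model selection by name on a unified port): read the =model= field from the incoming request body and reverse-proxy to whichever running instance serves that model. The =instances= map stands in for whatever registry the manager keeps.
#+begin_src go
package main

import (
	"bytes"
	"encoding/json"
	"io"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// instances maps a model name to the base URL of a running llama.cpp server
// (hypothetical registry, filled in by the manager).
var instances = map[string]string{
	"llama-3-8b": "http://127.0.0.1:9001",
}

func routeByModel(w http.ResponseWriter, r *http.Request) {
	body, _ := io.ReadAll(r.Body)
	r.Body = io.NopCloser(bytes.NewReader(body)) // restore the body for forwarding

	var req struct {
		Model string `json:"model"`
	}
	_ = json.Unmarshal(body, &req)

	base, ok := instances[req.Model]
	if !ok {
		http.Error(w, "unknown model: "+req.Model, http.StatusNotFound)
		return
	}
	target, _ := url.Parse(base)
	httputil.NewSingleHostReverseProxy(target).ServeHTTP(w, r)
}

func main() {
	// one port, every endpoint (/api/..., /v1/completions, ...) goes through the selector
	http.HandleFunc("/", routeByModel)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
#+end_src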
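Port sketch (automatic internal port management): instead of scanning a range, listen on port 0 and let the OS hand out a free port, then pass that port to the freshly started instance.
#+begin_src go
package main

import (
	"fmt"
	"net"
)

// freePort asks the kernel for an unused TCP port by binding to port 0.
func freePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	p, err := freePort()
	if err != nil {
		panic(err)
	}
	fmt.Println("would start the instance on port", p)
}
#+end_src
There is a small race between closing the listener and the instance binding the port, probably acceptable for local use.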
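LRU sketch (keep the most used instances running): stamp each instance on every request and stop the least recently used one when room is needed. =Instance= and =Manager= are made-up types.
#+begin_src go
package main

import (
	"fmt"
	"sync"
	"time"
)

type Instance struct {
	Model    string
	LastUsed time.Time
}

type Manager struct {
	mu      sync.Mutex
	running []*Instance
}

// Touch records a request so the instance ranks as recently used.
func (m *Manager) Touch(inst *Instance) {
	m.mu.Lock()
	defer m.mu.Unlock()
	inst.LastUsed = time.Now()
}

// EvictLRU picks the least recently used running instance to stop first.
func (m *Manager) EvictLRU() *Instance {
	m.mu.Lock()
	defer m.mu.Unlock()
	var victim *Instance
	for _, inst := range m.running {
		if victim == nil || inst.LastUsed.Before(victim.LastUsed) {
			victim = inst
		}
	}
	return victim
}

func main() {
	m := &Manager{running: []*Instance{
		{Model: "llama-3-8b", LastUsed: time.Now().Add(-10 * time.Minute)},
		{Model: "mistral-7b", LastUsed: time.Now()},
	}}
	fmt.Println("would stop:", m.EvictLRU().Model) // llama-3-8b
}
#+end_src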
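Context sketch (context-size selection per request): crudely estimate the prompt's token count (about 4 characters per token) and pick the smallest configured context that fits, relaunching the instance if its current context is smaller. Both the size ladder and the chars-per-token rule are assumptions.
#+begin_src go
package main

import "fmt"

// ctxSizes is a hypothetical ladder of context sizes the manager may launch with.
var ctxSizes = []int{2048, 4096, 8192, 16384, 32768}

// pickContext returns the smallest configured context that fits the prompt.
func pickContext(promptChars int) int {
	needed := promptChars/4 + 512 // rough token estimate plus generation headroom
	for _, c := range ctxSizes {
		if c >= needed {
			return c
		}
	}
	return ctxSizes[len(ctxSizes)-1] // prompt may still not fit; caller would have to truncate
}

func main() {
	fmt.Println(pickContext(30000)) // -> 8192
}
#+end_src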