Todo
Name ideas
- llama herder
- llama herdsman/woman/boy ??
- llama shepherd ?
MVP
- fix stopping (doesn't work correctly at all)
  - seems done
Future Features
- support for model selection by name on a unified port for /api & /completions (see the routing sketch after this list)
  - separation of proxy/selection stuff ? config for unmanaged instances for auto model-selection by name
- automatic internal port management (search for free ports, see the port sketch after this list)
- Diagnostic Overview UI/API
- Config UI/API ?
- better book-keeping about in-flight requests ? (needed ?)
- multi-node stuff
  - how exactly ?
    - clustering ? (one manager per node ?)
    - ssh support ???
  - automatic RAM usage calculation ?
- other runners
  - e.g. docker / run from a path etc.
  - other backends ?
- more advanced start/stop behavior
  - more config ? e.g. pinning/priorities/prefer-to-kill/start-initially
  - LRU / most-used instances prioritized to keep running (see the eviction sketch after this list)
  - speculative relaunch
  - scheduling of how to order in-flight requests + restarts to handle them optimally
- advanced high-level foo
  - automatic context-size selection per request / start with a bigger context if the current instance's context is too small (see the context sketch after this list)
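
A minimal sketch of the by-name routing idea, assuming the manager keeps a model-name → backend-address map and that requests carry a `model` field in their JSON body; the `modelPorts` map, the names and the addresses are made-up placeholders, not real config:

```go
// Hypothetical sketch: route requests arriving on one unified port to
// per-model llama.cpp instances by peeking at the "model" field of the body.
package main

import (
	"bytes"
	"encoding/json"
	"io"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// Assumed bookkeeping: model name -> address of the managed instance.
var modelPorts = map[string]string{
	"llama-3-8b": "http://127.0.0.1:9001",
	"mistral-7b": "http://127.0.0.1:9002",
}

func route(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "could not read body", http.StatusBadRequest)
		return
	}
	var payload struct {
		Model string `json:"model"`
	}
	_ = json.Unmarshal(body, &payload) // some endpoints may omit "model"

	target, ok := modelPorts[payload.Model]
	if !ok {
		http.Error(w, "unknown model: "+payload.Model, http.StatusNotFound)
		return
	}
	backend, _ := url.Parse(target)
	proxy := httputil.NewSingleHostReverseProxy(backend)
	// Restore the body we consumed so the backend sees the full request.
	r.Body = io.NopCloser(bytes.NewReader(body))
	proxy.ServeHTTP(w, r)
}

func main() {
	http.HandleFunc("/", route) // /api & /completions paths are forwarded as-is
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```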
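
For the free-port search, one common approach (an assumption about how it could be done, not a decision) is to bind to port 0 and let the OS hand back an unused port:

```go
// Hypothetical sketch: ask the OS for a free TCP port by listening on
// port 0, read back the assigned port, then release the listener so a
// llama.cpp instance can bind it.
package main

import (
	"fmt"
	"net"
)

func freePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0") // port 0 = "pick any free port"
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	p, err := freePort()
	if err != nil {
		panic(err)
	}
	fmt.Println("free port:", p)
}
```

There is a small race between closing the probe listener and the instance binding the port, so the manager would still want to retry on bind failure.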
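
A sketch of the LRU keep-running prioritization, deciding which instance to stop when room is needed for another model; the `Pinned`, `Inflight` and `LastUsed` fields are assumed bookkeeping, not existing code:

```go
// Hypothetical sketch: pick the eviction victim among running instances,
// preferring the least-recently-used one that is neither pinned nor busy.
package main

import (
	"fmt"
	"time"
)

type Instance struct {
	Model    string
	Pinned   bool      // never evict if pinned
	Inflight int       // active requests
	LastUsed time.Time // updated on every proxied request
}

// evictionCandidate returns the idle, unpinned instance with the oldest
// LastUsed timestamp, or nil if nothing can be stopped right now.
func evictionCandidate(instances []*Instance) *Instance {
	var victim *Instance
	for _, in := range instances {
		if in.Pinned || in.Inflight > 0 {
			continue
		}
		if victim == nil || in.LastUsed.Before(victim.LastUsed) {
			victim = in
		}
	}
	return victim
}

func main() {
	now := time.Now()
	running := []*Instance{
		{Model: "llama-3-8b", LastUsed: now.Add(-10 * time.Minute)},
		{Model: "mistral-7b", LastUsed: now.Add(-2 * time.Minute)},
		{Model: "embed-small", Pinned: true, LastUsed: now.Add(-1 * time.Hour)},
	}
	if v := evictionCandidate(running); v != nil {
		fmt.Println("stop:", v.Model) // -> stop: llama-3-8b
	}
}
```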
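
A rough sketch of per-request context-size selection, with an assumed ~4-characters-per-token estimate and power-of-two context steps; all names and numbers are placeholders:

```go
// Hypothetical sketch: choose a context size large enough for the prompt
// plus the completion budget; the caller would relaunch the instance if the
// currently running one was started with a smaller -c value.
package main

import "fmt"

const minCtx, maxCtx = 2048, 32768 // assumed bounds, not real limits

func chooseCtx(promptChars, maxNewTokens int) int {
	need := promptChars/4 + maxNewTokens // ~4 chars per token, rough estimate
	ctx := minCtx
	for ctx < need && ctx < maxCtx {
		ctx *= 2
	}
	return ctx
}

func main() {
	fmt.Println(chooseCtx(20000, 512)) // -> 8192
}
```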