#+title: Todo
* Name ideas
- llama herder
- llama herdsman/woman/boy ??
- llama shepherd ?
* MVP
- [X] fix stopping (doesn't work correctly at all)
  - seems done
* Future Features
- [ ] support for model selection by name on a unified port for /api & /completions (see the routing sketch at the end)
- [ ] separate the proxy/selection logic ? config for unmanaged instances so they can take part in model selection by name
- [ ] automatic internal port management (search for free ports; see the port sketch at the end)
- [ ] Diagnostic Overview UI/API
- [ ] Config UI/API ?
- [ ] better book-keeping about in-flight requests ? (needed ?)
- [ ] multi-node support
  - how exactly ?
  - clustering ? (one manager per node ?)
  - SSH support ???
- [ ] automatic RAM usage calculation ?
- [ ] other runners
  - e.g. Docker, run from PATH, etc.
- [ ] other backends ?
- [ ] more advanced start/stop behavior
  - more config ? e.g. pinning/priorities/prefer-to-kill/start-initially
  - LRU / most-used prioritized to keep running (see the LRU sketch at the end)
  - speculative relaunch
  - scheduling: order in-flight requests and restarts to handle them optimally
- [ ] advanced high-level features
  - automatic context-size selection per request / start with a bigger context if the current instance's context is too small (see the context sketch at the end)
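* Sketches
Rough, non-authoritative sketches for a few of the items above. Go is assumed as the implementation language; every name below (registries, types, ports, sizes) is a hypothetical placeholder, not existing code.

Routing sketch (model selection by name on a unified port): read the =model= field from the incoming request body and reverse-proxy to whichever running instance serves that model. The =instances= map stands in for whatever registry the manager keeps.
#+begin_src go
package main

import (
	"bytes"
	"encoding/json"
	"io"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// instances maps a model name to the base URL of a running llama.cpp server
// (hypothetical registry, filled in by the manager).
var instances = map[string]string{
	"llama-3-8b": "http://127.0.0.1:9001",
}

func routeByModel(w http.ResponseWriter, r *http.Request) {
	body, _ := io.ReadAll(r.Body)
	r.Body = io.NopCloser(bytes.NewReader(body)) // restore the body for forwarding

	var req struct {
		Model string `json:"model"`
	}
	_ = json.Unmarshal(body, &req)

	base, ok := instances[req.Model]
	if !ok {
		http.Error(w, "unknown model: "+req.Model, http.StatusNotFound)
		return
	}
	target, _ := url.Parse(base)
	httputil.NewSingleHostReverseProxy(target).ServeHTTP(w, r)
}

func main() {
	// one port, every endpoint (/api/..., /v1/completions, ...) goes through the selector
	http.HandleFunc("/", routeByModel)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
#+end_src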
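Port sketch (automatic internal port management): instead of scanning a range, listen on port 0 and let the OS hand out a free port, then pass that port to the freshly started instance.
#+begin_src go
package main

import (
	"fmt"
	"net"
)

// freePort asks the kernel for an unused TCP port by binding to port 0.
func freePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	p, err := freePort()
	if err != nil {
		panic(err)
	}
	fmt.Println("would start the instance on port", p)
}
#+end_src
There is a small race between closing the listener and the instance binding the port, probably acceptable for local use.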
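LRU sketch (keep the most used instances running): stamp each instance on every request and stop the least recently used one when room is needed. =Instance= and =Manager= are made-up types.
#+begin_src go
package main

import (
	"fmt"
	"sync"
	"time"
)

type Instance struct {
	Model    string
	LastUsed time.Time
}

type Manager struct {
	mu      sync.Mutex
	running []*Instance
}

// Touch records a request so the instance ranks as recently used.
func (m *Manager) Touch(inst *Instance) {
	m.mu.Lock()
	defer m.mu.Unlock()
	inst.LastUsed = time.Now()
}

// EvictLRU picks the least recently used running instance to stop first.
func (m *Manager) EvictLRU() *Instance {
	m.mu.Lock()
	defer m.mu.Unlock()
	var victim *Instance
	for _, inst := range m.running {
		if victim == nil || inst.LastUsed.Before(victim.LastUsed) {
			victim = inst
		}
	}
	return victim
}

func main() {
	m := &Manager{running: []*Instance{
		{Model: "llama-3-8b", LastUsed: time.Now().Add(-10 * time.Minute)},
		{Model: "mistral-7b", LastUsed: time.Now()},
	}}
	fmt.Println("would stop:", m.EvictLRU().Model) // llama-3-8b
}
#+end_src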
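Context sketch (context-size selection per request): crudely estimate the prompt's token count (about 4 characters per token) and pick the smallest configured context that fits, relaunching the instance if its current context is smaller. Both the size ladder and the chars-per-token rule are assumptions.
#+begin_src go
package main

import "fmt"

// ctxSizes is a hypothetical ladder of context sizes the manager may launch with.
var ctxSizes = []int{2048, 4096, 8192, 16384, 32768}

// pickContext returns the smallest configured context that fits the prompt.
func pickContext(promptChars int) int {
	needed := promptChars/4 + 512 // rough token estimate plus generation headroom
	for _, c := range ctxSizes {
		if c >= needed {
			return c
		}
	}
	return ctxSizes[len(ctxSizes)-1] // prompt may still not fit; caller would have to truncate
}

func main() {
	fmt.Println(pickContext(30000)) // -> 8192
}
#+end_src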