redvault-ai/llama_proxy_man/TODO.org
#+title: Todo
* Name ideas
- llama herder
- llama herdsman/women/boy ??
- llama shepherd ?
* MVP
- [X] fix stopping (doesn't work correctly at all)
- seems done
* Future Features
- [ ] support for model selection by name on a unified port for /api & /completions
- [ ] separation of the proxy/selection layer ? config for unmanaged instances, for automatic model selection by name
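The unified-port idea above boils down to a routing table from model name to internal instance port. A minimal sketch (the model names, ports, and `route_for` helper are hypothetical, not part of the current codebase):

```rust
use std::collections::HashMap;

/// Hypothetical routing table: model name -> internal instance port.
/// A unified /api + /completions listener would parse the request
/// body's "model" field and proxy to the matching instance.
fn route_for(model: &str, table: &HashMap<&str, u16>) -> Option<u16> {
    table.get(model).copied()
}

fn main() {
    let mut table = HashMap::new();
    table.insert("llama-3-8b", 18081);
    table.insert("qwen2-7b", 18082);
    assert_eq!(route_for("llama-3-8b", &table), Some(18081));
    // Unknown model names could 404 or fall back to a default instance.
    assert_eq!(route_for("unknown", &table), None);
}
```

Unmanaged instances would just be extra static entries in this table, with no start/stop lifecycle attached.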
- [ ] automatic internal port management (search for free ports)
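One common way to "search for free ports" is to let the OS pick one by binding to port 0. A minimal sketch (the `find_free_port` helper is hypothetical, not part of the current codebase):

```rust
use std::net::TcpListener;

/// Ask the kernel for an unused TCP port by binding to port 0, then
/// read the assigned port back. The listener is dropped on return, so
/// there is a small race window before the managed llama.cpp instance
/// re-binds the port.
fn find_free_port() -> std::io::Result<u16> {
    let listener = TcpListener::bind(("127.0.0.1", 0))?;
    Ok(listener.local_addr()?.port())
}

fn main() -> std::io::Result<()> {
    let port = find_free_port()?;
    assert!(port > 0);
    println!("free port: {port}");
    Ok(())
}
```

To shrink the race window, the manager could hold the listener until just before spawning the instance, or retry with a fresh port if the spawn fails with "address in use".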
- [ ] Diagnostic Overview UI/API
- [ ] Config UI/API ?
- [ ] better book-keeping about in-flight requests ? (needed ?)
- [ ] multi node stuff
- how exactly ?
- clustering ? (one manager per node ?)
- ssh support ???
- [ ] automatic ram usage calc ?
- [ ] other runners
  - e.g. Docker, run from PATH, etc.
- [ ] other backends ?
- [ ] more advanced start/stop behavior
- more config ? e.g. pinning/priorities/prefer-to-kill/start-initially
  - LRU / most-used instances prioritized to keep running
- speculative relaunch
- scheduling of how to order in-flight requests + restarts to handle them optimally
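The pinning + LRU ideas above amount to a victim-selection policy when memory is needed: stop the least recently used instance that is not pinned. A minimal sketch under those assumptions (the `Instance` struct and `pick_victim` helper are hypothetical, not part of the current codebase):

```rust
/// Hypothetical per-instance record for start/stop scheduling.
struct Instance {
    name: &'static str,
    pinned: bool,      // "pinning" config: never auto-stopped
    last_used_ms: u64, // e.g. unix millis of the last proxied request
}

/// One possible policy: among non-pinned instances, stop the one that
/// was used least recently (smallest last_used_ms).
fn pick_victim(instances: &[Instance]) -> Option<&Instance> {
    instances
        .iter()
        .filter(|i| !i.pinned)
        .min_by_key(|i| i.last_used_ms)
}

fn main() {
    let instances = vec![
        Instance { name: "llama-3-8b", pinned: true,  last_used_ms: 1_000 },
        Instance { name: "qwen2-7b",   pinned: false, last_used_ms: 2_000 },
        Instance { name: "phi-3",      pinned: false, last_used_ms: 9_000 },
    ];
    // llama-3-8b is older but pinned, so qwen2-7b is stopped first.
    assert_eq!(pick_victim(&instances).unwrap().name, "qwen2-7b");
}
```

Priorities and prefer-to-kill could extend this by sorting on a `(priority, last_used_ms)` key instead of recency alone.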
- [ ] advanced high-level foo
  - automatic context-size selection per request / restart with a bigger context if the current instance's context is too small