Todo
Name ideas
- llama herder
- llama herdsman/woman/boy ??
- llama shepherd ?
MVP
- fix stopping (doesn't work correctly at all)
  - seems done
Future Features
- support for model selection by name on a unified port for /api & /completions (see the routing sketch after this list)
  - separation of proxy/selection stuff ? config for unmanaged instances for auto model-selection by name
- automatic internal port management (search for free ports, see the port sketch after this list)
- Diagnostic Overview UI/API
- Config UI/API ?
- better book-keeping about in-flight requests ? (needed ?)
- multi-node stuff
  - how exactly ?
    - clustering ? (one manager per node ?)
    - ssh support ???
  - automatic RAM usage calculation ?
- other runners
  - e.g. docker / run from a path etc.
  - other backends ?
- more advanced start/stop behavior
  - more config ? e.g. pinning/priorities/prefer-to-kill/start-initially
  - LRU / most-used instances prioritized to keep running (see the eviction sketch after this list)
  - speculative relaunch
  - scheduling of how to order in-flight requests + restarts to handle them optimally
- advanced high-level foo
  - automatic context-size selection per request / start with a bigger context if the current instance's context is too small (see the context sketch after this list)
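
A minimal sketch of the by-name routing idea, assuming the manager keeps a model-name → backend-address map and that requests carry a `model` field in their JSON body; the `modelPorts` map, the names and the addresses are made-up placeholders, not real config:

```go
// Hypothetical sketch: route requests arriving on one unified port to
// per-model llama.cpp instances by peeking at the "model" field of the body.
package main

import (
	"bytes"
	"encoding/json"
	"io"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// Assumed bookkeeping: model name -> address of the managed instance.
var modelPorts = map[string]string{
	"llama-3-8b": "http://127.0.0.1:9001",
	"mistral-7b": "http://127.0.0.1:9002",
}

func route(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "could not read body", http.StatusBadRequest)
		return
	}
	var payload struct {
		Model string `json:"model"`
	}
	_ = json.Unmarshal(body, &payload) // some endpoints may omit "model"

	target, ok := modelPorts[payload.Model]
	if !ok {
		http.Error(w, "unknown model: "+payload.Model, http.StatusNotFound)
		return
	}
	backend, _ := url.Parse(target)
	proxy := httputil.NewSingleHostReverseProxy(backend)
	// Restore the body we consumed so the backend sees the full request.
	r.Body = io.NopCloser(bytes.NewReader(body))
	proxy.ServeHTTP(w, r)
}

func main() {
	http.HandleFunc("/", route) // /api & /completions paths are forwarded as-is
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```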
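
For the free-port search, one common approach (an assumption about how it could be done, not a decision) is to bind to port 0 and let the OS hand back an unused port:

```go
// Hypothetical sketch: ask the OS for a free TCP port by listening on
// port 0, read back the assigned port, then release the listener so a
// llama.cpp instance can bind it.
package main

import (
	"fmt"
	"net"
)

func freePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0") // port 0 = "pick any free port"
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	p, err := freePort()
	if err != nil {
		panic(err)
	}
	fmt.Println("free port:", p)
}
```

There is a small race between closing the probe listener and the instance binding the port, so the manager would still want to retry on bind failure.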
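
A sketch of the LRU keep-running prioritization, deciding which instance to stop when room is needed for another model; the `Pinned`, `Inflight` and `LastUsed` fields are assumed bookkeeping, not existing code:

```go
// Hypothetical sketch: pick the eviction victim among running instances,
// preferring the least-recently-used one that is neither pinned nor busy.
package main

import (
	"fmt"
	"time"
)

type Instance struct {
	Model    string
	Pinned   bool      // never evict if pinned
	Inflight int       // active requests
	LastUsed time.Time // updated on every proxied request
}

// evictionCandidate returns the idle, unpinned instance with the oldest
// LastUsed timestamp, or nil if nothing can be stopped right now.
func evictionCandidate(instances []*Instance) *Instance {
	var victim *Instance
	for _, in := range instances {
		if in.Pinned || in.Inflight > 0 {
			continue
		}
		if victim == nil || in.LastUsed.Before(victim.LastUsed) {
			victim = in
		}
	}
	return victim
}

func main() {
	now := time.Now()
	running := []*Instance{
		{Model: "llama-3-8b", LastUsed: now.Add(-10 * time.Minute)},
		{Model: "mistral-7b", LastUsed: now.Add(-2 * time.Minute)},
		{Model: "embed-small", Pinned: true, LastUsed: now.Add(-1 * time.Hour)},
	}
	if v := evictionCandidate(running); v != nil {
		fmt.Println("stop:", v.Model) // -> stop: llama-3-8b
	}
}
```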
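
A rough sketch of per-request context-size selection, with an assumed ~4-characters-per-token estimate and power-of-two context steps; all names and numbers are placeholders:

```go
// Hypothetical sketch: choose a context size large enough for the prompt
// plus the completion budget; the caller would relaunch the instance if the
// currently running one was started with a smaller -c value.
package main

import "fmt"

const minCtx, maxCtx = 2048, 32768 // assumed bounds, not real limits

func chooseCtx(promptChars, maxNewTokens int) int {
	need := promptChars/4 + maxNewTokens // ~4 chars per token, rough estimate
	ctx := minCtx
	for ctx < need && ctx < maxCtx {
		ctx *= 2
	}
	return ctx
}

func main() {
	fmt.Println(chooseCtx(20000, 512)) // -> 8192
}
```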