redvault-ai/llama_proxy_man/TODO.org


Todo

Name ideas

  • llama herder
  • llama herdsman/woman/boy ??
  • llama shepherd ?

MVP

  • fix stopping (doesn't work correctly at all)

    • seems done

Future Features

  • support for model selection by name on a unified port for /api & /completions

    • separate the proxy/selection layer? config entries for unmanaged instances so they can take part in auto model selection by name
  • automatic internal port management (search for free ports)
  • Diagnostic Overview UI/API
  • Config UI/API ?
  • better book-keeping about in-flight requests? (is it needed?)
  • multi-node support

    • how exactly ?

      • clustering ? (one manager per node ?)
      • ssh support ???
  • automatic RAM usage calculation?
  • other runners

    • e.g. Docker, or running a binary found in $PATH, etc.
  • other backends ?
  • more advanced start/stop behavior

    • more config ? e.g. pinning/priorities/prefer-to-kill/start-initially
    • keep LRU / most-used instances prioritized to stay running
    • speculative relaunch
    • scheduling of how to order in-flight requests + restarts to handle them optimally
  • advanced high-level features

    • automatic context-size selection per request / restart with a bigger context if the current instance's context is too small
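
For the "automatic internal port management" item, one possible approach is to ask the OS for an ephemeral port by binding to port 0. A minimal Rust sketch, with hypothetical names not taken from the codebase:

```rust
use std::net::TcpListener;

/// Ask the OS for a currently free TCP port by binding to port 0.
/// Note: the port is released when the listener is dropped, so a real
/// implementation would need to hold the listener (or accept the small
/// race window) until the llama.cpp instance has been started on it.
fn find_free_port() -> std::io::Result<u16> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    Ok(listener.local_addr()?.port())
}

fn main() {
    let port = find_free_port().expect("no free port available");
    println!("would start instance on internal port {port}");
}
```

This avoids scanning a port range by hand, at the cost of the bind-then-release race noted in the comment.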
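
For the "keep LRU instances running" idea, the book-keeping could be as small as a logical clock per running instance: the least recently used one becomes the first kill candidate when memory is needed for another model. A hedged sketch, all names hypothetical:

```rust
use std::collections::HashMap;

/// Hypothetical book-keeping for running llama.cpp instances:
/// a logical clock records when each model last served a request.
struct InstanceTracker {
    tick: u64,
    last_used: HashMap<String, u64>,
}

impl InstanceTracker {
    fn new() -> Self {
        Self { tick: 0, last_used: HashMap::new() }
    }

    /// Record that `model` just served a request.
    fn touch(&mut self, model: &str) {
        self.tick += 1;
        self.last_used.insert(model.to_string(), self.tick);
    }

    /// The least recently used instance: the first candidate to stop
    /// when memory is needed for another model.
    fn lru_candidate(&self) -> Option<&str> {
        self.last_used
            .iter()
            .min_by_key(|(_, t)| **t)
            .map(|(name, _)| name.as_str())
    }
}

fn main() {
    let mut tracker = InstanceTracker::new();
    tracker.touch("llama-7b");
    tracker.touch("llama-13b");
    // llama-7b served a request least recently, so it is stopped first.
    println!("kill candidate: {:?}", tracker.lru_candidate());
}
```

A logical counter is used instead of wall-clock timestamps so ordering stays deterministic even for requests arriving in the same instant; pinning/priorities from the config item above could be layered on top by filtering the candidate set.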