#+title: Plan
* TODO for 0.0.1-rc1
- [X] processmanager service (see the loop sketch after this list)
  - [X] spawn task on app startup
  - [X] loop every second
  - [X] start processes
    - [X] query waiting processes from db
    - [X] start them
    - [X] change their status to running
  - [X] stop finished processes in db & remove from RAM registry
    - [X] query status for currently running processes
    - [X] stop those that aren't status=running
    - [X] set their status to finished
- [ ] must have tweaks
  - pass options to model (ngl, path & model)
    - gpu/nogpu
    - model dropdown (ls *.gguf based)
    - size
  - markdown formatting with markdown-rs + set inner html (see the rendering sketch after this list)
  - show small backend starter widget icon/button on chat page
  - test faster refresh
  - chat persistence
  - Config.toml
  - package as appimage
  - add model mode
  - amd/rocm/cuda
- [ ] ideas to investigate before release
  - stdout inspection
  - visualize setting generation? [not really useful once settings are per chat?]
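A minimal sketch of the process-manager loop ticked off above, assuming tokio as the runtime and sqlx with SQLite; the =processes= table and its columns are illustrative, not the actual schema.

#+begin_src rust
use std::collections::HashMap;
use std::time::Duration;
use tokio::process::{Child, Command};

// Spawned once on app startup; wakes up every second, starts waiting
// processes from the db and marks exited ones as finished.
pub fn spawn_process_manager(db: sqlx::SqlitePool) {
    tokio::spawn(async move {
        let mut registry: HashMap<i64, Child> = HashMap::new(); // RAM registry
        let mut tick = tokio::time::interval(Duration::from_secs(1));
        loop {
            tick.tick().await;

            // start processes: query waiting rows, spawn them, set status=running
            let waiting: Vec<(i64, String)> =
                sqlx::query_as("SELECT id, command FROM processes WHERE status = 'waiting'")
                    .fetch_all(&db)
                    .await
                    .unwrap_or_default();
            for (id, cmd) in waiting {
                if let Ok(child) = Command::new(&cmd).spawn() {
                    registry.insert(id, child);
                    let _ = sqlx::query("UPDATE processes SET status = 'running' WHERE id = ?")
                        .bind(id)
                        .execute(&db)
                        .await;
                }
            }

            // stop finished processes: drop exited children from the RAM registry
            // and set their status to finished in the db
            let mut done = Vec::new();
            for (id, child) in registry.iter_mut() {
                if matches!(child.try_wait(), Ok(Some(_))) {
                    done.push(*id);
                }
            }
            for id in done {
                registry.remove(&id);
                let _ = sqlx::query("UPDATE processes SET status = 'finished' WHERE id = ?")
                    .bind(id)
                    .execute(&db)
                    .await;
            }
        }
    });
}
#+end_src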
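And a sketch of the markdown idea from the same list: convert the message with the markdown crate (markdown-rs) and set the result as inner HTML. The Leptos-style component is an assumption about the frontend, not this repo's actual code.

#+begin_src rust
use leptos::*;

// Render one chat message: markdown -> HTML string -> inner_html.
#[component]
pub fn MessageBody(content: String) -> impl IntoView {
    let html = markdown::to_html(&content);
    // inner_html skips escaping, so only feed it output you are ok rendering as HTML
    view! { <div class="prose" inner_html=html/> }
}
#+end_src
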
* TODO next steps after 0.0.1-rc1
- markdown formatting
- chat persistence
- backend logs inspector
- multiple chats
- per chat settings/model etc
- configurable ngl
- custom backends via pwd, command & args
- custom backend templates
- prompt templates
- sampling settings
- chat/completion mode?
- transfer planning into issues

* Roadmap
0.1 model selection from dir, switch models
- hardcoded ngl
- llamafile in path or ./llamafile only
- one chat
- simple model selection
- llamafile included templates only
0.2
- hardcoded inbuilt chat templates
- multiple chatrooms
- persist settings
- ngl setting
- persist history
- summaries
- extended backend settings
  - max running? running slots?
- better model selection
  - extract GGUF metadata
- model downloader ?
  - huggingface /api/models hardcoded to my account as owner (see the sketch at the end of this section)
  - develop some yalu.toml manifest?
- chat templates /completions instead of /chat/completions
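The Hugging Face Hub exposes the model list mentioned above via GET /api/models, which can be filtered by author; a rough sketch with reqwest (the owner string is a placeholder):

#+begin_src rust
// List models for one owner/author from the Hugging Face Hub API.
async fn list_owner_models(owner: &str) -> Result<serde_json::Value, reqwest::Error> {
    reqwest::get(format!("https://huggingface.co/api/models?author={owner}"))
        .await?
        .json::<serde_json::Value>()
        .await
}
#+end_src
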
* Design for 0.1
- Frontend
  - settings page
    - model dir
  - chat settings drawer
    - model selection (from dir */*.gguf?)
    - chat template (from hardcoded list)
    - start/stop
- Backend (structs sketched below)
  - Settings (1)
    - model path
  - Chat (1)
    - Template
    - ModelSettings
      - model
      - ngl
  - BackendProcess (1)
    - status: started -> running -> finished
    - created from chat & saves its args
    - no update, only create & delete
  - RunnerBackend
    - keep track which processes are running
    - start/stop processes when needed
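A rough Rust sketch of the backend entities listed above; the field names follow the bullets, everything else (types, exact shape) is an assumption.

#+begin_src rust
// Settings (1): global app settings
struct Settings {
    model_dir: String,
}

// hardcoded list for 0.1 (see "TODO for 0.1" below)
enum ChatTemplate { Llama3, ChatMl, Phi }

struct ModelSettings {
    model: String, // model file selected from the dir
    ngl: u32,      // GPU layers
}

// Chat (1)
struct Chat {
    template: ChatTemplate,
    model_settings: ModelSettings,
}

// status: started -> running -> finished
enum ProcessStatus { Started, Running, Finished }

// BackendProcess (1): created from a chat & saves its args;
// no update, only create & delete.
struct BackendProcess {
    status: ProcessStatus,
    args: Vec<String>,
}
#+end_src
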
* TODO for 0.1
- Settings api (server fns sketched below)
  - #[server] fn update_settings
    - model_dir
- Chat Api
  - #[server] fn update_chat
    - ChatTemplate (llama3, chatml, phi)
    - model path
    - ngl
- BackendProcess api
  - #[server] fn start_process
  - #[server] fn stop_process
  - #[server] fn restart_process ?
- BackendRunner worker
- UI stuff
  - settings page with model_dir
  - drawer on chat
    - settings (model_path & ngl)
    - start/stop
- Package for private release
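A hedged sketch of the #[server] endpoints above, in Leptos style; the signatures are guesses from the bullets and the bodies are left as todo!().

#+begin_src rust
use leptos::*;

#[server]
pub async fn update_settings(model_dir: String) -> Result<(), ServerFnError> {
    // persist the global model_dir
    todo!()
}

#[server]
pub async fn update_chat(
    template: String, // "llama3" | "chatml" | "phi"
    model_path: String,
    ngl: u32,
) -> Result<(), ServerFnError> {
    todo!()
}

#[server]
pub async fn start_process(chat_id: i64) -> Result<(), ServerFnError> {
    // create a BackendProcess row; the BackendRunner worker actually launches it
    todo!()
}

#[server]
pub async fn stop_process(process_id: i64) -> Result<(), ServerFnError> {
    todo!()
}
#+end_src
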
* TODO Design for backend runners
** TODO
- implement backendconfig CRUD
  - backend tab
- implement starting of a specified backendconfig
  - "running" tab ?
- add simple per-start settings
  - context & ngl
- add model per-start setting
  - needs model settings (i.e. download path)
  - probably need global app settings somewhere
- better message formatting
  - markdown conversion
** Newest Synthesis
- 2 Resources (sketched below)
  - BackendConfig
    - includes state needed to start backend
    - i.e. no runtime options like -ctx -m -ngl etc
    - for no-params configs the only UI needed is a select dropdown
    - (NO PARAMS !!!!)
      - shipped llamafile
      - llamafile PATH
      - llama.cpp server in PATH ?
    - (not mvp)
      - basic & flexible pwd, cmd, args (prefix)
      - templates for default options (can probably just be in the ui code, auto-filling the form ?)
      - llama.cpp path prebuilt
      - llama.cpp path builder
      - no explicit nix support for now!
  - BackendProcess
    - initially just start/stop with hardcoded config
  - RunTimeConfig
    - model
    - context etc
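One way to write the two resources down, as a sketch; variant and field names are mine, only the split (start-time config without params vs. per-start runtime options) is from the notes above.

#+begin_src rust
// BackendConfig: just enough state to start a backend, no runtime options.
enum BackendConfig {
    // no-params configs: the UI is just a select dropdown
    ShippedLlamafile, // llamafile shipped next to the binary
    LlamafileInPath,  // llamafile found in $PATH
    // not MVP: basic & flexible pwd, cmd, args (prefix)
    Custom { pwd: String, cmd: String, args: Vec<String> },
}

// RunTimeConfig: the per-start options (-m, -c, -ngl style flags).
struct RunTimeConfig {
    model: String,
    ctx: u32,
    ngl: u32,
}
#+end_src
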
** Open Questions
- how to model multiple launched instances ?
  - could have different parameters or models loaded
** Synthesis ?
- model backend as resource
- runner can start/stop
- build interactor pattern services ?
** (Maybe) better option: runner module separate as a kind of micro subservice
- only startup fn in main, nothing pub apart from that
- server api code stays a mostly simple crud app
- start background jobs on startup
  - starter/manager
    - reads intended backend state from sqlite
    - has internal state in struct
    - makes internal state agree with db
      - starts backends
      - stops backends
      - etc?
- frontend just reads and writes db via server fns
- other background job for having always up-to-date status for backends ?
  - expose status checker via backendapi interface trait (see the trait sketch below)
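A sketch of that module boundary: a single public start function, the manager loop private, and the status checker behind a trait. Names are illustrative; the reconciliation itself is the same loop as the 0.0.1-rc1 sketch near the top.

#+begin_src rust
// Only this is pub: main calls it once on startup, everything else stays private.
pub fn start(db: sqlx::SqlitePool) {
    tokio::spawn(manager_loop(db));
}

// Reads intended backend state from sqlite, keeps its own registry in a struct,
// and starts/stops backends until internal state and db agree.
async fn manager_loop(_db: sqlx::SqlitePool) {
    // see the process-manager sketch in the 0.0.1-rc1 section
}

// Status checker exposed via a small interface trait, e.g. backed by /health.
pub trait BackendApi {
    fn is_healthy(&self) -> impl std::future::Future<Output = bool> + Send;
}
#+end_src
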
** (Maybe) stupid option
- continue current plan, start on demand via server_fn request
- how to handle only starting a single backend
  - some in process registry needed ?

* MVP
** Backends
- start on demand
- simple start/stop
- as background service
- simple status via /health (see the sketch below)
- Options
  - llamafile
    - in $PATH
    - as executable file next to the binary (enables creating a zip which "just works")
  - llama.cpp
    - via nix via path to llama.cpp directory
    - via path to binary
- Settings
  - context
  - gpu layers
  - keep model hardcoded for now
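The /health check could be as small as this, assuming reqwest; llama.cpp's server (and therefore llamafile) exposes a /health endpoint.

#+begin_src rust
// Poll the backend's /health endpoint and treat any 2xx answer as "up".
async fn backend_is_healthy(base_url: &str) -> bool {
    match reqwest::get(format!("{base_url}/health")).await {
        Ok(resp) => resp.status().is_success(),
        Err(_) => false,
    }
}
#+end_src
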
** Chat Prompt Template
- simple template defs to get from chat format (with role) to bare text prompt (ChatML sketched below)
- collect some default templates (chatml/llama3)
- migrate to /completions api
- apply to specific models ?
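For illustration, the ChatML case of such a template def: turning role-tagged messages into the bare text prompt the /completions endpoint expects. The Message type is an assumption; llama3 would be a second template of the same shape.

#+begin_src rust
struct Message {
    role: String,    // "system" | "user" | "assistant"
    content: String,
}

// ChatML: wrap every turn in <|im_start|>role ... <|im_end|> and leave an
// open assistant turn for the model to complete.
fn chatml_prompt(messages: &[Message]) -> String {
    let mut prompt = String::new();
    for m in messages {
        prompt.push_str(&format!("<|im_start|>{}\n{}<|im_end|>\n", m.role, m.content));
    }
    prompt.push_str("<|im_start|>assistant\n");
    prompt
}
#+end_src
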
** Model Selection
- set folder in general settings (listing sketch below)
- read gguf metadata via gguf crate
- per-model settings (layers? ctx? vram prediction?)
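A sketch of the folder scan behind the model dropdown: just filename and size from std::fs; GGUF metadata would be layered on top via a gguf parsing crate (its API is not shown here).

#+begin_src rust
use std::path::{Path, PathBuf};

// List *.gguf files in the configured model dir with their size in bytes.
fn list_gguf_models(model_dir: &Path) -> std::io::Result<Vec<(PathBuf, u64)>> {
    let mut models = Vec::new();
    for entry in std::fs::read_dir(model_dir)? {
        let entry = entry?;
        let path = entry.path();
        if path.extension().map_or(false, |ext| ext == "gguf") {
            models.push((path, entry.metadata()?.len()));
        }
    }
    Ok(models)
}
#+end_src
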
** Inference settings (in chat as modal or sth like that)
- set sampler params in chat settings
** Settings hierarchy ?
- per_chat > per_model > per_backend > global (resolution sketched below)
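A sketch of how that precedence could resolve, assuming each layer is stored as an Option and global always has a value:

#+begin_src rust
// per_chat > per_model > per_backend > global
struct Layered<T> {
    per_chat: Option<T>,
    per_model: Option<T>,
    per_backend: Option<T>,
    global: T,
}

impl<T: Clone> Layered<T> {
    fn resolve(&self) -> T {
        self.per_chat
            .clone()
            .or_else(|| self.per_model.clone())
            .or_else(|| self.per_backend.clone())
            .unwrap_or_else(|| self.global.clone())
    }
}
#+end_src
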
** Setting types ?
- Model loading
  - context
  - gpu layers
- Sampling
  - temperature
- Prompt template

* Settings planning
** Per Backend
*** runner config
- pwd
- cmd
- template for args
- model
- chat template
- inference settings? (low prio, should switch to another API that allows setting these at runtime)
** Per Model
*** offloading layers ?
** Per Chat
*** inference settings (runtime)

* Settings todo
- start/stop
  - start current backend on demand, just start/stop on settings page
  - disable buttons when backend isn't running
- only allow llama-cpp/llamafile launch arguments for now

* Next steps (teaser)
- [x] finish basic chat
  - [x] bigger bubbles (use screen, +flex grow?/maybe even grid?)
  - [x] edit history + system prompt
  - [x] regenerate latest response
# - save history to db (postponed until multichat)
- [ ] backend page
  - [ ] infer sampling settings
  - [ ] running settings (gpu layer, context size etc)
- [ ] model page
  - [ ] set model dir
  - [ ] list by simple filename (& size)
  - [ ] offline metadata (README frontmatter yaml, filename, (gguf crate))
- [ ] chat settings
  - [ ] none for now, a single model & settings set is selected on the respective pages
* Next steps (private mvp)
- chatrooms
- settings/model/etc per chatroom, multiple settings sets

* TODO MVP
- [ ] add test model downloader to nix devshell
- [ ] Backend config via TOML (see the sketch at the end of this section)
  - just based on llama.cpp /completion for now
- [ ] Basic chat GUI
  - basic ui with bubbles
  - advanced ui with markdown rendering
    - fix incomplete quotes ?
- [ ] Prompt template & parameters via TOML
- [ ] Basic DB stuff
  - single room history
  - prompt templates via DB
  - parameter management via DB (e.g. temperature)
- [ ] Advanced chat UI
  - Multiple "Rooms"
  - Set prompt & params per room
- [ ] Basic RAG
  - select vector db
    - qdrant ? chroma ?
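A sketch of the TOML-driven backend config, assuming serde plus the toml crate; the field names mirror the pwd/cmd/args and template/parameter bullets and are not a fixed schema.

#+begin_src rust
use serde::Deserialize;

#[derive(Deserialize)]
struct BackendToml {
    pwd: Option<String>,
    cmd: String,                     // e.g. path to llamafile / llama-server
    args: Vec<String>,               // launch arguments (prefix)
    prompt_template: Option<String>, // e.g. "chatml"
    temperature: Option<f32>,
}

fn parse_backend_config(text: &str) -> Result<BackendToml, toml::de::Error> {
    toml::from_str(text)
}
#+end_src
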
* TODO Advanced features
- [ ] Backends
  - Backend Runner
    - llamafile
    - llama.cpp nix (via cmd templates ?)
  - Backend API config?
  - Backend Downloader/Installer
- [ ] Inference Param Templates
- [ ] Prompt Templates
- [ ] model library
  - [ ] model downloader
  - [ ] model selector
    - model data extraction from gguf
  - [ ] quant selector
    - automatic offloading layer selection based on vram
- [ ] auto-quantize
  - vocab selection
  - quant checkboxes
  - extract progress ETA
  - imatrix generation
    - dataset downloader ? (or just include a default one?)
- [ ] Better RAG
  - [ ] add multiple embedding models
  - [ ] add reranking
- [ ] Generic graph based prompt pre/postprocessing via UI, like ComfyUI
  - [ ] DSL ? Some existing scripting stuff ?
  - [ ] Graph just as visualization, with text-based config
  - [ ] Fancy Graph UI

* TODO Polish
- [ ] Backend Multi-API compat, e.g. llama.cpp /completion & /chat/completion
  - has different features (chat/completion has a hardcoded prompt template)
  - support only full-featured backends for now
  - add chat support here

* TODO Go public
- Rename to YALU ?
- Polish README.md
- Clean history
- Add some more common backends (ollama ?)
- Sync to github
- Announce on /locallama