#+title: Plan
* TODO for 0.0.1-rc1
- [X] processmanager service (see the sketch after this list)
  - [X] spawn task on app startup
  - [X] loop every second
  - [X] start processes
    - [X] query waiting processes from db
    - [X] start them
    - [X] change their status to running
  - [X] stop finished processes in db & remove from RAM registry
    - [X] query status for currently running processes
    - [X] stop those that aren't status=running
    - [X] set their status to finished
- [ ] must-have tweaks
  - pass options to model (ngl, path & model)
    - gpu/nogpu
    - model dropdown (ls *.gguf based)
    - size
  - markdown formatting with markdown-rs + set inner html
  - show small backend starter widget icon/button on chat page
  - test faster refresh
  - chat persistence
  - Config.toml
  - package as AppImage
  - add model mode
  - amd/rocm/cuda
- [ ] ideas to investigate before release
  - stdout inspection
  - visualize setting generation? [not really useful once settings are per chat?]
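A minimal sketch of the processmanager reconcile loop above, assuming tokio for the background task and std::process for spawning; the db_* helpers and the ProcessRow type are hypothetical stand-ins for the real sqlite queries.

#+begin_src rust
use std::collections::HashMap;
use std::process::{Child, Command};
use std::time::Duration;

/// Hypothetical row type mirroring the backend process table.
struct ProcessRow {
    id: i64,
    cmd: String,
    args: Vec<String>,
}

// Stand-ins for the real sqlite queries (elided here).
async fn db_fetch_waiting() -> Vec<ProcessRow> { Vec::new() }
async fn db_mark_running(_id: i64) {}
async fn db_fetch_running_ids() -> Vec<i64> { Vec::new() }
async fn db_mark_finished(_id: i64) {}

/// Spawned once on app startup; loops every second and makes the
/// in-RAM registry agree with the intended state in the db.
async fn process_manager() {
    let mut registry: HashMap<i64, Child> = HashMap::new();
    let mut tick = tokio::time::interval(Duration::from_secs(1));
    loop {
        tick.tick().await;

        // 1. start processes that the db marks as waiting
        for row in db_fetch_waiting().await {
            match Command::new(&row.cmd).args(&row.args).spawn() {
                Ok(child) => {
                    registry.insert(row.id, child);
                    db_mark_running(row.id).await;
                }
                Err(e) => eprintln!("failed to start backend {}: {e}", row.id),
            }
        }

        // 2. stop finished processes: anything whose child has exited is
        //    removed from the RAM registry and marked finished in the db
        for id in db_fetch_running_ids().await {
            let finished = match registry.get_mut(&id) {
                Some(child) => child.try_wait().map(|s| s.is_some()).unwrap_or(true),
                None => true, // not in RAM at all -> treat as finished
            };
            if finished {
                registry.remove(&id);
                db_mark_finished(id).await;
            }
        }
    }
}

#[tokio::main]
async fn main() {
    // In the real app this is spawned from the server startup hook.
    tokio::spawn(process_manager());
    tokio::time::sleep(Duration::from_secs(3)).await;
}
#+end_src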
* TODO next steps after 0.0.1-rc1
- markdown formatting
- chat persistence
- backend logs inspector
- multiple chats
- per-chat settings/model etc.
- configurable ngl
- custom backends via pwd, command & args
- custom backend templates
- prompt templates
- sampling settings
- chat/completion mode?
- transfer planning into issues
* Roadmap
** 0.1: model selection from dir, switch models
- hardcoded ngl
- llamafile in PATH or ./llamafile only
- one chat
- simple model selection
- llamafile-included templates only
** 0.2
- hardcoded inbuilt chat templates
- multiple chatrooms
- persist settings
  - ngl setting
- persist history
  - summaries
- extended backend settings
  - max running? running slots?
- better model selection
  - extract GGUF metadata
- model downloader? (see the sketch after this list)
  - huggingface /api/models hardcoded to my account as owner
  - develop some yalu.toml manifest?
- chat templates: /completions instead of /chat/completions
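A rough sketch of the model downloader's listing step against the Hugging Face Hub API, assuming reqwest + serde; "my-account" stands in for the hardcoded owner mentioned above, and only the "id" field of the response is relied on.

#+begin_src rust
use serde::Deserialize;

/// Only the field we rely on from the Hub response; everything else is ignored.
#[derive(Deserialize)]
struct HubModel {
    id: String,
}

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    // "my-account" is a placeholder for the hardcoded owner.
    let models: Vec<HubModel> = reqwest::Client::new()
        .get("https://huggingface.co/api/models")
        .query(&[("author", "my-account")])
        .send()
        .await?
        .json()
        .await?;
    for m in &models {
        println!("{}", m.id);
    }
    Ok(())
}
#+end_src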
* Design for 0.1
- Frontend
  - settings page
    - model dir
  - chat settings drawer
    - model selection (from dir */*.gguf?)
    - chat template (from hardcoded list)
    - start/stop
- Backend
  - Settings (1)
    - model path
  - Chat (1)
    - Template
    - ModelSettings
      - model
      - ngl
  - BackendProcess (1) (see the status sketch after this list)
    - status: started -> running -> finished
    - created from chat & saves its args
    - no update, only create & delete
  - RunnerBackend
    - keep track of which processes are running
    - start/stop processes when needed
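The BackendProcess lifecycle above as a small sketch; the type and method names are assumptions, not the final schema.

#+begin_src rust
/// Lifecycle of a BackendProcess row: started -> running -> finished.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProcessStatus {
    Started,
    Running,
    Finished,
}

impl ProcessStatus {
    /// Only forward transitions are allowed; apart from this, rows are
    /// never updated, only created & deleted.
    fn next(self) -> Option<Self> {
        match self {
            ProcessStatus::Started => Some(ProcessStatus::Running),
            ProcessStatus::Running => Some(ProcessStatus::Finished),
            ProcessStatus::Finished => None,
        }
    }
}

fn main() {
    let mut status = ProcessStatus::Started;
    while let Some(next) = status.next() {
        println!("{:?} -> {:?}", status, next);
        status = next;
    }
}
#+end_src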
* TODO for 0.1
- Settings api (see the server-fn sketch after this list)
  - #[server] fn update_settings
    - model_dir
- Chat api
  - #[server] fn update_chat
    - ChatTemplate (llama3, chatml, phi)
    - model path
    - ngl
- BackendProcess api
  - #[server] fn start_process
  - #[server] fn stop_process
  - #[server] fn restart_process?
- BackendRunner worker
- UI stuff
  - settings page with model_dir
  - drawer on chat
    - settings (model_path & ngl)
    - start/stop
- Package for private release
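Roughly what the #[server] fns above could look like, assuming a Leptos-style setup (attribute syntax and imports vary by Leptos version); the bodies and the ChatUpdate type are placeholders, not wired to the real db yet.

#+begin_src rust
use leptos::*;
use serde::{Deserialize, Serialize};

#[derive(Clone, Serialize, Deserialize)]
pub struct ChatUpdate {
    pub chat_template: String, // "llama3" | "chatml" | "phi"
    pub model_path: String,
    pub ngl: u32,
}

#[server]
pub async fn update_settings(model_dir: String) -> Result<(), ServerFnError> {
    // persist model_dir to the settings row in sqlite (elided)
    let _ = model_dir;
    Ok(())
}

#[server]
pub async fn update_chat(update: ChatUpdate) -> Result<(), ServerFnError> {
    // persist the per-chat template/model/ngl settings (elided)
    let _ = update;
    Ok(())
}

#[server]
pub async fn start_process(chat_id: i64) -> Result<(), ServerFnError> {
    // insert a waiting BackendProcess row; the BackendRunner worker
    // picks it up and actually spawns the process
    let _ = chat_id;
    Ok(())
}

#[server]
pub async fn stop_process(chat_id: i64) -> Result<(), ServerFnError> {
    // mark the running BackendProcess row as to-be-stopped
    let _ = chat_id;
    Ok(())
}
#+end_src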
* TODO Design for backend runners
** TODO
- implement backendconfig CRUD
  - backend tab
- implement starting of a specified backendconfig
  - "running" tab?
- add simple per-start settings
  - context & ngl
- add model per-start setting
  - needs model settings (i.e. download path)
  - probably need global app settings somewhere
- better message formatting
  - markdown conversion
** Newest Synthesis
- 2 resources (see the sketch after this list)
  - BackendConfig
    - includes only the state needed to start the backend
      - i.e. no runtime options like -ctx, -m, -ngl etc.
    - for no-params configs the only UI needed is a select dropdown
      - (NO PARAMS!!!!)
      - shipped llamafile
      - llamafile in PATH
      - llama.cpp server in PATH?
    - (not mvp)
      - basic & flexible pwd, cmd, args (prefix)
      - templates for default options (can probably just be in the UI code, auto-filling the form?)
        - llama.cpp path prebuilt
        - llama.cpp path builder
      - no explicit nix support for now!
  - BackendProcess
    - initially just start/stop with hardcoded config
- RunTimeConfig
  - model
  - context etc.
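One way the BackendConfig variants above could be modelled, as a sketch; the names are illustrative assumptions, not a settled schema.

#+begin_src rust
use std::path::PathBuf;

/// A BackendConfig holds only what is needed to start the backend,
/// never runtime options like -m / -ngl / -ctx (those live in RunTimeConfig).
#[derive(Debug, Clone)]
enum BackendConfig {
    /// llamafile shipped next to the app binary (no params at all)
    ShippedLlamafile,
    /// llamafile found in $PATH (no params at all)
    LlamafileInPath,
    /// not MVP: fully custom command with working dir and arg prefix
    Custom { pwd: PathBuf, cmd: String, args: Vec<String> },
}

impl BackendConfig {
    /// Whether the config needs any form fields in the UI at all;
    /// the no-params variants only need a select dropdown.
    fn needs_params(&self) -> bool {
        matches!(self, BackendConfig::Custom { .. })
    }
}

fn main() {
    let cfg = BackendConfig::LlamafileInPath;
    println!("{:?} needs params: {}", cfg, cfg.needs_params());
}
#+end_src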
** Open Questions
- how to model multiple launched instances?
  - could have different parameters or models loaded
** Synthesis ?
- model backend as a resource
  - runner can start/stop
- build interactor-pattern services?
** (Maybe) better option: runner module separate as a kind of micro subservice
- only a startup fn in main, nothing pub apart from that
- server api code stays like a mostly simple CRUD app
- start background jobs on startup
  - starter/manager
    - reads intended backend state from sqlite
    - has internal state in a struct
    - makes internal state agree with db
      - starts backends
      - stops backends
      - etc.?
- frontend just reads and writes db via server fns
- other background job for keeping an always up-to-date status for backends?
  - expose status checker via backendapi interface trait
** (Maybe) stupid option
- continue current plan, start on demand via server_fn request
- how to handle only starting a single backend?
  - some in-process registry needed?
* MVP
** Backends
- start on demand
- simple start/stop
  - as background service
- simple status via /health (see the sketch after this list)
- Options
  - llamafile
    - in $PATH
    - as executable file next to the binary (enables creating a zip that "just works")
  - llama.cpp
    - via nix, via path to llama.cpp directory
    - via path to binary
- Settings
  - context
  - gpu layers
  - keep model hardcoded for now
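A minimal sketch of the /health poll, assuming reqwest and a llama.cpp-style server (llamafile embeds the same server); the exact JSON body varies between versions, so the HTTP status code alone is used as a simple up/down indicator, and the port is just the default placeholder.

#+begin_src rust
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let client = reqwest::Client::builder()
        .timeout(Duration::from_secs(2))
        .build()?;

    // 8080 is only the llama.cpp server default; the real value comes
    // from the backend's runtime config.
    match client.get("http://127.0.0.1:8080/health").send().await {
        Ok(resp) if resp.status().is_success() => println!("backend healthy"),
        Ok(resp) => println!("backend not ready: HTTP {}", resp.status()),
        Err(e) => println!("backend down: {e}"),
    }
    Ok(())
}
#+end_src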
** Chat Prompt Template
- simple template defs to get from chat format (with roles) to a bare text prompt (see the sketch below)
- collect some default templates (chatml/llama3)
- migrate to the /completions api
- apply to specific models?
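A sketch of such a template def for ChatML, assuming a minimal role/content message type; the llama3 variant would be the same idea with its own header/end tokens.

#+begin_src rust
struct Message {
    role: String,    // "system" | "user" | "assistant"
    content: String,
}

/// Render a chat into a bare text prompt in ChatML form, ending with an
/// opened assistant turn so the /completions endpoint continues from there.
fn render_chatml(messages: &[Message]) -> String {
    let mut prompt = String::new();
    for m in messages {
        prompt.push_str(&format!("<|im_start|>{}\n{}<|im_end|>\n", m.role, m.content));
    }
    prompt.push_str("<|im_start|>assistant\n");
    prompt
}

fn main() {
    let chat = vec![
        Message { role: "system".into(), content: "You are a helpful assistant.".into() },
        Message { role: "user".into(), content: "Hi!".into() },
    ];
    println!("{}", render_chatml(&chat));
}
#+end_src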
** Model Selection
- set folder in general settings
- read gguf metadata via gguf crate
- per-model settings (layers? ctx? vram prediction?)

** Inference settings (in chat, as modal or something like that)
- set sampler params in chat settings

** Settings hierarchy ?
- per_chat > per_model > per_backend > global (see the sketch below)
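A sketch of how that fallback chain could be resolved, assuming each layer stores optional overrides; the names are illustrative only.

#+begin_src rust
/// One layer of optional overrides; a field is None when that layer
/// doesn't set it.
#[derive(Default, Clone, Copy)]
struct SettingsLayer {
    ngl: Option<u32>,
    context: Option<u32>,
    temperature: Option<f32>,
}

/// Resolve a single field through per_chat > per_model > per_backend > global.
fn resolve<T: Copy>(
    layers: &[SettingsLayer],
    pick: impl Fn(&SettingsLayer) -> Option<T>,
    default: T,
) -> T {
    layers.iter().find_map(pick).unwrap_or(default)
}

fn main() {
    let per_chat = SettingsLayer { temperature: Some(0.7), ..Default::default() };
    let per_model = SettingsLayer { ngl: Some(33), ..Default::default() };
    let per_backend = SettingsLayer::default();
    let global = SettingsLayer { context: Some(4096), ..Default::default() };

    // Highest-priority layer first.
    let layers = [per_chat, per_model, per_backend, global];
    println!("ngl = {}", resolve(&layers, |l| l.ngl, 0));
    println!("context = {}", resolve(&layers, |l| l.context, 2048));
    println!("temperature = {}", resolve(&layers, |l| l.temperature, 0.8));
}
#+end_src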
** Setting types ?
- Model loading
  - context
  - gpu layers
- Sampling
  - temperature
- Prompt template
* Settings planning
** Per Backend
*** runner config
- pwd
- cmd
- template for args
- model
- chat template
- infer settings? (low prio, should switch to another API that allows setting these at runtime)
** Per Model
*** offloading layers ?
** Per Chat
*** inference settings (runtime)
* Settings todo
- start/stop
  - start current backend on demand; just start/stop on settings page
  - disable buttons when backend isn't running
- only allow llama-cpp/llamafile launch arguments for now
* Next steps (teaser)
- [x] finish basic chat
  - [x] bigger bubbles (use screen, + flex grow? maybe even grid?)
  - [x] edit history + system prompt
  - [x] regenerate latest response
  # - save history to db (postponed until multichat)
- [ ] backend page
  - [ ] infer sampling settings
  - [ ] running settings (gpu layers, context size etc.)
- [ ] model page
  - [ ] set model dir
  - [ ] list by simple filename (& size)
  - [ ] offline metadata (README frontmatter yaml, filename, (gguf crate))
- [ ] chat settings
  - [ ] none for now, single model & settings set is selected on respective pages
* Next steps (private mvp)
- chatrooms
- settings/model/etc. per chatroom, multiple settings sets
* TODO MVP
- [ ] add test model downloader to nix devshell
- [ ] Backend config via TOML (see the sketch after this list)
  - just based on llama.cpp /completion for now
- [ ] Basic chat GUI
  - basic ui with bubbles
  - advanced ui with markdown rendering
    - fix incomplete quotes?
- [ ] Prompt template & parameters via TOML
- [ ] Basic DB stuff
  - single-room history
  - prompt templates via DB
  - parameter management via DB (e.g. temperature)
- [ ] Advanced chat UI
  - multiple "rooms"
  - set prompt & params per room
- [ ] Basic RAG
  - select vector db
    - qdrant? chroma?
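A sketch of what the TOML-based backend config could look like, assuming serde + the toml crate; the field names are placeholders, not a settled format.

#+begin_src rust
use serde::Deserialize;

/// Hypothetical on-disk backend config; only what is needed to reach a
/// llama.cpp-style /completion endpoint for now.
#[derive(Debug, Deserialize)]
struct BackendToml {
    name: String,
    base_url: String,
    /// Extra launch arguments for when the app starts the backend itself.
    #[serde(default)]
    args: Vec<String>,
}

fn main() {
    let raw = r#"
        name = "local llama.cpp"
        base_url = "http://127.0.0.1:8080"
        args = ["--ctx-size", "4096"]
    "#;
    let cfg: BackendToml = toml::from_str(raw).expect("invalid backend config");
    println!("{cfg:?}");
}
#+end_src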
* TODO Advanced features
- [ ] Backends
  - Backend Runner
    - llamafile
    - llama.cpp nix (via cmd templates?)
  - Backend API config?
  - Backend Downloader/Installer
- [ ] Inference Param Templates
- [ ] Prompt Templates
- [ ] model library
  - [ ] model downloader
  - [ ] model selector
    - model data extraction from gguf
  - [ ] quant selector
    - automatic offloading layer selection based on vram
  - [ ] auto-quantize
    - vocab selection
    - quant checkboxes
    - extract progress ETA
    - imatrix generation
      - dataset downloader? (or just include a default one?)
- [ ] Better RAG
  - [ ] add multiple embedding models
  - [ ] add reranking
- [ ] Generic graph-based prompt pre/postprocessing via UI, like ComfyUI
  - [ ] DSL? Some existing scripting stuff?
  - [ ] Graph just as visualization, with text-based config
  - [ ] Fancy Graph UI
* TODO Polish
- [ ] Backend multi-API compat, e.g. llama.cpp /completion & /chat/completions
  - these have different features (/chat/completions has a hardcoded prompt template)
  - support only full-featured backends for now
  - add chat support here
* TODO Go public
- Rename to YALU?
- Polish README.md
- Clean history
- Add some more common backends (ollama?)
- Sync to GitHub
- Announce on r/LocalLLaMA