# LLama Herder

- manages multiple llama.cpp instances in the background
- keeps track of used & available video & CPU memory
- starts/stops llama.cpp instances as needed, to ensure the memory limit is never exceeded (see the start/stop sketch below)

## Ideas

- smarter logic to decide which instance to stop
- unified API, with proxying by a `model_name` param, for standardized `/v1/chat/completions`- and `/completion`-like endpoints (see the proxy sketch below)
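A minimal sketch of the start/stop bookkeeping, assuming per-model memory estimates are known up front and using naive FIFO eviction; the `Herder`/`ModelSpec` names are hypothetical, and the `llama-server` flags may differ between llama.cpp versions:

```python
import subprocess
from dataclasses import dataclass, field

@dataclass
class ModelSpec:
    name: str
    path: str      # path to the model's GGUF file
    vram_mb: int   # estimated VRAM footprint (assumption: known up front)
    ram_mb: int    # estimated system-RAM footprint
    port: int

@dataclass
class Herder:
    vram_budget_mb: int
    ram_budget_mb: int
    specs: dict[str, ModelSpec] = field(default_factory=dict)
    running: dict[str, subprocess.Popen] = field(default_factory=dict)

    def _used(self) -> tuple[int, int]:
        # Sum the estimates of all running instances.
        vram = sum(self.specs[n].vram_mb for n in self.running)
        ram = sum(self.specs[n].ram_mb for n in self.running)
        return vram, ram

    def ensure_running(self, name: str) -> None:
        if name in self.running:
            return
        spec = self.specs[name]
        # Evict instances until the new one fits within both budgets.
        while True:
            vram, ram = self._used()
            if (vram + spec.vram_mb <= self.vram_budget_mb
                    and ram + spec.ram_mb <= self.ram_budget_mb):
                break
            if not self.running:
                raise RuntimeError(f"{name} exceeds the budget on its own")
            # Naive FIFO choice; "smarter logic to decide what to stop"
            # (see Ideas) would plug in here.
            self.stop(next(iter(self.running)))
        # llama.cpp's server binary; flags may differ between versions.
        self.running[name] = subprocess.Popen(
            ["llama-server", "-m", spec.path, "--port", str(spec.port)]
        )

    def stop(self, name: str) -> None:
        proc = self.running.pop(name)
        proc.terminate()
        proc.wait()
```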
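And a stdlib-only sketch of the proxying idea: read the model name from the request's JSON body (`model` in the OpenAI-style schema; a `model_name` param would work the same way) and forward the request to that model's llama.cpp instance. The `MODEL_PORTS` table and its entries are hypothetical:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

# Hypothetical mapping from model name to its llama.cpp instance's port.
MODEL_PORTS = {"llama-3-8b": 8081, "mistral-7b": 8082}

class ProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        name = json.loads(body).get("model", "")
        port = MODEL_PORTS.get(name)
        if port is None:
            self.send_error(404, f"unknown model: {name}")
            return
        # A real herder would also call ensure_running(name) here.
        upstream = Request(
            f"http://127.0.0.1:{port}{self.path}",
            data=body,
            headers={"Content-Type": "application/json"},
        )
        with urlopen(upstream) as resp:
            payload = resp.read()
        self.send_response(resp.status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), ProxyHandler).serve_forever()
```

Note that plain `urlopen` buffers the whole upstream response, so streamed completions (`"stream": true`) would need chunked forwarding instead.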