Commit graph

  • 5f687d392b
    Nix fixes Tristan Druyen 2024-05-03 22:03:11 +02:00
  • 5442939fcc
    llama : support small Granite models (#7481) Giuseppe Scrivano 2024-05-28 20:49:49 +02:00
  • 56411a950f
    vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE (#7552) k.h.lai 2024-05-29 01:25:08 +08:00
  • 2b737caae1
    rpc : resource management rework (#7562) Radoslav Gerganov 2024-05-28 18:13:36 +03:00
  • ee3dff6b8e
    Add support for DeepseekV2ForCausalLM (#7519) fairydreaming 2024-05-28 17:07:05 +02:00
  • edc29433fa
    tests : fix test-tokenizer-0.sh Georgi Gerganov 2024-05-28 15:04:09 +03:00
  • 8b99e2aa66
    llama : handle unknown utf8 bytes (#7588) Georgi Gerganov 2024-05-28 13:55:35 +03:00
  • 271ff3fc44
    github: add refactor to issue template (#7561) Brian 2024-05-28 20:27:27 +10:00
  • e2b065071c
    [SYCL]fix ggml_sycl_mul_mat_id() to match the change of api (#7436) Neo Zhang 2024-05-28 17:53:37 +08:00
  • 0548a4187f
    ggml : generalize GGML_OP_CONCAT (#7563) Georgi Gerganov 2024-05-28 11:04:19 +03:00
  • 9335b969e8
    server: do not remove whitespace at the start of a completion chunk (#7524) mgroeber9110 2024-05-28 06:55:51 +02:00
  • c41767154e
    Markdownish code block fix (#7571) Nathan Epstein 2024-05-28 00:41:14 -04:00
  • 74b239b3d5
    llava : update clip.h (#7580) Ikko Eltociear Ashimine 2024-05-28 11:48:16 +09:00
  • 852aafb163
    update HIP_UMA #7399 (#7414) Djip007 2024-05-28 01:40:47 +02:00
  • 0136966daf
    adding in x64 targets to cmake presets (#7574) kunnis 2024-05-27 18:40:12 -05:00
  • 10b1e45876
    make: add --device-debug to NVCC debug flags (#7542) Johannes Gäßler 2024-05-27 19:34:40 +02:00
  • 197c00681b
    Allow multiple copy function pointers for CUDA graph kernel param updates (#7565) agray3 2024-05-27 18:33:42 +01:00
  • 95f84d5ce8
    Fix q_xxs using mul_mat_q (#7459) AidanBeltonS 2024-05-27 17:34:51 +01:00
  • 5487593bc7
    Add freq factors (#7495) AidanBeltonS 2024-05-27 13:34:09 +01:00
  • 1d8fca72ae
    metal : add GGML_OP_REPEAT kernels (#7557) Georgi Gerganov 2024-05-27 12:10:19 +03:00
  • 62bfef5194
    metal : disable FA kernel for HS=256 (#7556) Georgi Gerganov 2024-05-27 10:38:39 +03:00
  • eaf6e03174
    llama : add comments about experimental flags (#7544) Georgi Gerganov 2024-05-27 09:24:13 +03:00
  • d6ef0e77dd
    github: add self sorted issue ticket forms (#7543) Brian 2024-05-27 10:54:30 +10:00
  • dff451cfa1
    flake.lock: Update (#7540) Georgi Gerganov 2024-05-26 18:54:56 +03:00
  • d298382ad9
    main: replace --no-special with --special (#7534) Brian 2024-05-27 00:10:17 +10:00
  • 32a28217f4
    Fix aya-23 conversion scripts (#7539) Galunid 2024-05-26 16:02:34 +02:00
  • c429b33beb
    llama : add Smaug 70B support (#7402) Bartowski 2024-05-26 08:28:35 -04:00
  • 9146d36fe7
    Readme: add akx/ggify to tools (#1484) Aarni Koskela 2024-05-26 15:09:42 +03:00
  • b9adcbbf92
    SimpleChat Completion Mode flexibility and cleanup, Settings gMe, Optional sliding window (#7480) HanishKVC 2024-05-26 06:26:34 +05:30
  • 9588f196b1
    train : change default FA argument (#7528) Georgi Gerganov 2024-05-25 15:21:30 +03:00
  • 3cbd23ed88
    labeler: added Apple Metal detector (+Kompute) (#7529) Brian 2024-05-25 19:30:42 +10:00
  • 00c6390793
    main : don't print special tokens with --grammar (#6923) Justine Tunney 2024-05-25 05:04:03 -04:00
  • faa0e6979a
    ggml: aarch64: SVE kernels for q8_0_q8_0, q4_0_q8_0 vector dot (#7433) Masaya, Kato 2024-05-25 17:42:31 +09:00
  • 9791f40258
    android : module (#7502) Elton Kola 2024-05-25 04:11:33 -04:00
  • 902184dd3a
    fix missing slash in fs_get_cache_directory() (#7503) Xuan Son Nguyen 2024-05-25 05:30:59 +02:00
  • 57684331fc
    Make tokenize CLI tool have nicer command line arguments. (#6188) Mikko Juola 2024-05-24 18:14:42 -07:00
  • b83bab15a5
    gguf-py : fix and simplify quantized shape round-trip (#7483) compilade 2024-05-24 21:11:48 -04:00
  • d041d2ceaa
    flake.lock: Update (#7232) Georgi Gerganov 2024-05-24 18:59:06 +03:00
  • 27891f6db0
    docker.yml: disable light-intel and server-intel test (#7515) Brian 2024-05-24 23:47:56 +10:00
  • fbca2f27fc
    Add support for ArcticForCausalLM (#7020) fairydreaming 2024-05-24 14:31:13 +02:00
  • 0df0aa8e43
    add build shared lib in win release package (#7438) Neo Zhang 2024-05-24 10:06:56 +08:00
  • 74f33adf5f
    readme : remove trailing space (#7469) Georgi Gerganov 2024-05-23 17:43:18 +03:00
  • 1debe72737
    ggml : silence UB sanitizer error during iq2_xxs quantization (#0) Georgi Gerganov 2024-05-23 17:17:43 +03:00
  • 007489e895
    Fix phi3 chat template confusion with zephyr (#7449) Tristan Druyen 2024-05-23 16:15:15 +02:00
  • 8b94e799df
    readme : add Bunny in supported models [no ci] (#7469) Raj Hammeer Singh Hada 2024-05-23 18:00:13 +05:30
  • 3015851c5a
    llama : add getters for n_threads/n_threads_batch (#7464) Daniel Bevenius 2024-05-23 14:29:26 +02:00
  • 55ac3b7aea
    ci : use Pythia models instead of OpenLlama (#7470) Georgi Gerganov 2024-05-23 15:28:14 +03:00
  • dacfcebd60
    readme : add GPT-NeoX + Pythia to the list of supported models (#7491) Victor Nogueira 2024-05-23 15:12:43 +03:00
  • 9b82476ee9
    Add missing inference support for GPTNeoXForCausalLM (Pythia and GPT-NeoX base models) (#7461) fairydreaming 2024-05-23 11:49:53 +02:00
  • a61a94e543
    llama : rename n_ctx -> cache.size, less confusing (#0) Georgi Gerganov 2024-05-23 12:38:18 +03:00
  • 152da28ae5
    labeler.yml: add embedding label detector [no ci] (#7482) Brian 2024-05-23 17:40:43 +10:00
  • d48c88cbd5
    ggml : remove ggml_flash_attn and ggml_flash_ff (#7463) Georgi Gerganov 2024-05-23 10:00:44 +03:00
  • e84b71c2c6
    ggml : drop support for QK_K=64 (#7473) Georgi Gerganov 2024-05-23 10:00:21 +03:00
  • 1b1e27cb49
    Update vulkan rope implementation to support frequency factors (#7475) 0cc4m 2024-05-23 08:59:59 +02:00
  • fbf777d2b9
    main : minor (#7462) Georgi Gerganov 2024-05-23 09:43:24 +03:00
  • cd93a28cb1
    CUDA: fix FA out-of-bounds reads (#7479) Johannes Gäßler 2024-05-23 00:31:20 +02:00
  • 1e374365d1
    SimpleChat: a simple and dumb web front end for testing /chat/completions and /completions end points and try chat (#7350) HanishKVC 2024-05-22 23:23:21 +05:30
  • 197ff91462
    build : remove zig (#7471) Georgi Gerganov 2024-05-22 20:05:38 +03:00
  • 6ff13987ad
    common : normalize naming style (#7462) Georgi Gerganov 2024-05-22 20:04:20 +03:00
  • 38c03478a3
    CUDA: fix FA out-of-bounds writes (#7465) Johannes Gäßler 2024-05-22 17:58:25 +02:00
  • b18532a4ef
    phi3 : duplicate rope factors in each layer (#7447) slaren 2024-05-22 16:10:46 +02:00
  • fcda1128bc
    vulkan: add workaround for iterator boundary check to fix clang-cl debug build (#7426) k.h.lai 2024-05-22 20:53:21 +08:00
  • 03d8900ebe
    llama : add missing model type names (#7445) Justine Tunney 2024-05-22 07:08:18 -04:00
  • 9b3d833189
    cuda : fix compile warning (#7454) Georgi Gerganov 2024-05-22 12:36:37 +03:00
  • 95fb0aefab
    CUDA: remove incorrect precision check (#7454) Johannes Gäßler 2024-05-22 10:24:29 +02:00
  • 3e5faa8503
    cuda : fix rope + add tests (#7452) Georgi Gerganov 2024-05-22 11:01:35 +03:00
  • 201cc11afa
    llama : add phi3 128K model support (#7225) liuwei-git 2024-05-22 04:28:32 +08:00
  • 6369bf0433
    metal : handle F16 inf values, fix FA partial offload (#7434) Georgi Gerganov 2024-05-21 23:03:42 +03:00
  • e402de364b
    grammars: fix resampling logic regression (#7424) Olivier Chafik 2024-05-21 20:40:00 +01:00
  • fcf6538ba6
    CUDA: fix unused warning in mmq.cu (#7442) Johannes Gäßler 2024-05-21 19:27:12 +02:00
  • c3f8d58356
    tests : test-tokenizer-0.sh print more info (#7402) Georgi Gerganov 2024-05-21 19:53:48 +03:00
  • 11474e756d
    examples: cache hf model when --model not provided (#7353) Amir 2024-05-21 17:13:12 +03:00
  • d8ee902227
    CUDA: deduplicate mmq code (#7397) Johannes Gäßler 2024-05-21 16:02:12 +02:00
  • d7e852c1bc
    Tokenizer SPM fixes for phi-3 and llama-spm (bugfix) (#7425) jaime-m-p 2024-05-21 14:39:48 +02:00
  • 917dc8cfa6
    Tokenizer SPM fixes for phi-3 and llama-spm (#7375) jaime-m-p 2024-05-20 20:15:57 +02:00
  • fabf30b4c4
    llama : remove Persimmon (#7408) Georgi Gerganov 2024-05-20 19:35:28 +03:00
  • 20385cebcc
    perplexity: update README FP16 results [no ci] (#7413) Johannes Gäßler 2024-05-20 18:15:38 +02:00
  • db10f01310
    rpc : track allocated buffers (#7411) Radoslav Gerganov 2024-05-20 16:36:55 +03:00
  • 3bc10cb485
    server : fix temperature + disable some tests (#7409) Georgi Gerganov 2024-05-20 15:10:03 +03:00
  • 6bf9b66fa3
    [SYCL] Update SYCL upscale operation (#7321) AidanBeltonS 2024-05-20 12:08:23 +01:00
  • 26cd4237bc
    Update README.md (#7410) Bingan 2024-05-20 17:55:34 +08:00
  • 213e90ed73
    ggml-opencl, llama: using reserve() if count already known (#7272) Herman Semenov 2024-05-20 07:33:21 +00:00
  • 65c58207ec
    ggml : add loongarch lsx and lasx support (#6454) junchao-loongson 2024-05-20 15:19:21 +08:00
  • 1cc0155d04
    server : tuning tests (#7388) Georgi Gerganov 2024-05-20 10:16:41 +03:00
  • e932094d58
    server : return error on too large embedding input (#7389) Georgi Gerganov 2024-05-20 08:56:05 +03:00
  • 2789baf480
    tests : fix --keep_split -> --keep-split (#7374) Georgi Gerganov 2024-05-20 08:55:09 +03:00
  • 33c8d50acc
    Add provisions for windows support for BF16 code including CMake provision for enabling AVX512_BF16 (#7258) Srihari-mcw 2024-05-19 19:18:39 -07:00
  • d359f30921
    llama : remove MPI backend (#7395) slaren 2024-05-20 01:17:03 +02:00
  • 1ea2a0036e
    quantize : fix --keep-split check (#7374) Fred Douglas 2024-05-19 11:37:04 -05:00
  • f030ec1f7a
    Vulkan Embedding Fix (#7360) 0cc4m 2024-05-19 17:19:53 +02:00
  • e4e6f67be6
    ggml : fix another case of quants nans (#7387) slaren 2024-05-19 17:08:46 +02:00
  • 5ca49cbecd
    ggml: implement quantized KV cache for FA (#7372) Johannes Gäßler 2024-05-19 16:46:13 +02:00
  • 1b01f06db0
    server: add test for token probs (#7347) Johannes Gäßler 2024-05-19 16:26:02 +02:00
  • 41858392e1
    server: fix seed being reported back (#7382) Johannes Gäßler 2024-05-19 16:06:33 +02:00
  • 6aade19ee7
    Add StableLM2 pre-tokenizer (#7349) Anas Ahouzi 2024-05-19 14:46:46 +02:00
  • ab33f7a338
    cuda : clear error after buffer allocation failure (#7376) slaren 2024-05-19 14:19:37 +02:00
  • e23b974f4c
    labeler.yml: Use settings from ggerganov/llama.cpp [no ci] (#7363) Brian 2024-05-19 20:51:03 +10:00
  • 854d365aba
    cmake : update android comments (#7341) Georgi Gerganov 2024-05-19 11:01:01 +03:00
  • f5bf761747
    Capture CUDA logging output (#7298) fraxy-v 2024-05-19 01:44:42 +03:00
  • 059031b8c4
    ci : re-enable sanitizer runs (#7358) Georgi Gerganov 2024-05-18 18:55:54 +03:00