[Inference] Clean duplicated vector utils #5715
base: main
Commits on Jan 11, 2024
-
[Inference] First PR for rebuild colossal-infer (hpcaitech#5143)
* add engine and scheduler
* add dirs
---------
Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
SHA: 4cf4682
-
[Inference] Add readme (roadmap) and fulfill request handler (hpcaitech#5147)
* request handler
* add readme
---------
Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
SHA: 56e75ee
-
[Inference/NFC] Clean outdated inference tests and deprecated kernels (hpcaitech#5159)
* [inference/nfc] remove outdated inference tests
* remove outdated kernel tests
* remove deprecated triton kernels
* remove imports from deprecated kernels
SHA: 2bb9224
-
[Inference]Add BatchInferState, Sequence and InferConfig (hpcaitech#5149)
SHA: fab9b93
-
[Inference] Add CacheBlock and KV-Cache Manager (hpcaitech#5156)
* [Inference] Add KVCache Manager
* function refactored
* add test for KVCache Manager
* add attr beam width
* Revise alloc func in CacheManager
* Fix docs and pytests
* add tp slicing for head number
* optimize shapes of tensors used as physical cache
* Apply using InferenceConfig on KVCacheManager
* rm duplicate config file
* Optimize cache allocation: use contiguous cache
* Fix config in pytest (and config)
SHA: 3de2e62
-
[Inference]Update inference config and fix test (hpcaitech#5178)
* unify the config setting
* fix test
* fix import
* fix test
* fix
* fix
* add logger
* revise log info
---------
Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
SHA: 93aeacc
-
[Inference] Add the logic of the inference engine (hpcaitech#5173)
* add infer_struct and infer_config
* update codes
* change InferConfig
* Add hf_model_config to the engine
* rm _get_hf_model_config
* update codes
* made adjustments according to the feedback from the reviewer.
* update codes
* add ci test for config and struct
* Add the logic of the inference engine
* update engine and test
* Recover cache_manager.py
* add logger
* fix conflict
* update codes
* update codes
* update model and tokenizer
* fix add the logic about shardformer
* change kvcache_manager docstring
* add policy
* fix ci bug in test_kvcache_manager.py
* remove codes related to tokenizer and move model_policy
* fix code style
* add ordered_set to requirements-infer.txt
* Delete extra empty lines
* add ordered_set to requirements-test.txt
SHA: 8daee26
-
[Inference] add logit processor and request handler (hpcaitech#5166)
* add logit processor and request handler
* add
* add
* add
* fix
* add search tokens and update func
* finish request handler
* add running list test
* fix test
* fix some bug
* add
* add
* fix bugs
* fix some bugs
* fix bug
* fix
* fix
* add copy fun
* del useless attn
* fix request status
---------
Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
SHA: 0e61646
-
SHA: 86853a3
-
SHA: 62fd08e
-
SHA: 6296858
-
SHA: 9489dc6
-
SHA: 4df8876
-
[kernel] Add triton kernel for context attention (FAv2) without padding (hpcaitech#5192)
* add context attn unpadded triton kernel
* test compatibility
* kv cache copy (testing)
* fix k/v cache copy
* fix kv cache copy and test
* fix boundary of block ptrs
* add support for GQA/MQA and testing
* fix import statement
---------
Co-authored-by: Round Heng <[email protected]>
SHA: 07b5283
-
SHA: 02c1bf8
-
SHA: bbfebfb
-
SHA: b2eb9cd
-
SHA: 3ad1f3b
-
[Inference] Pytorch Attention func, pad&nopad input support (hpcaitech#5219)
* add attn
* add attention test
* fix attn forward
* fix decoding
SHA: bfd9b1b
-
SHA: 47e53ea
-
SHA: fa4fbdb
-
[Hotfix] Fix accuracy and align attention method api with Triton kernel (hpcaitech#5229)
* fix accuracy
* alignment in attention
* fix attention
* fix
* fix bugs
* fix bugs
* fix bugs
SHA: e545a87
-
SHA: 2a73e82
-
SHA: fab294c
-
SHA: 10e3c9f
-
SHA: d40eb26
-
[Inference] Kernel: no pad rotary embedding (hpcaitech#5252)
* fix bugs
* comment
* use more accurate atol
* fix
SHA: fded91d
-
[kernel] Add flash decoding triton kernel for blocked kv cache (hpcaitech#5249)
* add flash decoding unpad triton kernel
* rename flash decoding kernel
* add kernel testing (draft)
* revise pytest
* support kv group (GQA)
* (trivial) fix api and pytest
* (trivial) func renaming
* (trivial) func/file renaming
* refactor pytest for attention
* (trivial) format and consistent vars of context/decode attn
* (trivial) remove test redundancy
SHA: 1513f20
-
SHA: 1ded7e8
Commits on Jan 15, 2024
-
[kernel] Add KV cache copy kernel during decoding (hpcaitech#5261)
* add kv copy triton kernel during decoding stage
* add pytest and fix kernel
* fix test utilities
* revise kernel config
* add benchmark for kvcache copy
SHA: fa85e02
-
SHA: c597678
-
[Inference] Fix request handler and add recycle logic (hpcaitech#5260)
* fix request handler
* fix comment
SHA: d8db500
Commits on Jan 16, 2024
-
[kernel] Revise KVCache copy triton kernel API (hpcaitech#5273)
* [kernel/fix] revise kvcache copy kernel api
* fix benchmark
SHA: 0f2b46a
Commits on Jan 17, 2024
-
[Inference]Adapted to the triton attn kernels (hpcaitech#5264)
* adapted to the triton attn kernels
* fix pad input
* adapted to copy_kv_to_blocked_cache
* fix ci test
* update kv memcpy
* remove print
SHA: 86b63f7
Commits on Jan 18, 2024
-
[kernel] Add RMSLayerNorm triton kernel (hpcaitech#5262)
* add layerrmsnorm triton kernel
* add layerrmsnorm kernel
* modify the atol and rtol in test file
* Remove the logics of mean computations, and update the name of the kernel functions and files
* add benchmark of rms norm
SHA: 5ae9099
-
[Hotfix] Fix bugs in testing continuous batching (hpcaitech#5270)
* fix bug
* fix bugs
* fix bugs
* fix bugs and add padding
* add funcs and fix bugs
* fix typos
* fix bugs
* add func
SHA: 9e2342b
Commits on Jan 19, 2024
-
[kernel/fix] Performance Optimization for Decoding Kernel and Benchmarking (hpcaitech#5274)
* prevent re-creating intermediate tensors
* add singleton class holding intermediate values
* fix triton kernel api
* add benchmark in pytest
* fix kernel api and add benchmark
* revise flash decoding triton kernel in/out shapes
* fix calling of triton kernel in modeling
* fix pytest: extract to util functions
SHA: 6e487e7
Commits on Jan 22, 2024
-
[inference] Adapted to Rotary Embedding and RMS Norm (hpcaitech#5283)
* adapted to rotary_embedding
* adapted to nopad rms norm
* fix bugs in benchmark
* fix flash_decoding.py
SHA: bfff925
-
SHA: cea9c86
-
Merge pull request hpcaitech#5297 from yuehuayingxueluo/fix_rotary_embedding
[Inference/fix]Add utils.py for Rotary Embedding
SHA: b785319
Commits on Jan 23, 2024
-
[Inference] Benchmarking rotary embedding and add a fetch function (hpcaitech#5277)
* fix bugs and add a cos/sin cache fetch func
* add docstring
* fix bug
* fix
SHA: 8e606ec
-
[Kernel/Fix] Revise flash attention triton kernel API and add benchmark (hpcaitech#5301)
* fix decoding kernel pytest
* revise and add triton context attn benchmark
SHA: 3da9993
Commits on Jan 24, 2024
-
[Inference]Add fused rotary kernel and get cos cache kernel (hpcaitech#5302)
* add fused rotary and get cos cache func
* staged
* fix bugs
* fix bugs
SHA: c647e00
Commits on Jan 25, 2024
-
SHA: af8359c
Commits on Jan 26, 2024
-
[inference]Optimize the usage of the mid tensors space in flash attn (hpcaitech#5304)
* opt flash attn
* opt tmp tensor
* fix benchmark_llama
* fix code style
* fix None logic for output tensor
* fix adapted to get_xine_cache
* add comment
* fix ci bugs
* fix some codes
* rm duplicated codes
* rm duplicated codes
* fix code style
* add _get_dtype in config.py
SHA: 4f28cb4
-
SHA: 7ddd8b3
Commits on Jan 29, 2024
-
[Inference] Update rms norm kernel, benchmark with vLLM (hpcaitech#5315)
* add
* xi
* del
* del
* fix
SHA: 1f8a75d
-
[DOC] Update inference readme (hpcaitech#5280)
* add readme
* add readme
* 1
* update engine
* finish readme
* add readme
SHA: c7c104c
Commits on Jan 30, 2024
-
[Inference]Add Nopadding Llama Modeling (hpcaitech#5327)
* add nopadding llama modeling
* add nopadding_llama.py
* rm unused codes
* fix bugs in test_xine_copy.py
* fix code style
SHA: e8f0642
-
[Infer] Optimize Blocked KVCache And Kernels Using It (hpcaitech#5325)
* revise shape of kvcache (context attn kernel)
* revise shape of kvcache (flash decoding kernel)
* revise shape of kvcache (kvcache copy) and attn func
* init of kvcache in kvcache manager
* revise llama modeling
* revise block size retrieval
* use torch for rms_norm benchmarking
* revise block size retrieval
SHA: 5f98a9d
Commits on Jan 31, 2024
-
SHA: c565519
-
Merge pull request hpcaitech#5339 from FrankLeeeee/sync/merge-main
Sync/merge main
SHA: 1336838
-
[Inference] Kernel Fusion, fused copy kv cache into rotary embedding (hpcaitech#5336)
* revise rotary embedding
* remove useless print
* adapt
SHA: df0aa49
Commits on Feb 1, 2024
-
[inference] simplified config verification (hpcaitech#5346)
* [inference] simplified config verification
* polish
* polish
SHA: f8e456d
-
[Inference]Replace Attention layer and MLP layer by shardformer to optimize the weight transpose operation, add fused_qkv and fused linear_add (hpcaitech#5340)
* add fused qkv
* replace attn and mlp by shardformer
* fix bugs in mlp
* add docstrings
* fix test_inference_engine.py
* add optimize unbind
* add fused_addmm
* rm squeeze(1)
* refactor codes
* fix ci bugs
* rename ShardFormerLlamaMLP and ShardFormerLlamaAttention
* Removed the dependency on LlamaFlashAttention2
* rollback test_inference_engine.py
SHA: 249644c
Commits on Feb 2, 2024
-
SHA: db1a763
-
SHA: e76acbb
-
SHA: 027aa10
-
[Inference/opt]Optimize the mid tensor of RMS Norm (hpcaitech#5350)
* opt rms_norm
* fix bugs in rms_layernorm
SHA: 21ad4a2
-
[Inference]Optimize generation process of inference engine (hpcaitech#5356)
* opt inference engine
* fix run_benchmark.sh
* fix generate in engine.py
* rollback tesh_inference_engine.py
SHA: 631862f
Commits on Feb 6, 2024
-
[Fix/Infer] Remove unused deps and revise requirements (hpcaitech#5341)
* remove flash-attn dep
* rm padding llama
* revise infer requirements
* move requirements out of module
SHA: 1dedb57
-
[Inference]Fused the gate and up proj in mlp, and optimized the autograd process. (hpcaitech#5365)
* fused the gate and up proj in mlp
* fix code styles
* opt auto_grad
* rollback test_inference_engine.py
* modifications based on the review feedback.
* fix bugs in flash attn
* Change reshape to view
* fix test_rmsnorm_triton.py
SHA: 35382a7
Commits on Feb 7, 2024
-
[Inference] Adapt to Fused rotary (hpcaitech#5348)
* revise rotary embedding
* remove useless print
* adapt
* fix
* add
* fix
* modeling
* fix
* fix
* fix
SHA: 9f4ab2e
-
SHA: 8106ede
-
SHA: 58740b5
-
[Inference/opt] Fused KVCache Memcopy (hpcaitech#5374)
* fused kv memcopy
* add TODO in test_kvcache_copy.py
SHA: 6fb4bcb
-
[Inference] User Experience: update the logic of default tokenizer and generation config. (hpcaitech#5337)
* add
* fix
* fix
* pause
* fix
* fix pytest
* align
* fix
* license
* fix
* fix
* fix readme
* fix some bugs
* remove tokenizer config
SHA: 1f8c7e7
Commits on Feb 8, 2024
-
SHA: 9afa520
-
[Inference]Support vllm testing in benchmark scripts (hpcaitech#5379)
* add vllm benchmark scripts
* fix code style
* update run_benchmark.sh
* fix code style
SHA: 8c69deb
Commits on Feb 19, 2024
-
[Inference] Optimize and Refactor Inference Batching/Scheduling (hpcaitech#5367)
* add kvcache manager funcs for batching
* add batch bucket for batching
* revise RunningList struct in handler
* add kvcache/batch funcs for compatibility
* use new batching methods
* fix indexing bugs
* revise abort logic
* use cpu seq lengths/block tables
* rm unused attr in Sequence
* fix type conversion/default arg
* add and revise pytests
* revise pytests, rm unused tests
* rm unused statements
* fix pop finished indexing issue
* fix: use index in batch when retrieving inputs/update seqs
* use dict instead of odict in batch struct
* arg type hinting
* fix make compress
* refine comments
* fix: pop_n_seqs to pop the first n seqs
* add check in request handler
* remove redundant conversion
* fix test for request handler
* fix pop method in batch bucket
* fix prefill adding
SHA: b21aac5
Commits on Feb 21, 2024
-
[Inference]Fused kv copy into rotary calculation (hpcaitech#5383)
* revise rotary embedding
* remove useless print
* adapt
* fix
* add
* fix
* modeling
* fix
* fix
* fix
* fused kv copy
* fused copy
* colossalai/kernel/triton/no_pad_rotary_embedding.py
* del padding llama
* del
SHA: 7301038
-
Optimized the execution interval time between cuda kernels caused by view and memcopy (hpcaitech#5390)
* opt_view_and_memcopy
* fix bugs in ci
* fix ci bugs
* update benchmark scripts
* fix ci bugs
SHA: 2a718c8
Commits on Feb 23, 2024
-
[Fix/Inference] Fix format of input prompts and input model in inference engine (hpcaitech#5395)
* Fix bugs in inference_engine
* fix bugs in engine.py
* rm CUDA_VISIBLE_DEVICES
* add request_ids in generate
* fix bug in engine.py
* add logger.debug for BatchBucket
SHA: bc1da87
Commits on Feb 26, 2024
-
[Infer/Fix] Fix Dependency in test - RMSNorm kernel (hpcaitech#5399)
fix dependency in pytest
SHA: 1906118
Commits on Feb 28, 2024
-
[Inference]Add CUDA KVCache Kernel (hpcaitech#5406)
* add cuda KVCache kernel
* annotation benchmark_kvcache_copy
* add use cuda
* fix import path
* move benchmark scripts to example/
* rm benchmark codes in test_kv_cache_memcpy.py
* rm redundancy codes
* rm redundancy codes
* pr was modified according to the review
SHA: 600881a
-
[Inference]Move benchmark-related code to the example directory. (hpcaitech#5408)
* move benchmark-related code to the example directory.
* fix bugs in test_fused_rotary_embedding.py
SHA: 0aa27f1
Commits on Mar 4, 2024
-
SHA: 0310b76
-
SHA: 593a72e
Commits on Mar 7, 2024
-
SHA: 95c2149
Commits on Mar 8, 2024
-
SHA: cefaeb5
-
Merge pull request hpcaitech#5433 from Courtesy-Xs/add_silu_and_mul
【Inference】Add silu_and_mul for infer
SHA: 2b28b54
-
SHA: a46598a
-
Merge branch 'feature/colossal-infer' of https://github.com/hpcaitech/ColossalAI into add_gpu_launch_config
SHA: 01d289d
-
SHA: 5eb5ff1
-
SHA: f7aecc0
Commits on Mar 11, 2024
-
SHA: b2c0d9f
-
SHA: 9dec66f
-
SHA: 633e95b
-
Merge pull request hpcaitech#5435 from Courtesy-Xs/add_gpu_launch_config
Add query and other components
SHA: 21e1e36
-
SHA: 095c070
Commits on Mar 12, 2024
-
Merge pull request hpcaitech#5445 from Courtesy-Xs/refactor_infer_compilation
Refactor colossal-infer code arch
SHA: 368a2aa
-
SHA: b699f54
Commits on Mar 13, 2024
-
SHA: c1c45e9
-
Merge pull request hpcaitech#5452 from Courtesy-Xs/fix_include_path
fix include path
SHA: 6fd355a
-
fix rmsnorm template function invocation problem (template function partial specialization is not allowed in Cpp) and luckily pass e2e precision test (hpcaitech#5454)
SHA: ed431de
-
[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (hpcaitech#5418)
* add rotary embedding kernel
* add rotary_embedding_kernel
* add fused rotary_emb and kvcache memcopy
* add fused_rotary_emb_and_cache_kernel.cu
* add fused_rotary_emb_and_memcopy
* fix bugs in fused_rotary_emb_and_cache_kernel.cu
* fix ci bugs
* use vec memcopy and opt the global memory access
* fix code style
* fix test_rotary_embdding_unpad.py
* codes revised based on the review comments
* fix bugs about include path
* rm inline
SHA: f366a5e
-
SHA: 1821a6d
Commits on Mar 14, 2024
-
SHA: ae24b4f
-
SHA: d02e257
-
SHA: 388e043
-
SHA: 6e30248
Commits on Mar 15, 2024
-
SHA: 5724b9e
-
Merge pull request hpcaitech#5457 from Courtesy-Xs/ly_add_implementation_for_launch_config
add implementation for GetGPULaunchConfig1D
SHA: b6e9785
Commits on Mar 19, 2024
-
SHA: 48c4f29
-
SHA: aabc9fb
-
Merge pull request hpcaitech#5469 from Courtesy-Xs/add_vec_traits
Refactor vector utils
SHA: b96557b
-
SHA: 7ff42cc
Commits on Mar 21, 2024
-
SHA: 4eafe0c
-
Merge branch 'feature/colossal-infer' of https://github.com/hpcaitech/ColossalAI into colossal-infer-cuda-graph
SHA: 606603b
-
SHA: 5b017d6
Commits on Mar 25, 2024
-
SHA: 9fe61b4
-
SHA: ff4998c
-
[Inference]Support FP16/BF16 Flash Attention 2 And Add high_precision…
… Flag To Rotary Embedding (hpcaitech#5461) * Support FP16/BF16 Flash Attention 2 * fix bugs in test_kv_cache_memcpy.py * add context_kv_cache_memcpy_kernel.cu * rm typename MT * add tail process * add high_precision * add high_precision to config.py * rm unused code * change the comment for the high_precision parameter * update test_rotary_embdding_unpad.py * fix vector_copy_utils.h * add comment for self.high_precision when using float32
Commit 87079cf
-
Commit 68e9396
-
Merge pull request hpcaitech#5434 from LRY89757/colossal-infer-cuda-graph
[feat] cuda graph support and refactor non-functional api
Commit 1d62623
-
[fix] PR hpcaitech#5354 (hpcaitech#5501)
* [fix]
* [fix]
* Update config.py docstring
* [fix] docstring align
* [fix] docstring align
* [fix] docstring align
Commit 6251d68
Commits on Mar 26, 2024
-
[Inference] Optimize request handler of llama (hpcaitech#5512)
* optimize request_handler
* fix ways of writing
Commit e6496dd
Commits on Mar 28, 2024
-
The writing style of tail processing and the logic related to macro definitions have been optimized. (hpcaitech#5519)
Commit 934e31a
Commits on Apr 1, 2024
-
[Inference/Kernel]Add get_cos_and_sin Kernel (hpcaitech#5528)
* Add get_cos_and_sin kernel
* fix code comments
* fix code typos
* merge common codes of get_cos_and_sin kernel.
* Fixed a typo
* Changed 'asset allclose' to 'assert equal'.
Commit 04aca9e
-
[Inference] Add Reduce Utils (hpcaitech#5537)
* add reduce utils
* add using to delete namespace prefix
Commit a2878e3
Commits on Apr 2, 2024
-
[Fix/Inference] Remove unused and non-functional functions (hpcaitech#5543)
* [fix] remove unused func
* rm non-functional partial
Commit 4bb5d89
Commits on Apr 8, 2024
-
Commit 7ebdf48
-
Commit ed5ebd1
-
Commit ce9401a
-
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Commit d788175
-
Commit 7ca1d1c
Commits on Apr 9, 2024
-
Sync main to feature/colossal-infer
[Sync] Merge feature/colossal-infer with main
Commit d56c963
Commits on Apr 10, 2024
-
[Infer] Revise and Adapt Triton Kernels for Spec-Dec (hpcaitech#5401)
* [Infer/Fix] Fix Dependency in test - RMSNorm kernel (hpcaitech#5399) fix dependency in pytest
* resolve conflicts for revising flash-attn
* adapt kv cache copy kernel for spec-dec
* fix seqlen-n kvcache copy kernel/tests
* test kvcache copy - use torch.equal
* add assertions
* (trivial) comment out
Commit d63c469
-
[Inference/SpecDec] Add Basic Drafter Model Container (hpcaitech#5405)
* [Infer/Fix] Fix Dependency in test - RMSNorm kernel (hpcaitech#5399) fix dependency in pytest
* add drafter model container (basic ver)
Commit 5a9b05f
-
[Inference/SpecDec] Add Speculative Decoding Implementation (hpcaitech#5423)
* fix flash decoding mask during verification
* add spec-dec
* add test for spec-dec
* revise drafter init
* remove drafter sampling
* retire past kv in drafter
* (trivial) rename attrs
* (trivial) rename arg
* revise how we enable/disable spec-dec
Commit a37f826
-
[SpecDec] Fix inputs for speculation and revise past KV trimming (hpcaitech#5449)
* fix drafter pastkv and usage of batch bucket
Commit 912e24b
-
[Inference/SpecDec] Support GLIDE Drafter Model (hpcaitech#5455)
* add glide-llama policy and modeling
* update glide modeling, compatible with transformers 4.36.2
* revise glide llama modeling/usage
* fix issues of glimpsing large kv
* revise the way re-loading params for glide drafter
* fix drafter and engine tests
* enable convert to glide strict=False
* revise glide llama modeling
* revise vicuna prompt template
* revise drafter and tests
* apply usage of glide model in engine
Commit d85d914
-
[doc] Add inference/speculative-decoding README (hpcaitech#5552)
* add README for spec-dec
* update roadmap
Commit e1acb58
-
[Fix] resolve conflicts of rebasing feat/speculative-decoding (hpcaitech#5557)
- resolve conflicts of rebasing feat/speculative-decoding
Commit e60d430
-
[Fix] Llama Modeling Control with Spec-Dec (hpcaitech#5580)
- fix ref before asgmt
- fall back to use triton kernels when using spec-dec
Commit f8598e3
-
[Inference/Spec-Dec] Merge pull request hpcaitech#5565 from hpcaitech/feat/speculative-decoding
Add Speculative Decoding and GLIDE Spec-Dec
Commit 25928d8
Commits on Apr 11, 2024
-
Commit a219123
Commits on Apr 15, 2024
-
[Inference/Refactor] Delete Duplicated code and refactor vec_copy utils and reduce utils (hpcaitech#5593)
* delete duplicated code and refactor vec_copy utils and reduce utils
* delete unused header file
Commit d4cb023
-
[inference/model]Adapted to the baichuan2-7B model (hpcaitech#5591)
* Adapted to the baichuan2-7B model
* modified according to the review comments.
* Modified the method of obtaining random weights.
* modified according to the review comments.
* change mlp layer 'NOTE'
Commit 56b222e
Commits on Apr 18, 2024
-
[Inference/Kernel] Add Paged Decoding kernel, sequence split within the same thread block (hpcaitech#5531)
* feat flash decoding for paged attention
* refactor flashdecodingattention
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Commit be396ad
-
[Feat]Tensor Model Parallel Support For Inference (hpcaitech#5563)
* tensor parallel support naive source
* [fix]precision, model load and refactor the framework
* add tp unit test
* docstring
* fix do_sample
Commit e37ee2f
Commits on Apr 19, 2024
-
Commit ccf7279
Commits on Apr 23, 2024
-
[Fix/Inference] Fix GQA Triton and Support Llama3 (hpcaitech#5624)
* [fix] GQA calling of flash decoding triton
* fix kv cache alloc shape
* fix rotary triton - GQA
* fix sequence max length assigning
* Sequence max length logic
* fix scheduling and spec-dec
* skip without import error
* fix pytest - skip without ImportError
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Commit 5d4c1fe
-
[Fix/Inference]Fix CUDA Rotary Embedding GQA (hpcaitech#5623)
* fix rotary embedding GQA
* change test_rotary_embdding_unpad.py KH
Commit 12f10d5
-
[example] Update Llama Inference example (hpcaitech#5629)
* [example] add inference benchmark llama3
* revise inference config - arg
* remove unused args
* add llama generation demo script
* fix init rope in llama policy
* add benchmark-llama3 - cleanup
Commit 04863a9
Commits on Apr 24, 2024
-
[Inference/Refactor] Refactor compilation mechanism and unified multi hw (hpcaitech#5613)
* refactor compilation mechanism and unified multi hw
* fix file path bug
* add init.py to make pybind a module to avoid relative path error caused by softlink
* delete duplicated micros
* fix micros bug in gcc
Commit 279300d
-
[Fix/Inference]Fix vllm benchmark (hpcaitech#5630)
* Fix bugs about OOM when running vllm-0.4.0
* rm used params
* change generation_config
* change benchmark log file name
Commit 90cd522
Commits on Apr 25, 2024
-
[Inference/Kernel] Optimize paged attention: Refactor key cache layout (hpcaitech#5643)
* optimize flashdecodingattention: refactor code with different key cache layout (from [num_blocks, num_kv_heads, block_size, head_size] to [num_blocks, num_kv_heads, head_size/x, block_size, x])
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Commit a8fd3b0
-
Commit f342a93
-
[Inference]Adapt to baichuan2 13B (hpcaitech#5614)
* adapt to baichuan2 13B
* adapt to baichuan2 13B
* change BAICHUAN_MODEL_NAME_OR_PATH
* fix test_decoding_attn.py
* Modifications based on review comments.
* change BAICHUAN_MODEL_NAME_OR_PATH
* mv attn mask processes to test flash decoding
* mv get_alibi_slopes baichuan modeling
* fix bugs in test_baichuan.py
Commit 3c91e3f
Commits on Apr 26, 2024
-
[kernel] Support new KCache Layout - Context Attention Triton Kernel (hpcaitech#5658)
* add context attn triton kernel - new kcache layout
* add benchmark triton
* tiny revise
* trivial - code style, comment
Commit 5be590b
-
Commit 8ccb671
Commits on Apr 30, 2024
-
Commit 808ee6e
-
[Inference] Adapt Baichuan2-13B TP (hpcaitech#5659)
* adapt to baichuan2 13B
* add baichuan2 13B TP
* update baichuan tp logic
* rm unused code
* Fix TP logic
* fix alibi slopes tp logic
* rm nn.Module
* Polished the code.
* change BAICHUAN_MODEL_NAME_OR_PATH
* Modified the logic for loading Baichuan weights.
* fix typos
Commit 5f00002
-
[Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (hpcaitech#5663)
* refactor kvcache manager and rotary_embedding and kvcache_memcpy operator
* refactor decode_kv_cache_memcpy
* enable alibi in pagedattention
Commit 5cd75ce
-
[Inference/Feat] Add kvcache quant support for fused_rotary_embedding_cache_copy (hpcaitech#5680)
Commit ef8e4ff
-
[inference]Add alibi to flash attn function (hpcaitech#5678)
* add alibi to flash attn function
* rm redundant modifications
Commit f799631
-
Commit 9df016f
Commits on May 3, 2024
-
[kernel] Support New KCache Layout - Triton Kernel (hpcaitech#5677)
* kvmemcpy triton for new kcache layout
* revise tests for new kcache layout
* naive triton flash decoding - new kcache layout
* rotary triton kernel - new kcache layout
* remove redundancy - triton decoding
* remove redundancy - triton kvcache copy
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Commit 537a3cb
Commits on May 5, 2024
-
Commit 56ed09a
-
Commit 8754aba
Commits on May 6, 2024
-
Commit 725fbd2
-
[Sync] Update from main to feature/colossal-infer (Merge pull request hpcaitech#5685)
[Sync] Update from main to feature/colossal-infer
- Merge pull request hpcaitech#5685 from yuanheng-zhao/inference/merge/main
Commit db7b305
-
Commit 1ace106
Commits on May 7, 2024
-
[hotfix] Fix KV Heads Number Assignment in KVCacheManager (hpcaitech#5695)
- Fix key value number assignment in KVCacheManager, as well as method of accessing
Commit f9afe0a
Commits on May 8, 2024
-
[Fix] Fix Inference Example, Tests, and Requirements (hpcaitech#5688)
* clean requirements
* modify example inference struct
* add test ci scripts
* mark test_infer as submodule
* rm deprecated cls & deps
* import of HAS_FLASH_ATTN
* prune inference tests to be run
* prune triton kernel tests
* increment pytest timeout mins
* revert import path in openmoe
Commit 55cc7f3
-
Commit 12e7c28
-
[Inference]Adapt temperature processing logic (hpcaitech#5689)
* Adapt temperature processing logic
* add ValueError for top_p and top_k
* add GQA Test
* fix except_msg
Commit 9c2fe79
-
[Inference] Support the logic related to ignoring EOS token (hpcaitech#5693)
* Adapt temperature processing logic
* add ValueError for top_p and top_k
* add GQA Test
* fix except_msg
* support ignore EOS token
* change variable's name
* fix annotation
Commit d482922
-
[Inference] ADD async and sync Api server using FastAPI (hpcaitech#5396)
* add api server
* fix
* add
* add completion service and fix bug
* add generation config
* revise shardformer
* fix bugs
* add docstrings and fix some bugs
* fix bugs and add choices for prompt template
Commit 69cd7e0
-
[Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (hpcaitech#5432)
* finish online test and add examples
* fix test_contionus_batching
* fix some bugs
* fix bash
* fix
* fix inference
* finish revision
* fix typos
* revision
Commit de378cd
-
[Online Server] Chat Api for streaming and not streaming response (hpcaitech#5470)
* fix bugs
* fix bugs
* fix api server
* fix api server
* add chat api and test
* del request.n
Commit c064032
-
Commit 7bbb28e
-
[Inference] Fix bugs and docs for feat/online-server (hpcaitech#5598)
* fix test bugs
* add do sample test
* del useless lines
* fix comments
* fix tests
* delete version tag
* delete version tag
* add
* del test server
* fix test
* fix
* Revert "add" This reverts commit b9305fb.
Commit 61a1b2e
-
Commit bc9063a
Commits on May 9, 2024
-
Commit 5d9a494
-
Merge pull request hpcaitech#5588 from hpcaitech/feat/online-serving
[Feature]Online Serving
Commit 492520d
-
[Inference/Feat] Add quant kvcache interface (hpcaitech#5700)
* add quant kvcache interface
* delete unused output
* complete args comments
Commit bfad393
Commits on May 10, 2024
-
[Inference/Feat] Add convert_fp8 op for fp8 test in the future (hpcaitech#5706)
* add convert_fp8 op for fp8 test in the future
* rerun ci
Commit 50104ab
Commits on May 11, 2024
-
[Inference]Adapt repetition_penalty and no_repeat_ngram_size (hpcaitech#5708)
* Adapt repetition_penalty and no_repeat_ngram_size
* fix no_repeat_ngram_size_logit_process
* remove batch_updated
* fix annotation
* modified codes based on the review feedback.
* rm get_batch_token_ids
Commit de4bf3d
Commits on May 14, 2024
-
[Feat]Inference RPC Server Support (hpcaitech#5705)
* rpc support source
* kv cache logical/physical disaggregation
* sampler refactor
* colossalai launch built in
* Unit test
* Rpyc support
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Commit 18d67d0
-
Commit 30ea54f