Issues: microsoft/DeepSpeed
#5665 · How to set different learning rates for different parameters of LLMs · opened Jun 15, 2024 by jpWang
#5662 · [BUG] 'Invalidate trace cache' with Seq2SeqTrainer + predict_with_generate + ZeRO-3 · labels: bug, inference · opened Jun 14, 2024 by Osterlohe
#5661 · Does DeepSpeed support AMSP (a new DP sharding strategy)? · labels: enhancement · opened Jun 14, 2024 by guoyejun
#5660 · Failure to use zero_init to construct llama2 with DeepSpeed ZeRO-3 and bnb · opened Jun 14, 2024 by CHNRyan
#5659 · RuntimeError: Error building extension 'cpu_adam': /usr/bin/ld: cannot find -lcurand · opened Jun 14, 2024 by hekaijie123
#5656 · [BUG] Running llama2-7b step3 with tensor parallel and HE fails due to incompatible shapes · labels: bug, deepspeed-chat · opened Jun 13, 2024 by ShellyNR
#5653 · [BUG] oneapi/ccl.hpp: No such file or directory · labels: bug, training · opened Jun 12, 2024 by weiji14
#5648 · [BUG] RuntimeError: still have inflight params · labels: bug, training · opened Jun 12, 2024 by iszengxin
#5647 · [BUG] Inference with the MoE-based GPT model trained by ds_pretrain_gpt_345M_MoE128.sh · labels: bug, inference · opened Jun 12, 2024 by haoranlll
#5646 · [BUG] File not found in autotuner cache in multi-node setting on SLURM · labels: bug, training · opened Jun 12, 2024 by jubueche
#5645 · Why doesn't DeepSpeed stage 3 allow a batch size of 1 with multiple GPUs? · labels: bug, training · opened Jun 12, 2024 by AceMcAwesome77
#5644 · [BUG] RuntimeError encountered when generating tokens from a Meta-Llama-3-8B-Instruct model initialized with 4-bit or 8-bit quantization · labels: bug, compression · opened Jun 11, 2024 by Atry
#5642 · [BUG] One-line logic issue: flipped sign/direction in _partition_param_sec of partition_parameters.py? · labels: bug, training · opened Jun 11, 2024 by dukleryoni
#5641 · [BUG] tortoise_tts.py fails on a deepspeed/pydantic error · labels: bug, inference · opened Jun 11, 2024 by tholonia
#5639 · [HELP] How to safely switch trainable parameters in ZeRO-3 stage? · opened Jun 11, 2024 by Ledzy
#5637 · DeepSpeed ZeRO-3 + QLoRA problem: params are not sharded before being loaded to each GPU · opened Jun 11, 2024 by CHNRyan
#5636 · [BUG] 4-bit quantized models repeatedly generate the same tokens when bf16.enabled is true · labels: bug, compression · opened Jun 10, 2024 by Atry
#5635 · DeepSpeed stage 3 hangs after the first validation sample · labels: bug, training · opened Jun 10, 2024 by AceMcAwesome77
#5634 · [BUG] RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! · labels: bug, training · opened Jun 10, 2024 by fahadh4ilyas
#5631 · [BUG] is_zero_init_model is always False when using zero_init · labels: bug, training · opened Jun 8, 2024 by CHNRyan
#5630 · [BUG] RuntimeError encountered when generating tokens from a DeepSpeedHybridEngine initialized with 4-bit quantization · labels: bug, deepspeed-chat · opened Jun 8, 2024 by Atry
#5627 · [BUG] 1: error: must run as root; 2: RuntimeError("Ninja is required to load C++ extensions") · labels: bug, training · opened Jun 7, 2024 by YangBrooksHan
#5623 · [BUG] RuntimeError: Error building extension 'fused_adam' while loading extension module fused_adam · labels: bug, compression · opened Jun 6, 2024 by JinQiangWang2021