Issues: microsoft/DeepSpeed
#5665 · How to set different learning rates for different parameters of LLMs · opened Jun 15, 2024 by jpWang
#5662 · [BUG] 'Invalidate trace cache' with Seq2SeqTrainer + predict_with_generate + ZeRO-3 · labels: bug, inference · opened Jun 14, 2024 by Osterlohe
#5661 · Does DeepSpeed support AMSP (a new DP sharding strategy)? · labels: enhancement · opened Jun 14, 2024 by guoyejun
#5660 · Failure to use zero_init to construct llama2 with DeepSpeed ZeRO-3 and bnb · opened Jun 14, 2024 by CHNRyan
#5659 · RuntimeError: Error building extension 'cpu_adam': /usr/bin/ld: cannot find -lcurand · opened Jun 14, 2024 by hekaijie123
#5656 · [BUG] Running llama2-7b step3 with tensor parallel and HE fails due to incompatible shapes · labels: bug, deepspeed-chat · opened Jun 13, 2024 by ShellyNR
#5653 · [BUG] oneapi/ccl.hpp: No such file or directory · labels: bug, training · opened Jun 12, 2024 by weiji14
#5648 · [BUG] RuntimeError: still have inflight params · labels: bug, training · opened Jun 12, 2024 by iszengxin
#5647 · [BUG] Inference with the MoE-based GPT model trained by ds_pretrain_gpt_345M_MoE128.sh · labels: bug, inference · opened Jun 12, 2024 by haoranlll
#5646 · [BUG] File not found in autotuner cache in multi-node setting on SLURM · labels: bug, training · opened Jun 12, 2024 by jubueche
#5645 · Why doesn't DeepSpeed stage 3 allow a batch size of 1 with multiple GPUs? · labels: bug, training · opened Jun 12, 2024 by AceMcAwesome77
#5644 · [BUG] RuntimeError encountered when generating tokens from a Meta-Llama-3-8B-Instruct model initialized with 4-bit or 8-bit quantization · labels: bug, compression · opened Jun 11, 2024 by Atry
#5642 · [BUG] One-line logic issue: flipped sign/direction in _partition_param_sec of partition_parameters.py? · labels: bug, training · opened Jun 11, 2024 by dukleryoni
#5641 · [BUG] tortoise_tts.py fails on a deepspeed/pydantic error · labels: bug, inference · opened Jun 11, 2024 by tholonia
#5639 · [HELP] How to safely switch trainable parameters in ZeRO-3 stage? · opened Jun 11, 2024 by Ledzy
#5637 · DeepSpeed ZeRO-3 + QLoRA problem: params are not sharded before being loaded to each GPU · opened Jun 11, 2024 by CHNRyan
#5636 · [BUG] 4-bit quantized models repeatedly generate the same tokens when bf16.enabled is true · labels: bug, compression · opened Jun 10, 2024 by Atry
#5635 · DeepSpeed stage 3 hangs after the first validation sample · labels: bug, training · opened Jun 10, 2024 by AceMcAwesome77
#5634 · [BUG] RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! · labels: bug, training · opened Jun 10, 2024 by fahadh4ilyas
#5631 · [BUG] is_zero_init_model is always False when using zero_init · labels: bug, training · opened Jun 8, 2024 by CHNRyan
#5630 · [BUG] RuntimeError encountered when generating tokens from a DeepSpeedHybridEngine initialized with 4-bit quantization · labels: bug, deepspeed-chat · opened Jun 8, 2024 by Atry
#5627 · [BUG] 1: error: must run as root; 2: RuntimeError("Ninja is required to load C++ extensions") · labels: bug, training · opened Jun 7, 2024 by YangBrooksHan
#5623 · [BUG] RuntimeError: Error building extension 'fused_adam' while loading extension module fused_adam · labels: bug, compression · opened Jun 6, 2024 by JinQiangWang2021