[Deep Learning] SGD (Stochastic Gradient Descent)

데이터 분석 일지 (Data Analysis Journal)

-ˋˏ ♡ ˎˊ- 2024. 1. 31. 23:09

0. What we currently do

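Updating the parameters once by running gradient descent over every sample is inefficient. A single update requires differentiating the loss of all samples with respect to every parameter, so the more samples there are, the more each update costs. SGD is used to carry out gradient descent more efficiently.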
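As a sketch of why one full-batch update is expensive, the update can be written as follows. The symbols θ (parameters), η (learning rate), ℓ_i (loss of sample i) and N (number of samples) are notation assumed here for illustration, not taken from the post.

```latex
\theta \leftarrow \theta - \eta \, \nabla_\theta L(\theta),
\qquad
L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \ell_i(\theta)
```

Under this notation, each update needs all N per-sample gradients, so the cost of one update grows linearly with N.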

1. SGD (Stochastic Gradient Descent)

1st update: computed from the loss of k randomly chosen samples
2nd update: computed from the loss of another k randomly chosen samples
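The two updates above can be sketched in plain Python. This is a minimal illustration, assuming a one-parameter model y ≈ w * x with a mean squared error loss; the toy data, the learning rate, and the helper names `minibatch_gradient` and `sgd_step` are hypothetical and not from the post.

```python
import random

def minibatch_gradient(w, batch):
    """Gradient of the mean squared error of y ≈ w * x over one mini-batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def sgd_step(w, data, k, lr, rng):
    """One SGD update: draw k samples at random and step against their gradient."""
    batch = rng.sample(data, k)
    return w - lr * minibatch_gradient(w, batch)

rng = random.Random(0)
# Toy data lying exactly on the line y = 3x (hypothetical, for illustration only).
data = [(0.1 * i, 3.0 * (0.1 * i)) for i in range(1, 101)]

w0 = 0.0
w1 = sgd_step(w0, data, k=8, lr=0.005, rng=rng)  # 1st update: k random samples
w2 = sgd_step(w1, data, k=8, lr=0.005, rng=rng)  # 2nd update: another k random samples
print(w0, w1, w2)
```

Each call looks at only k = 8 of the 100 samples, yet both steps move w toward the slope of the toy data.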

2. Epoch & Iteration

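1 Epoch: the point at which all N samples in the dataset have gone through a forward & backward pass. At the start of each epoch, the dataset is randomly shuffled and then split into mini-batches of k samples each.
1 Iteration: the point at which the samples of one mini-batch have gone through a forward & backward pass. One epoch therefore consists of N/k iterations.
Epochs and iterations form a nested (double) for loop → total number of parameter updates = epochs × iterations per epoch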
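The nested loop can be sketched as follows. This is a minimal illustration that only counts updates; the sizes N, k and epochs and the helper `minibatches` are hypothetical. It assumes the final, smaller mini-batch is kept when k does not divide N, so the number of iterations per epoch is N/k rounded up.

```python
import math
import random

def minibatches(n_samples, k, rng):
    """Shuffle the sample indices, then yield them in chunks of k (one chunk per iteration)."""
    indices = list(range(n_samples))
    rng.shuffle(indices)  # random shuffling at the start of each epoch
    for start in range(0, n_samples, k):
        yield indices[start:start + k]

N, k, epochs = 1000, 32, 3  # hypothetical sizes
rng = random.Random(0)

updates = 0
for epoch in range(epochs):               # outer loop: epochs
    for batch in minibatches(N, k, rng):  # inner loop: iterations
        # forward pass, backward pass and parameter update on `batch` would go here
        updates += 1

iterations_per_epoch = math.ceil(N / k)
print(iterations_per_epoch, updates)  # prints: 32 96
```

With these sizes the last mini-batch of each epoch holds only 8 samples; dropping it instead would give 31 iterations per epoch.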

3. SGD Summary

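Gradient descent on the loss of a subset of the samples, rather than on the loss of all samples
Compared with full-batch gradient descent, number of parameter updates per epoch (one training pass over the full dataset) ↑ (= time taken per epoch ↑)
The smaller the batch size, the more the mini-batch gradient's magnitude and direction deviate from the true (full-batch) gradient
Batch size is typically chosen as a power of two (2^n, e.g. 32, 64, 128)
Small batch size: probability of escaping local minima ↑
Large batch size: converges relatively faster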
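The effect of batch size on how far a mini-batch gradient strays from the full-batch gradient can be checked empirically. The sketch below is only an illustration, assuming a one-parameter model y ≈ w * x with a mean squared error loss and synthetic noisy data; the helper names `gradient` and `gradient_rms_error` are hypothetical.

```python
import random

rng = random.Random(0)
# Hypothetical noisy data scattered around the line y = 3x.
xs = [rng.uniform(-1, 1) for _ in range(2048)]
data = [(x, 3.0 * x + rng.gauss(0, 1)) for x in xs]

def gradient(w, samples):
    """Gradient of the mean squared error of y ≈ w * x over `samples`."""
    return sum(2 * (w * x - y) * x for x, y in samples) / len(samples)

w = 0.0
full = gradient(w, data)  # full-batch gradient, used as the reference

def gradient_rms_error(k, trials=200):
    """Root-mean-square gap between random size-k mini-batch gradients and the full gradient."""
    squared = [(gradient(w, rng.sample(data, k)) - full) ** 2 for _ in range(trials)]
    return (sum(squared) / trials) ** 0.5

for k in (4, 32, 256):
    print(k, gradient_rms_error(k))
```

On this toy problem the gap shrinks as k grows, which matches the note above: small batches give noisier gradients, while large batches track the full-batch direction more closely.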
