PyTorch Concepts Deep Dive

PyTorch is one of the most important and popular deep learning frameworks today. Meta (formerly Facebook) built it on top of the Lua-based Torch library and open-sourced it in 2017.

Since its release, it has been used across much of modern AI, from Tesla's self-driving cars to OpenAI's ChatGPT.

Here are the 20 core concepts you need to know to really understand PyTorch.

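1. Tensors

A tensor is PyTorch's core data structure and its basic unit of computation: a multidimensional array whose elements all share one data type. It is similar to a NumPy array, except that a tensor can also live on a GPU and have its operations run there with hardware acceleration.

Creating a tensor from a list: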

import torch

# Create tensor from a list
x = torch.tensor([1, 2, 3])

print(x)
#tensor([1, 2, 3])

Other ways to initialize a tensor:

# Uninitialized 3x2 tensor (contains whatever values were already in memory)
x = torch.empty(3, 2)

# 3x2 tensor filled with zeros
x = torch.zeros(3, 2)

# 3x2 tensor filled with ones
x = torch.ones(3, 2)

# Random values from the uniform distribution on [0, 1)
x = torch.rand(2, 2)

# Random values from the standard normal distribution (mean 0, variance 1)
x = torch.randn(2, 2)

# 1-D tensor of evenly spaced values from start up to (not including) end: (start, end, step)
x = torch.arange(0, 10, 2) 
# tensor([0, 2, 4, 6, 8])
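Every tensor also carries its shape, data type, and device as attributes. Because tensors are often compared with NumPy arrays, it may help to see how the two convert. This is a minimal sketch that assumes NumPy is installed alongside PyTorch. On the CPU, `torch.from_numpy` and `Tensor.numpy()` share memory with their source, so changing one changes the other. `torch.tensor()` always copies.

```python
import numpy as np
import torch

x = torch.zeros(3, 2)
print(x.shape)    # torch.Size([3, 2])
print(x.dtype)    # torch.float32 (the default floating-point type)
print(x.device)   # cpu

# NumPy -> tensor: shares the same memory buffer (CPU only)
arr = np.array([1.0, 2.0, 3.0])
t = torch.from_numpy(arr)
arr[0] = 100.0            # modifying the array...
print(t[0].item())        # ...is visible through the tensor: 100.0

# tensor -> NumPy: also shares memory for CPU tensors
back = t.numpy()
back[1] = -1.0
print(t[1].item())        # -1.0

# torch.tensor() always copies, so later changes to arr do not affect it
copied = torch.tensor(arr)
arr[2] = 0.0
print(copied[2].item())   # 3.0
```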

2. Tensor Arithmetic Operations

Element-wise arithmetic between tensors uses the ordinary Python operators.

x = torch.tensor([3, 2, 1])
y = torch.tensor([1, 2, 3])

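# Element-wise arithmetic
add = x + y         # addition
subtract = x - y    # subtraction
multiply = x * y    # multiplication
divide = x / y      # true division: the result is a float tensor even for integer inputs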

In-place operations: methods whose names end with an underscore (_) modify the original tensor directly instead of returning a new one. This is useful for saving memory.

x = torch.tensor([10, 20, 30], dtype=torch.float32)

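x.add_(2)   # add 2 in place: x is now [12., 22., 32.]
x.sub_(3)   # subtract 3: [9., 19., 29.]
x.mul_(5)   # multiply by 5: [45., 95., 145.]
x.div_(2)   # divide by 2: [22.5, 47.5, 72.5]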
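The sketch below checks what the in-place chain produces. It also shows why the example uses dtype=torch.float32. An in-place operation writes into the existing storage, so its result has to fit the tensor's dtype. The RuntimeError shown for the integer case is how current PyTorch versions behave. Treat it as an illustration, not a guarantee for every release.

```python
import torch

x = torch.tensor([10, 20, 30], dtype=torch.float32)
ptr_before = x.data_ptr()          # address of the underlying storage

x.add_(2).sub_(3).mul_(5).div_(2)  # in-place methods return x itself, so they can be chained
print(x)                           # tensor([22.5000, 47.5000, 72.5000])
same_storage = x.data_ptr() == ptr_before  # True: no new tensor was allocated

# Out-of-place division of integer tensors returns a new float tensor...
ints = torch.tensor([10, 20, 30])
print((ints / 2).dtype)            # torch.float32

# ...but the in-place version cannot store floats in an int64 tensor
try:
    ints.div_(2)
    int_div_failed = False
except RuntimeError:
    int_div_failed = True
```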

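3. Broadcasting

Broadcasting lets PyTorch operate on tensors of different shapes by automatically expanding the smaller tensor to match the larger one. Shapes are compared from the last dimension backwards. Each pair of sizes must be equal, or one of them must be 1 (a missing dimension counts as 1). No data is actually copied, so broadcasting is efficient.

# Without broadcasting (manual)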
a = torch.tensor([1, 2, 3])
b = torch.tensor([10])
b_repeated = b.repeat(3) # [10, 10, 10]
result = a + b_repeated  # [11, 12, 13]

# With broadcasting (automatic)
a = torch.tensor([1, 2, 3])
b = torch.tensor([10])
result = a + b  # [11, 12, 13]: b behaves as if it were [10, 10, 10]
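Broadcasting compares shapes starting from the last dimension. Two sizes are compatible when they are equal or when one of them is 1, and a missing leading dimension counts as 1. This short sketch shows a case where both tensors get expanded and a shape pair that PyTorch rejects:

```python
import torch

col = torch.tensor([[1], [2], [3]])       # shape (3, 1)
row = torch.tensor([[10, 20, 30, 40]])    # shape (1, 4)

# (3, 1) + (1, 4): both sides are stretched to (3, 4)
grid = col + row
print(grid.shape)   # torch.Size([3, 4])
print(grid[2])      # tensor([13, 23, 33, 43])

# torch.broadcast_shapes computes the result shape without doing any arithmetic
print(torch.broadcast_shapes((5, 1, 4), (3, 1)))  # torch.Size([5, 3, 4])

# Trailing sizes 3 and 4 are neither equal nor 1, so this fails
try:
    torch.ones(2, 3) + torch.ones(2, 4)
    mismatch_failed = False
except RuntimeError:
    mismatch_failed = True
```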

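4. Reshaping Tensors

The most commonly used methods are reshape and view.

view: returns a tensor that shares memory with the original. It therefore works only when the new shape is compatible with the existing memory layout, which in practice means the tensor is contiguous.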

x = torch.randn(2, 3, 4)
y = x.view(6, 4) # change the shape

# After a transpose, the tensor is no longer contiguous in memory
x_t = x.transpose(1, 2)
# y_2 = x_t.view(6, 4) # raises a RuntimeError
y_2 = x_t.reshape(6, 4) # works

reshape: works even when the tensor is not contiguous. It makes a copy when a view is impossible.

# Reshape using 'reshape'
y_2 = x_t.reshape(6, 4)
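To see the difference between view and reshape in practice, the sketch below checks contiguity with is_contiguous() and compares data_ptr() values to tell whether a result shares memory with its source. After a transpose, the usual way to make view work is to call .contiguous() first.

```python
import torch

x = torch.randn(2, 3, 4)

v = x.view(6, 4)
print(v.data_ptr() == x.data_ptr())  # True: view shares memory with x

x_t = x.transpose(1, 2)              # shape (2, 4, 3)
print(x_t.is_contiguous())           # False

# view refuses a shape that does not match the transposed memory layout
try:
    x_t.view(6, 4)
    view_failed = False
except RuntimeError:
    view_failed = True

r = x_t.reshape(6, 4)                # falls back to copying the data
print(r.data_ptr() == x.data_ptr())  # False: a copy was made

c = x_t.contiguous().view(6, 4)      # explicit copy first, then view works
print(torch.equal(r, c))             # True: same values either way
```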

unsqueeze(dim) inserts a dimension of size 1 at position dim; squeeze(dim) removes a dimension of size 1.

# Create tensor of shape (3, 4)
x = torch.randn(3, 4)

# Add a new dimension at the front (dim=0)
x_unsq0 = x.unsqueeze(0)
# shape: (1, 3, 4)

# Add a new dimension in the middle (dim=1)
x_unsq1 = x.unsqueeze(1)
# shape: (3, 1, 4)

# Add a new dimension at the end (dim=2)
x_unsq2 = x.unsqueeze(2)
# shape: (3, 4, 1)

# Create a tensor
x = torch.randn(1, 3, 1, 4, 1)

# Remove all size-1 dimensions
x_sq_all = x.squeeze()
# shape: (3, 4)

# Remove only a specific dimension of size 1
x_sq0 = x.squeeze(0)   # remove dim 0 → shape: (3, 1, 4, 1)
x_sq2 = x.squeeze(2)   # remove dim 2 → shape: (1, 3, 4, 1)

flatten() collapses a tensor into one dimension; unflatten() does the reverse, splitting one dimension into several.

# Create a tensor
x = torch.randn(2, 3, 4)

# Flatten into 1-D
y = x.flatten() 
# shape: (24,)

# Flatten only from dim 1 onward, keeping dim 0 (e.g. the batch dimension)
y = x.flatten(start_dim=1)   # shape: (2, 12)

# Create a 1-D tensor
x = torch.arange(24)
# shape: (24,)

# Unflatten the first dimension into (2, 3, 4)
y = torch.unflatten(x, dim=0, sizes=(2, 3, 4))
# shape: (2, 3, 4)
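In practice, flatten(start_dim=1) usually sits between convolutional feature maps and a linear layer, because it leaves the batch dimension intact. nn.Flatten is the module form of the same operation. It defaults to start_dim=1, which makes it convenient inside nn.Sequential. The layer sizes below are arbitrary and only serve as an illustration.

```python
import torch
import torch.nn as nn

feature_maps = torch.randn(8, 16, 5, 5)   # (batch, channels, height, width)

flat = feature_maps.flatten(start_dim=1)  # keep the batch dim, merge the rest
print(flat.shape)                         # torch.Size([8, 400])

# The same operation as a layer: nn.Flatten() uses start_dim=1 by default
head = nn.Sequential(
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 10),
)
logits = head(feature_maps)
print(logits.shape)                       # torch.Size([8, 10])

# unflatten reverses the merge and recovers the original layout
restored = flat.unflatten(1, (16, 5, 5))
print(torch.equal(restored, feature_maps))  # True
```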

transpose(dim0, dim1) swaps two dimensions.

# Create a tensor
x = torch.randn(2, 3, 4) # shape: (2, 3, 4)

# Transpose dim 1 and dim 2
y = x.transpose(1, 2) # shape: (2, 4, 3)

# Create a 2D tensor
m = torch.randn(3, 4) # shape: (3, 4)

# Shorthand for transpose, defined only for tensors with at most 2 dimensions
m_t = m.t() # shape: (4, 3)

permute(*dims) reorders all dimensions at once.

# Use 'permute' to reorder all dims as required 
y = x.permute(2, 0, 1)    # shape: (4, 2, 3)
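A common real-world use of permute is converting an image from (height, width, channels), the layout many image libraries use, into (channels, height, width), the layout nn.Conv2d expects. Like transpose, permute returns a view with rearranged strides instead of copying data, so the result is usually not contiguous. The 32x32 RGB "image" here is just random data.

```python
import torch

img_hwc = torch.rand(32, 32, 3)       # (H, W, C), e.g. as loaded from an image file
img_chw = img_hwc.permute(2, 0, 1)    # (C, H, W)
print(img_chw.shape)                  # torch.Size([3, 32, 32])

# permute only rearranges strides: same memory, but no longer contiguous
print(img_chw.data_ptr() == img_hwc.data_ptr())  # True
print(img_chw.is_contiguous())                   # False

# Add a batch dimension before feeding a conv layer: (1, C, H, W)
batch = img_chw.unsqueeze(0)
print(batch.shape)                    # torch.Size([1, 3, 32, 32])
```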

expand broadcasts a tensor to a larger size without copying data (it can only enlarge dimensions of size 1), while repeat does so by copying the values.

# Create a tensor
x = torch.randn(1, 3)

y = x.expand(4, 3) # x's single row is broadcast 4 times (no copy)
# shape: (4, 3)

# 'repeat' tiles each dimension by the given factor
y = x.repeat(4, 1) # dim 0 is copied 4 times in memory; dim 1 is kept as is (factor 1)
# shape: (4, 3)
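Strides give a direct way to confirm that expand does not copy. The expanded dimension gets stride 0, so every row reads the same memory. Since that memory is shared, treat an expanded tensor as read-only, and call .clone() (or use repeat) if you need to write to it.

```python
import torch

x = torch.randn(1, 3)

e = x.expand(4, 3)
print(e.stride())                     # (0, 1): all 4 rows point at the same data
print(e.data_ptr() == x.data_ptr())   # True: no copy

r = x.repeat(4, 1)
print(r.stride())                     # (3, 1): an ordinary contiguous tensor
print(r.data_ptr() == x.data_ptr())   # False: new memory was allocated

# -1 in expand means "keep this dimension's size"
print(x.expand(5, -1).shape)          # torch.Size([5, 3])

# expand can only stretch size-1 dimensions
try:
    torch.randn(2, 3).expand(4, 3)
    expand_failed = False
except RuntimeError:
    expand_failed = True
```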

stack joins tensors along a new dimension (all inputs must have the same shape), while cat concatenates them along an existing dimension.

# Create tensors
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])

# Stack along a new dimension 0
out = torch.stack([a, b], dim=0)

print(out.shape)
# torch.Size([2, 3])

# Concatenate along dimension 0
out = torch.cat([a, b], dim=0)

print(out.shape)
# torch.Size([6])
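Stacking the same two tensors along dim=1 makes the difference clearer. stack places a and b side by side as columns, while cat on 1-D tensors only has dim 0 to join along. cat requires the sizes to match in every dimension except the joined one, whereas stack requires identical shapes. The sketch also shows that stack is equivalent to unsqueeze followed by cat.

```python
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])

cols = torch.stack([a, b], dim=1)   # new dim at position 1 -> shape (3, 2)
print(cols)
# tensor([[1, 4],
#         [2, 5],
#         [3, 6]])

# cat on 2-D tensors: sizes must match in every dim except the one being joined
m1 = torch.zeros(2, 3)
m2 = torch.ones(2, 5)
wide = torch.cat([m1, m2], dim=1)
print(wide.shape)                   # torch.Size([2, 8])

# stack == unsqueeze each input, then concatenate along that new dim
same = torch.cat([a.unsqueeze(1), b.unsqueeze(1)], dim=1)
print(torch.equal(cols, same))      # True
```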

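5. Autograd (Automatic Differentiation)

Autograd is PyTorch's automatic differentiation engine and the core of neural network training. It computes the gradients of the loss function with respect to the model's parameters.

Setting requires_grad=True makes PyTorch track every operation performed on that tensor.
The operations are recorded as a DAG (directed acyclic graph).
Calling .backward() runs backpropagation through that graph, computes the derivatives ($dy/dx$), and stores them in the .grad attribute of the leaf tensors.

# Create a tensor that requires gradients
x = torch.tensor(2.0, requires_grad=True)

# Define the function (y = x^2 + 2x + 2)
y = x**2 + 2*x + 2

# Compute the derivative (dy/dx = 2x + 2)
y.backward()

# Derivative at x = 2: 2*2 + 2 = 6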
print(x.grad) # tensor(6.)

Tip: inside a torch.no_grad() block, PyTorch does not record operations for gradient computation. This saves memory and speeds things up, so it is mainly used for inference.
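Two behaviors are worth seeing directly. First, gradients accumulate in .grad across backward() calls instead of being overwritten, which is why training loops clear them. Second, results computed under torch.no_grad() are not tracked. The function used here, the sum of squares of w, is chosen only because its gradient 2w is easy to check by hand.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

loss = (w ** 2).sum()          # d(loss)/dw = 2w
loss.backward()
first = w.grad.clone()         # [2., 4., 6.]

loss = (w ** 2).sum()          # build a fresh graph and backpropagate again
loss.backward()
second = w.grad.clone()        # [4., 8., 12.]: added on top of the old gradient

w.grad.zero_()                 # reset by hand (optimizer.zero_grad() clears grads for you)

with torch.no_grad():
    z = w * 2                  # this operation is not recorded in the graph
print(z.requires_grad)         # False
```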

6. Building Neural Networks (nn.Module)

nn.Module is the base class for all neural network modules in PyTorch.

__init__: defines the layers.
forward: defines how input data is processed into the output.

import torch.nn as nn 

class NeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 3)

    def forward(self, x):
        return self.linear(x)
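Here is how the NeuralNet class above is used. Call the model like a function (model(x)) rather than calling forward() directly, because nn.Module's __call__ also runs any registered hooks before it dispatches to forward. Layers assigned as attributes in __init__ are registered automatically, so their weights appear in parameters(). The batch of 5 inputs is arbitrary.

```python
import torch
import torch.nn as nn

class NeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 3)

    def forward(self, x):
        return self.linear(x)

model = NeuralNet()

# Submodules assigned in __init__ are registered, so their parameters are found
for name, p in model.named_parameters():
    print(name, tuple(p.shape))
# linear.weight (3, 3)
# linear.bias (3,)

n_params = sum(p.numel() for p in model.parameters())
print(n_params)            # 12 = 3*3 weights + 3 biases

x = torch.randn(5, 3)      # a batch of 5 samples with 3 features each
out = model(x)             # runs forward() through nn.Module.__call__
print(out.shape)           # torch.Size([5, 3])
```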

7. Linear Layers

Also called a fully connected layer, a linear layer applies a linear transformation to its input (y = xA^T + b).

# Linear layer with 10 input features and 5 output features
linear = nn.Linear(in_features=10, out_features=5)
x = torch.randn(32, 10)
output = linear(x) # output shape: (32, 5)
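To tie the layer to the formula y = xA^T + b: nn.Linear stores A as linear.weight with shape (out_features, in_features) and b as linear.bias. That means the same output can be computed by hand:

```python
import torch
import torch.nn as nn

linear = nn.Linear(in_features=10, out_features=5)
x = torch.randn(32, 10)

print(linear.weight.shape)   # torch.Size([5, 10]) -> A
print(linear.bias.shape)     # torch.Size([5])     -> b

manual = x @ linear.weight.T + linear.bias   # y = x A^T + b
print(torch.allclose(linear(x), manual, atol=1e-6))  # True
```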

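8. Activation Functions

Activation functions introduce non-linearity into the model so it can learn complex patterns. Without them, any stack of linear layers would reduce to a single linear transformation.

x = torch.randn(10)

# Commonly used activation functions
relu = nn.ReLU()(x)         # sets negative values to 0
sigmoid = torch.sigmoid(x)  # squashes values into (0, 1)
tanh = torch.tanh(x)        # squashes values into (-1, 1)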
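Evaluating the functions on a few hand-picked inputs makes their ranges concrete. The sketch also uses torch.nn.functional.relu, the functional form of nn.ReLU, which gives the same result without creating a module object. Its last part checks numerically why non-linearity matters: two stacked linear layers with nothing between them are equivalent to a single linear map with weight W2·W1 and bias W2·b1 + b2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([-2.0, 0.0, 3.0])

print(nn.ReLU()(x))        # tensor([0., 0., 3.])
print(F.relu(x))           # same result, functional form
print(torch.sigmoid(x))    # sigmoid(0) = 0.5; outputs stay in (0, 1)
print(torch.tanh(x))       # tanh(0) = 0; outputs stay in (-1, 1)

# Two linear layers without an activation collapse into one linear map
l1, l2 = nn.Linear(4, 8), nn.Linear(8, 2)
inp = torch.randn(3, 4)
combined_w = l2.weight @ l1.weight              # shape (2, 4)
combined_b = l2.weight @ l1.bias + l2.bias      # shape (2,)
print(torch.allclose(l2(l1(inp)), inp @ combined_w.T + combined_b, atol=1e-5))  # True
```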

9. Loss Functions

A loss function measures how far the model's predictions are from the true targets.

Regression: nn.MSELoss (mean squared error), nn.L1Loss (mean absolute error)
Classification: nn.CrossEntropyLoss (multi-class), nn.BCELoss (binary)

# Prediction and target values
pred = torch.tensor([2.5, 3.0, 4.5])
target = torch.tensor([3.0, 3.0, 5.0])

# Mean Squared Error: mean of (pred - target)^2
criterion = nn.MSELoss()
loss = criterion(pred, target)   # (0.25 + 0 + 0.25) / 3 ≈ 0.1667

# Mean Absolute Error (L1 Loss): mean of |pred - target|
criterion = nn.L1Loss()
loss = criterion(pred, target)   # (0.5 + 0 + 0.5) / 3 ≈ 0.3333

# Cross Entropy Loss (for multi-class classification)
criterion = nn.CrossEntropyLoss()

# Calculate loss from prediction and target values
pred = torch.tensor([[1.2, 0.3, -0.5]]) # raw scores (logits), not probabilities: 1 sample, 3 classes
target = torch.tensor([0]) # correct class index is 0
loss = criterion(pred, target)

# Binary Cross Entropy Loss (for binary classification)
criterion = nn.BCELoss()

# Calculate loss from prediction and target values
pred = torch.tensor([0.7, 0.2, 0.9]) # probabilities after sigmoid
target = torch.tensor([1., 0., 1.]) # ground truth labels (0 or 1)
loss = criterion(pred, target)
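One detail often trips people up. nn.CrossEntropyLoss expects raw, unnormalized scores (logits) and applies log-softmax internally, while nn.BCELoss expects probabilities that have already passed through a sigmoid. The sketch checks both equivalences numerically. It also shows nn.BCEWithLogitsLoss, which combines the sigmoid and BCE in a single, more numerically stable step. The input numbers are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Multi-class: CrossEntropyLoss == log_softmax followed by negative log-likelihood
logits = torch.tensor([[1.2, 0.3, -0.5]])
target = torch.tensor([0])
ce = nn.CrossEntropyLoss()(logits, target)
manual_ce = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(torch.allclose(ce, manual_ce))                 # True

# The same value by hand: -log(softmax(logits)[correct class])
by_hand = -torch.log(torch.softmax(logits, dim=1)[0, 0])
print(torch.allclose(ce, by_hand, atol=1e-6))        # True

# Binary: BCEWithLogitsLoss(raw scores) == BCELoss(sigmoid(raw scores))
raw = torch.tensor([0.8, -1.5, 2.2])
labels = torch.tensor([1.0, 0.0, 1.0])
bce_logits = nn.BCEWithLogitsLoss()(raw, labels)
bce_probs = nn.BCELoss()(torch.sigmoid(raw), labels)
print(torch.allclose(bce_logits, bce_probs, atol=1e-6))  # True
```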

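10. Optimizers

An optimizer uses the computed gradients to update the model's parameters in a direction that reduces the loss.

import torch.optim as optim

model = nn.Linear(10, 1)  # placeholder model; any nn.Module works here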

# Stochastic Gradient Descent optimizer with momentum
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Adam optimizer combining momentum and adaptive learning rates
optimizer = optim.Adam(model.parameters(), lr=0.001)

# RMSprop: scales each step by a moving average of squared gradients
optimizer = optim.RMSprop(model.parameters(), lr=0.001)

# Adadelta: adapts step sizes per parameter and is designed to need little lr tuning (lr=1.0 is the default)
optimizer = optim.Adadelta(model.parameters(), lr=1.0)

# Adagrad: per-parameter learning rates that shrink as squared gradients accumulate
optimizer = optim.Adagrad(model.parameters(), lr=0.01)

# AdamW optimizer: Adam variant with decoupled weight decay regularization
optimizer = optim.AdamW(model.parameters(), lr=0.001)
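The simplest optimizer makes it easy to check what optimizer.step() does. Plain SGD (no momentum, no weight decay) updates each parameter as p <- p - lr * p.grad. The toy loss w^2 below is chosen only because its gradient, 2w, is easy to verify by hand.

```python
import torch
import torch.optim as optim

w = torch.tensor([1.0], requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)   # optimizers accept any iterable of tensors

loss = (w ** 2).sum()                # gradient is 2w = 2.0
optimizer.zero_grad()
loss.backward()
optimizer.step()                     # w <- 1.0 - 0.1 * 2.0

print(w.item())                      # approximately 0.8
```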

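11. Training Loop

This section puts the previous concepts together to train a model.

# Dummy data for illustration: 64 samples with 3 input features and 3 targets each
train_data = torch.randn(64, 3)
targets = torch.randn(64, 3)

model = NeuralNet()  # the class defined in section 6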
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train for 100 epochs
for epoch in range(100):
    # Forward pass
    outputs = model(train_data)
    loss = criterion(outputs, targets)
    
    # Backward pass
    optimizer.zero_grad() # Clear old gradients
    loss.backward()
    optimizer.step()
    
    # Print loss every 10 epochs
    if epoch % 10 == 0:
        print(f'Epoch {epoch}, Loss: {loss.item()}')

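Note: if you skip optimizer.zero_grad(), gradients from previous iterations keep accumulating in .grad. Each update then uses the wrong gradient, and training goes off course.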
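Here is a version that runs top to bottom and visibly learns. It fits a single nn.Linear(1, 1) to synthetic data generated from y = 2x + 1 plus a little noise. The data, learning rate, and epoch count are all illustrative choices and do not come from the original text.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Synthetic regression data: y = 2x + 1 + small noise
x = torch.randn(200, 1)
y = 2 * x + 1 + 0.05 * torch.randn(200, 1)

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(200):
    pred = model(x)                 # forward pass
    loss = criterion(pred, y)

    optimizer.zero_grad()           # clear gradients from the previous step
    loss.backward()                 # compute new gradients
    optimizer.step()                # update weight and bias

    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
print(model.weight.item(), model.bias.item())  # close to 2 and 1
```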

12. The Sequential Container

nn.Sequential lets you define a model quickly by stacking layers and activation functions in order.

# Define a model using Sequential
model = nn.Sequential(
    nn.Linear(10, 50),
    nn.ReLU(),
    nn.Linear(50, 20),
    nn.ReLU(),
    nn.Linear(20, 1)
)

# Input
x = torch.randn(32, 10)

# Output
output = model(x)

13. Data Handling (Dataset & DataLoader)

These tools handle large amounts of data efficiently.

Dataset: defines how to fetch a single data sample (__getitem__, __len__).
DataLoader: wraps a Dataset and provides batching, shuffling, multi-process loading, and more.

from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels
    
    def __len__(self):
        # Return the total number of samples in the dataset
        return len(self.data)
    
    def __getitem__(self, idx):
        # Return a single sample and its label at a given index
        return self.data[idx], self.labels[idx]

from torch.utils.data import DataLoader

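# Create dataset (random tensors as stand-in data)
data = torch.randn(100, 10)     # 100 samples, 10 features
labels = torch.randn(100, 1)
dataset = CustomDataset(data, labels)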

# Create dataloader
dataloader = DataLoader(
    dataset,
    batch_size=32, # Used to create sample batches
    shuffle=True,
    num_workers=4 # Allows multi-process data loading
)

# Training loop (assumes model, criterion, and optimizer are already defined for 10 input features and 1 output)
for epoch in range(5):
    for batch_idx, (x_batch, y_batch) in enumerate(dataloader):
        optimizer.zero_grad()
        pred = model(x_batch)
        loss = criterion(pred, y_batch)
        loss.backward()
        optimizer.step()
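When the data already sits in tensors, torch.utils.data.TensorDataset spares you from writing a custom class. The sketch shows how batch_size and drop_last determine the number of batches. num_workers is left at its default of 0 (loading in the main process) so the example runs anywhere. With num_workers > 0, scripts on Windows and macOS generally need the usual if __name__ == '__main__': guard.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

data = torch.randn(100, 10)        # 100 samples, 10 features (dummy data)
labels = torch.randn(100, 1)

dataset = TensorDataset(data, labels)
print(len(dataset))                # 100
x0, y0 = dataset[0]                # indexing returns one (sample, label) pair

loader = DataLoader(dataset, batch_size=32, shuffle=True)
print(len(loader))                 # 4 batches: 32 + 32 + 32 + 4
batch_sizes = [xb.shape[0] for xb, yb in loader]
print(batch_sizes)                 # [32, 32, 32, 4]

# drop_last=True discards the incomplete final batch
loader_drop = DataLoader(dataset, batch_size=32, shuffle=True, drop_last=True)
print(len(loader_drop))            # 3
```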

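14. Convolutional Layers

Convolutional layers are the building blocks of CNNs (Convolutional Neural Networks). Conv2d is the standard choice for images, Conv1d for sequences such as audio or time series, and Conv3d for volumetric data or video.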

# 1D Convolutional layer
conv1d_layer = nn.Conv1d(
    in_channels=2, # Input channels
    out_channels=32, # Number of 1D filters/feature maps to learn
    kernel_size=5, # Size of the 1D filter (covers 5 time steps)
    stride=2, # Move filter by 2 time steps at a time
    padding=2 # Zero-padding 
)

# 2D Convolutional layer
conv_layer = nn.Conv2d(
    in_channels=3, # Input channels (3 for RGB images)
    out_channels=64, # Number of filters/feature maps to learn
    kernel_size=3,  # Size of the convolutional filter
    stride=1, # Move filter 1 pixel at a time
    padding=1  # Zero-padding around the border; with kernel_size=3 and stride=1 this preserves height and width
)

# 3D Convolutional layer
conv3d_layer = nn.Conv3d(
    in_channels=1, # Input channels
    out_channels=16, # Number of 3D filters/feature volumes to learn
    kernel_size=3, # Size of the 3D filter (3×3×3 cube)
    stride=1, # Move filter by 1 voxel in each dimension
    padding=1 # Zero-padding on all 6 faces
)
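Each output spatial size follows the formula given in the PyTorch docs for these layers (with dilation 1): L_out = floor((L_in + 2 * padding - kernel_size) / stride) + 1. The sketch runs the three layers defined above on random inputs and checks the output shapes against that formula. The input sizes (100 time steps, a 32x32 image, a 16x16x16 volume) are arbitrary. Note the input layout: (batch, channels, *spatial dims).

```python
import torch
import torch.nn as nn

def conv_out(size, kernel_size, stride, padding):
    # Output length along one spatial dimension (dilation = 1)
    return (size + 2 * padding - kernel_size) // stride + 1

conv1d_layer = nn.Conv1d(in_channels=2, out_channels=32, kernel_size=5, stride=2, padding=2)
conv_layer = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1, padding=1)
conv3d_layer = nn.Conv3d(in_channels=1, out_channels=16, kernel_size=3, stride=1, padding=1)

seq = torch.randn(8, 2, 100)         # (batch, channels, length)
img = torch.randn(8, 3, 32, 32)      # (batch, channels, height, width)
vol = torch.randn(2, 1, 16, 16, 16)  # (batch, channels, depth, height, width)

out1 = conv1d_layer(seq)
out2 = conv_layer(img)
out3 = conv3d_layer(vol)

print(out1.shape)  # torch.Size([8, 32, 50]); conv_out(100, 5, 2, 2) == 50
print(out2.shape)  # torch.Size([8, 64, 32, 32]); padding=1 keeps 32x32
print(out3.shape)  # torch.Size([2, 16, 16, 16, 16])
```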

15. Recurrent Layers

RNN, LSTM, and GRU layers process ordered data such as time series and text.

# RNN
rnn = nn.RNN(
    input_size=10, # Features per time step
    hidden_size=20, # Size of hidden state
    num_layers=2, # Stack multiple RNN layers
    # Shapes input/output tensors as (batch_size, sequence_length, features)
    batch_first=True 
)

# LSTM
lstm = nn.LSTM(
    input_size=10,
    hidden_size=20,
    num_layers=2,
    batch_first=True,
    dropout=0.2 # Dropout between LSTM layers
)

# GRU
gru = nn.GRU(
    input_size=10,
    hidden_size=20,
    num_layers=2,
    batch_first=True
)
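These layers differ mostly in what they return, so a shape check helps. With batch_first=True the output has shape (batch, seq_len, hidden_size). The final hidden state h_n keeps the (num_layers, batch, hidden_size) layout regardless of batch_first, and nn.LSTM also returns a cell state c_n. The batch of 4 sequences of length 7 is arbitrary.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=2, batch_first=True)
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True, dropout=0.2)
gru = nn.GRU(input_size=10, hidden_size=20, num_layers=2, batch_first=True)

x = torch.randn(4, 7, 10)            # (batch, seq_len, features)

out, h_n = rnn(x)
print(out.shape, h_n.shape)          # (4, 7, 20) and (2, 4, 20)

out_l, (h_l, c_l) = lstm(x)          # LSTM also returns a cell state
print(out_l.shape, h_l.shape, c_l.shape)  # (4, 7, 20), (2, 4, 20), (2, 4, 20)

out_g, h_g = gru(x)                  # GRU returns the same structure as RNN
print(out_g.shape, h_g.shape)        # (4, 7, 20) and (2, 4, 20)

# For a single-direction RNN, the last time step of `out` equals the top layer of h_n
print(torch.allclose(out[:, -1, :], h_n[-1]))  # True
```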

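16. Dropout

Dropout prevents overfitting by randomly zeroing (switching off) a fraction of neurons during training. The remaining values are scaled by 1/(1-p) so the expected output stays the same.

dropout = nn.Dropout(p=0.2) # Zero each element with probability 0.2 during training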

# Input
x = torch.randn(32, 100)  

# Output
x_dropped = dropout(x)
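nn.Dropout uses the inverted-dropout scheme. In training mode, each element is zeroed with probability p and the surviving elements are multiplied by 1/(1-p). In evaluation mode, the input passes through unchanged. Feeding the layer a tensor of ones makes both effects visible. The exact fraction of zeros varies from run to run.

```python
import torch
import torch.nn as nn

dropout = nn.Dropout(p=0.2)
x = torch.ones(10_000)

dropout.train()                     # modules start in training mode; set explicitly for clarity
y = dropout(x)
kept = y[y != 0]
print(kept.unique())                # tensor([1.2500]): survivors scaled by 1 / (1 - 0.2)
print((y == 0).float().mean())      # roughly 0.2

dropout.eval()                      # evaluation mode: dropout does nothing
print(torch.equal(dropout(x), x))   # True
```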

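17. Normalization (Batch Normalization & Layer Normalization)

These techniques stabilize training and speed it up.

BatchNorm: normalizes each feature using the mean and variance computed across the batch (commonly used in CNNs).
LayerNorm: normalizes each sample across its feature dimension (commonly used in RNNs and Transformers).

import torch.nn.functional as F

# Neural network with Batch Norm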
class ModelWithBatchNorm(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.bn1 = nn.BatchNorm1d(256) 
        self.fc2 = nn.Linear(256, 10)
    
    def forward(self, x):
        x = F.relu(self.bn1(self.fc1(x)))
        return self.fc2(x)

# Neural network with Layer Norm
class ModelWithLayerNorm(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.ln1 = nn.LayerNorm(256) 
        self.fc2 = nn.Linear(256, 10)
    
    def forward(self, x):
        x = F.relu(self.ln1(self.fc1(x)))
        return self.fc2(x)
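The computation behind both layers can be reproduced by hand. At initialization, their learnable scale and shift are 1 and 0. The output is therefore just (x - mean) / sqrt(var + eps), using the biased variance and the default eps = 1e-5. The only difference between the two layers is the axis. BatchNorm1d in training mode takes statistics over the batch (dim 0) for each feature, while LayerNorm takes them over the features of each sample. The input statistics below are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(16, 256) * 3 + 5       # batch of 16, 256 features, deliberately not centered

bn = nn.BatchNorm1d(256)               # training mode by default
ln = nn.LayerNorm(256)
eps = 1e-5

# BatchNorm: statistics per feature, computed across the batch (dim 0)
bn_manual = (x - x.mean(dim=0)) / torch.sqrt(x.var(dim=0, unbiased=False) + eps)

# LayerNorm: statistics per sample, computed across the features (last dim)
mean = x.mean(dim=-1, keepdim=True)
var = x.var(dim=-1, unbiased=False, keepdim=True)
ln_manual = (x - mean) / torch.sqrt(var + eps)

print(torch.allclose(bn(x), bn_manual, atol=1e-4))  # True
print(torch.allclose(ln(x), ln_manual, atol=1e-4))  # True
```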

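18. Switching Model Modes (Train vs Eval)

Some layers, such as Dropout and BatchNorm, behave differently during training and during evaluation (inference). Switching the model's mode controls this behavior.

model.train(): training mode (Dropout active; BatchNorm normalizes with batch statistics and updates its running statistics).
model.eval(): evaluation mode (Dropout disabled; BatchNorm uses its stored running statistics).

# Example model containing dropout and batch norm
model = nn.Sequential(
    nn.Linear(10, 20),
    nn.BatchNorm1d(20),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(20, 1)
)
train_data = torch.randn(32, 10)  # dummy data
test_data = torch.randn(8, 10)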

# Training mode (enables dropout, batch norm updates)
model.train()

output = model(train_data)

# Evaluation mode (disables dropout, uses running stats)
model.eval()

with torch.no_grad():  # Disable gradient computation
    predictions = model(test_data)
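For BatchNorm, the mode switch has an effect that is easy to miss. In training mode, every forward pass updates the running_mean and running_var buffers (with the default momentum=0.1). In eval mode, those buffers are used and left unchanged. The sketch uses a standalone nn.BatchNorm1d with 3 features. model.train() and model.eval() set the same flag on every submodule of a larger model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)
x = torch.randn(64, 3) + 10          # batch mean is far from the initial running_mean of 0

print(bn.training)                   # True: modules start in training mode
bn(x)
after_train = bn.running_mean.clone()
# running_mean <- 0.9 * 0 + 0.1 * batch_mean, so it moved to roughly 1.0
print(after_train)

bn.eval()
print(bn.training)                   # False
bn(x)
after_eval = bn.running_mean.clone()
print(torch.equal(after_train, after_eval))  # True: eval mode does not update the buffers
```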

19. Using the GPU

GPU acceleration is one of PyTorch's biggest strengths. To use it, move tensors and models with .to('cuda') or .to(device).

# Check if CUDA (NVIDIA GPU) is available
print(torch.cuda.is_available()) 

# Get number of GPUs
print(torch.cuda.device_count()) 

# Get current GPU name
print(torch.cuda.get_device_name(0))

# Create tensor on GPU
x = torch.randn(2, 2, device='cuda')

# Method 1

# Create tensor on CPU
x = torch.randn(3, 4)

# Move to GPU
x = x.to('cuda') 

print(x.device) #cuda:0

# Method 2

# Use the 'device' object
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Create the tensor directly on the chosen device
x = torch.randn(3, 4, device=device)

print(x.device) # cuda:0 (or cpu if no GPU is available)

Similarly, a tensor can be moved from the GPU back to the CPU:

# Move tensor from CPU to GPU
x_cpu = torch.randn(3, 4)
x_gpu = x_cpu.to('cuda')

# Move tensor from GPU to CPU
x_cpu = x_gpu.to('cpu')

# Check device
print(x_gpu.device) #cuda:0
print(x_cpu.device) #cpu

Like tensors, a whole model can be moved to the GPU:

model = nn.Sequential(
    nn.Linear(10, 50),
    nn.ReLU(),
    nn.Linear(50, 1)
)

# Move entire model to GPU
model = model.to('cuda')

# Or
device = torch.device('cuda')
model = model.to(device)
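The two kinds of .to() call differ in a subtle way. nn.Module.to() moves the module's parameters in place and returns the same module object, so model.to(device) works without reassignment. Tensor.to() returns a new tensor when a move or conversion actually happens, so its result must be assigned. The sketch picks cuda when it is available and falls back to the CPU otherwise, so it runs on either.

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Sequential(nn.Linear(10, 50), nn.ReLU(), nn.Linear(50, 1))
returned = model.to(device)
print(returned is model)                      # True: modules are moved in place
print(next(model.parameters()).device.type)   # 'cuda' or 'cpu'

x = torch.randn(4, 10)                        # created on the CPU
x_moved = x.to(device)                        # tensors are NOT moved in place
print(x.device.type)                          # 'cpu': the original is unchanged

same = x_moved.to(device)                     # already on that device: returns the same object
print(same is x_moved)                        # True

# Inputs and model must be on the same device for the forward pass
out = model(x_moved)
print(out.device == x_moved.device)           # True
```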

Using the GPU in a training loop:

# Setup device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')

# Define Model, Loss function, and Optimizer (MyModel, num_epochs, and train_loader are placeholders defined elsewhere)
model = MyModel().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Training loop
for epoch in range(num_epochs):
    for data, target in train_loader:
        # Move data to GPU
        data = data.to(device)
        target = target.to(device)
        
        # Forward pass
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        
        # Backward pass
        loss.backward()
        optimizer.step()

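20. Saving and Loading Models

There are two ways to save a model: save the entire model object, or save only its state_dict (the dictionary of parameter and buffer tensors). The PyTorch documentation recommends the state_dict approach, because a pickled full model is tied to the exact class definitions and directory structure used when it was saved.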

# Save entire model
torch.save(model, 'full_model.pth')

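# Load entire model. Since PyTorch 2.6, torch.load defaults to weights_only=True,
# which rejects full pickled models; pass weights_only=False only for files you trust
model = torch.load('full_model.pth', weights_only=False)

# Switch to eval mode
model.eval()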

# Saving the model

# Define model
model = nn.Sequential(
    nn.Linear(10, 50),
    nn.ReLU(),
    nn.Linear(50, 1)
)

# Save model state_dict
torch.save(model.state_dict(), 'model_weights.pth')

# Loading the model

# Define model
model = nn.Sequential(
    nn.Linear(10, 50),
    nn.ReLU(),
    nn.Linear(50, 1)
)

# Load model state_dict
model.load_state_dict(torch.load('model_weights.pth'))

# Switch to eval mode
model.eval()
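To confirm that the state_dict round trip restores the weights, the sketch saves to an in-memory buffer (io.BytesIO, so no file is written). It then loads the result into a freshly constructed model with the same architecture and compares outputs. The same pattern is often extended to checkpoints that bundle the optimizer's state_dict and the epoch number in one dictionary. The key names used for that here ('epoch', 'model_state', 'optimizer_state') are arbitrary.

```python
import io
import torch
import torch.nn as nn
import torch.optim as optim

def make_model():
    return nn.Sequential(nn.Linear(10, 50), nn.ReLU(), nn.Linear(50, 1))

model = make_model()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Save a checkpoint (the key names are a free choice)
buffer = io.BytesIO()
torch.save({'epoch': 5,
            'model_state': model.state_dict(),
            'optimizer_state': optimizer.state_dict()}, buffer)
buffer.seek(0)

# Load into a new model; its random initial weights get overwritten
checkpoint = torch.load(buffer)
restored = make_model()
restored.load_state_dict(checkpoint['model_state'])
restored_opt = optim.Adam(restored.parameters(), lr=0.001)
restored_opt.load_state_dict(checkpoint['optimizer_state'])

model.eval()
restored.eval()
x = torch.randn(3, 10)
with torch.no_grad():
    print(torch.equal(model(x), restored(x)))  # True
print(checkpoint['epoch'])                     # 5
```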