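# 098 FLAML Troubleshooting Guide

Keywords: troubleshooting, error resolution, debugging, FAQ

## Overview

This guide collects common errors you may hit while using FLAML, together with their fixes. It is organized by stage: installation, training, prediction, and a few special cases (class imbalance and time series), followed by debugging tips and an FAQ.

## Installation Issues

### Installation Failures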

Problem: `pip install flaml` (or one of its optional dependencies) fails.

1. Basic installation fails with `Could not build wheels for...`

   Fix:
   ```bash
   # Update the build tools
   pip install --upgrade pip setuptools wheel

   # Reinstall
   pip install flaml
   ```

2. AutoML features fail to install: `No module named 'lightgbm'`

   Fix:
   ```bash
   # Install with the AutoML extra (quoted so the shell does not expand the brackets)
   pip install "flaml[automl]"

   # Or install the learners individually
   pip install lightgbm xgboost catboost
   ```

3. LightGBM fails to install on Mac M1/M2 with errors mentioning `libomp.dylib`

   Fix:
   ```bash
   # Install libomp with Homebrew
   brew install libomp

   # Point the compiler and linker at it
   export LDFLAGS="-L/opt/homebrew/opt/libomp/lib"
   export CPPFLAGS="-I/opt/homebrew/opt/libomp/include"

   pip install lightgbm
   ```

4. XGBoost fails on Windows with `DLL load failed`

   Fix:
   ```bash
   # Install the Visual C++ Redistributable first:
   # https://aka.ms/vs/17/release/vc_redist.x64.exe

   pip install xgboost
   ```
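
Before chasing a specific install error, it can help to see which of the relevant packages are actually visible in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the package list are illustrative choices, not part of FLAML.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    # Distribution names as published on PyPI (an assumed list for this guide)
    wanted = ["flaml", "lightgbm", "xgboost", "catboost", "scikit-learn", "numpy"]
    for name, version in installed_versions(wanted).items():
        print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

Run it with the same interpreter that fails to import FLAML; a package showing `NOT INSTALLED` usually means pip installed into a different environment.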

### Version Conflicts

Problem: scikit-learn version conflict
Error: "scikit-learn version mismatch"

Fix:

```bash
# Check the installed FLAML version and its declared requirements
pip show flaml

# Recommended environment
pip install "flaml[automl]==2.1.0"
pip install scikit-learn==1.3.0
```

Problem: numpy version conflict
Error: "numpy.core.multiarray failed to import"

Fix:

```bash
pip install numpy==1.24.0
pip install "flaml[automl]" --no-deps
pip install lightgbm xgboost catboost
```

Tip: use a virtual environment so FLAML's dependencies do not collide with other projects.

```bash
python -m venv flaml_env
source flaml_env/bin/activate  # Linux/Mac
flaml_env\Scripts\activate     # Windows
pip install "flaml[automl]"
```

## Training Issues

### Memory Errors

Problem: `MemoryError`, or the process is killed
Error: "MemoryError: Unable to allocate..."

Fixes 1 and 2 reduce the size of the data before it reaches FLAML:

```python
import numpy as np
import pandas as pd

# Fix 1: subsample the data
def handle_large_data(X, y, max_samples=100000):
    """Randomly subsample X and y when they exceed max_samples rows."""
    if len(X) > max_samples:
        print(f"Sampling data: {len(X)} -> {max_samples}")
        indices = np.random.choice(len(X), max_samples, replace=False)
        # Positional indexing so both pandas and NumPy inputs work
        X_s = X.iloc[indices] if hasattr(X, "iloc") else X[indices]
        y_s = y.iloc[indices] if hasattr(y, "iloc") else y[indices]
        return X_s, y_s
    return X, y

# Fix 2: downcast dtypes
def optimize_dtypes(df):
    """Downcast numeric columns in place to reduce memory use."""
    for col in df.select_dtypes(include=['float64']).columns:
        df[col] = df[col].astype('float32')

    for col in df.select_dtypes(include=['int64']).columns:
        col_min, col_max = df[col].min(), df[col].max()
        if col_min >= 0 and col_max <= 255:
            df[col] = df[col].astype('uint8')
        elif col_min >= -128 and col_max <= 127:
            df[col] = df[col].astype('int8')

    return df
```

Fix 3: restrict the search to lightweight models and keep some memory free:

```python
automl.fit(
    X, y,
    estimator_list=['lgbm', 'rf'],  # lightweight models only
    n_jobs=1,  # disable parallelism
    free_mem_ratio=0.3  # keep 30% of memory free during training
)
```

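Uniform random sampling, as in the first memory fix, can thin out or even drop rare classes. A minimal sketch of a stratified alternative follows; `stratified_subsample` is a hypothetical helper written for this guide, and because of per-class rounding the result can differ from `max_samples` by a few rows.

```python
import numpy as np


def stratified_subsample(X, y, max_samples, seed=0):
    """Subsample rows while keeping each class's share roughly unchanged."""
    y = np.asarray(y)
    if len(y) <= max_samples:
        return X, y
    rng = np.random.default_rng(seed)
    frac = max_samples / len(y)
    keep = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        # Keep at least one row per class so rare classes survive
        n_keep = max(1, int(round(len(idx) * frac)))
        keep.append(rng.choice(idx, size=n_keep, replace=False))
    keep = np.sort(np.concatenate(keep))
    X_sub = X.iloc[keep] if hasattr(X, "iloc") else np.asarray(X)[keep]
    return X_sub, y[keep]


# 95% / 5% class split, reduced from 1000 rows to about 100
X = np.arange(2000).reshape(1000, 2)
y = np.array([0] * 950 + [1] * 50)
X_small, y_small = stratified_subsample(X, y, max_samples=100)
print(X_small.shape, np.bincount(y_small))
```

The class counts printed at the end make it easy to confirm that the minority class kept its share after sampling.
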
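### Training Takes Too Long

Problem: training takes longer than expected.

Fix 1: cap the time budget

```python
automl.fit(
    X, y,
    time_budget=60,  # limit in seconds
    early_stop=True  # stop early once the search has converged
)
```

Fix 2: start from data-driven configurations and use fewer CV folds

```python
automl.fit(
    X, y,
    time_budget=120,
    starting_points="data",  # data-dependent starting configurations
    n_splits=3  # fewer cross-validation folds
)
```

Fix 3: limit the number of models

```python
automl.fit(
    X, y,
    estimator_list=['lgbm'],  # a single learner only
    max_iter=100  # cap the number of trials
)
```

Fix 4: parallelize

```python
automl.fit(
    X, y,
    n_jobs=-1,  # use all CPU cores
    n_concurrent_trials=4  # parallel trials; values above 1 require Ray or Spark
)
```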

### Performance Below Expectations

Problem: model quality is lower than expected.

Checklist:
1. Check the data quality.
2. Check that the time budget is large enough.
3. Check that the metric fits the problem.

Fix 1: increase the time budget

```python
import numpy as np
from flaml import AutoML
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Sample data
np.random.seed(42)
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

automl = AutoML()
automl.fit(X_train, y_train, task="classification",
           time_budget=300,  # raised to 5 minutes
           verbose=0)
print(f"Best estimator: {automl.best_estimator}")
```

Fix 2: search over more model families

```python
automl.fit(
    X, y,
    estimator_list=['lgbm', 'xgboost', 'catboost', 'rf', 'extra_tree'],
    time_budget=300
)
```

Fix 3: widen the hyperparameter ranges. Note that `custom_hp` is keyed first by estimator name and then by hyperparameter:

```python
from flaml import tune

custom_hp = {
    'lgbm': {
        'n_estimators': {'domain': tune.randint(100, 2000)},
        'max_depth': {'domain': tune.randint(3, 15)},
        'learning_rate': {'domain': tune.loguniform(0.001, 0.3)},
    }
}

automl.fit(X, y, custom_hp=custom_hp, time_budget=300)
```

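The first checklist item, data quality, is the easiest one to automate. The sketch below collects a few basic signals with pandas; `data_quality_report` and the toy DataFrame are hypothetical and only illustrate the kind of check meant here.

```python
import numpy as np
import pandas as pd


def data_quality_report(df, target):
    """Collect a few basic signals that often explain weak model quality."""
    features = df.drop(columns=[target])
    return {
        "n_rows": len(df),
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_ratio": features.isna().mean().round(3).to_dict(),
        "constant_columns": [c for c in features.columns
                             if features[c].nunique(dropna=False) <= 1],
        "target_distribution": (df[target].value_counts(normalize=True)
                                .round(3).to_dict()),
    }


df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0],
    "b": [7, 7, 7, 7],          # constant column: carries no signal
    "label": [0, 0, 0, 1],
})
report = data_quality_report(df, target="label")
print(report)
```

A high missing ratio, constant columns, or a heavily skewed target distribution are worth fixing before spending a larger time budget on the search.
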
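## Prediction Issues

### Prediction Errors

Problem 1: feature count mismatch
Error: "X has N features, but model expects M features"

Fix:

```python
# Store the feature names at training time
feature_names = X_train.columns.tolist()

# Use the same features, in the same order, at prediction time
X_test = X_test[feature_names]
predictions = automl.predict(X_test)
```

Problem 2: NaN at prediction time
Symptom: `predictions` contains NaN

Fix:

```python
import numpy as np

# Check the input for missing values (numeric NumPy array)
print(f"Missing values: {np.isnan(X_test).sum()}")

# Fill missing values; use the same strategy that was applied to the training data
X_test = np.nan_to_num(X_test, nan=0)
predictions = automl.predict(X_test)
```

Problem 3: `predict_proba` fails
Error: "predict_proba not available"

Fix:

```python
# AutoML always defines predict_proba, so hasattr(automl, ...) is always True.
# Check the underlying trained estimator instead.
if hasattr(automl.model.estimator, 'predict_proba'):
    proba = automl.predict_proba(X_test)
else:
    # Fallback: use plain predictions
    predictions = automl.predict(X_test)
```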
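
Selecting columns by the saved name list fails with a bare `KeyError` when a column is missing. A slightly more defensive version of the feature alignment step could look like the sketch below; `align_features` and the column names are hypothetical.

```python
import pandas as pd


def align_features(X_new, feature_names):
    """Return X_new with exactly the training columns, in training order."""
    missing = [c for c in feature_names if c not in X_new.columns]
    if missing:
        raise ValueError(f"Missing features at prediction time: {missing}")
    extra = [c for c in X_new.columns if c not in feature_names]
    if extra:
        print(f"Dropping columns not seen during training: {extra}")
    return X_new[feature_names]


train_cols = ["age", "income"]
X_new = pd.DataFrame({"income": [50, 60], "age": [30, 40], "id": [1, 2]})
X_ready = align_features(X_new, train_cols)
print(list(X_ready.columns))
```

Raising early with the list of missing columns tends to be easier to debug than a shape error raised deep inside the model.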

### Model Save/Load Errors

Problem 1: pickle error
Error: "Can't pickle..."

Fix:

```python
import joblib

# Save
joblib.dump(automl, 'model.pkl', compress=3)

# Load
automl = joblib.load('model.pkl')
```

Problem 2: version mismatch
Error: "module 'flaml' has no attribute..."

Fix:

```python
# Record the library versions when saving
import flaml
import sklearn

metadata = {
    'flaml_version': flaml.__version__,
    'sklearn_version': sklearn.__version__
}

# Load the model under the same versions, for example:
# pip install flaml==X.X.X
```

Problem 3: saving large models
Symptom: the file is too large

Fix:

```python
# Save with compression
import pickle
import gzip

with gzip.open('model.pkl.gz', 'wb') as f:
    pickle.dump(automl, f)

# Load the compressed file
with gzip.open('model.pkl.gz', 'rb') as f:
    automl = pickle.load(f)
```
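
Recording versions only helps if they are stored next to the model and compared on load. The sketch below writes a JSON sidecar file and reports any package whose version changed; the helper names, the `.meta.json` suffix, and the package list are assumptions made for illustration, and a plain dict stands in for the trained `AutoML` object.

```python
import json
import pickle
import tempfile
from importlib import metadata
from pathlib import Path


def _version_or_none(package):
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def save_with_metadata(obj, path, packages=("flaml", "scikit-learn")):
    """Pickle obj and write a JSON sidecar with the package versions in use."""
    path = Path(path)
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    versions = {p: _version_or_none(p) for p in packages}
    path.with_suffix(".meta.json").write_text(json.dumps(versions, indent=2))
    return versions


def load_with_check(path, packages=("flaml", "scikit-learn")):
    """Load the pickle and list packages whose version changed since saving."""
    path = Path(path)
    saved = json.loads(path.with_suffix(".meta.json").read_text())
    mismatches = {p: (saved.get(p), _version_or_none(p))
                  for p in packages if saved.get(p) != _version_or_none(p)}
    with open(path, "rb") as f:
        return pickle.load(f), mismatches


# Demo with a plain dict standing in for a trained AutoML object
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "model.pkl"
    save_with_metadata({"weights": [1, 2, 3]}, target)
    restored, mismatches = load_with_check(target)
    print(restored, mismatches)
```

A non-empty `mismatches` dict on load is a hint to recreate the original environment before trusting the restored model.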

## Special Cases

### Class Imbalance

Problem: the model fails to predict the minority class.

Fix 1: use an appropriate metric

```python
automl.fit(
    X, y,
    metric='f1',  # or 'roc_auc', 'log_loss'
    # 'accuracy' is a poor choice for imbalanced data
)
```

Fix 2: sample weights

```python
from sklearn.utils.class_weight import compute_sample_weight

sample_weight = compute_sample_weight('balanced', y_train)
automl.fit(
    X_train, y_train,
    sample_weight=sample_weight
)
```

Fix 3: oversampling or undersampling

```python
# Requires the imbalanced-learn package
from imblearn.over_sampling import SMOTE

# Resample the training split only, never the test data
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)

automl.fit(X_resampled, y_resampled)
```
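
To see why accuracy is a poor choice on imbalanced data, consider a worked example with 95 negatives and 5 positives, scored against a "model" that always predicts the majority class. The two metric functions below are written out by hand purely for illustration; in practice the scikit-learn implementations would be used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_positive(y_true, y_pred, positive=1):
    """F1 score for the positive (minority) class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# 95 negatives, 5 positives; every prediction is the majority class
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

print(f"accuracy      = {accuracy(y_true, y_pred):.2f}")
print(f"F1 (minority) = {f1_positive(y_true, y_pred):.2f}")
```

The accuracy looks high even though not a single minority sample is found, which is exactly the failure that the F1 score exposes.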

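### Time Series Issues

Problem 1: seasonality is not captured

In FLAML, `period` is the forecast horizon (the number of steps to predict), not the length of the seasonal cycle. To capture seasonality, train on several full seasonal cycles of history and include seasonality-aware estimators in the search.

Fix:

```python
automl.fit(
    X, y,
    task='ts_forecast',
    period=12,  # forecast horizon in time steps
    estimator_list=['sarimax', 'prophet']  # need the optional ts_forecast dependencies
)
```

Problem 2: forecast range error
Error: "forecast horizon mismatch"

Fix:

```python
# Keep the prediction range consistent with the training setting
automl.fit(
    X, y,
    task='ts_forecast',
    period=7,  # forecast horizon
)

# X_future should cover the same number of steps as period
predictions = automl.predict(X_future)
```

Problem 3: date index errors

Fix:

```python
# Set a datetime index
df.index = pd.to_datetime(df['date'])
df = df.asfreq('D')  # daily frequency
```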
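
One way to avoid a horizon mismatch is to generate the future timestamps from the same `period` value that was passed to `fit`. The sketch below assumes the prediction input is a DataFrame with a single timestamp column; `make_future_frame` is a hypothetical helper, and the column name `ds` is an assumption that must match the timestamp column used for training.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def make_future_frame(last_timestamp, period, freq="D", column="ds"):
    """Build a frame of `period` timestamps that follow the training data."""
    start = pd.Timestamp(last_timestamp) + to_offset(freq)
    stamps = pd.date_range(start=start, periods=period, freq=freq)
    return pd.DataFrame({column: stamps})


period = 7  # same value as the period passed to automl.fit
X_future = make_future_frame("2024-01-31", period=period)
print(X_future)
```

Deriving both the `fit` argument and the future frame from one `period` variable keeps the two from drifting apart.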

## Debugging Tips

1. Raise the verbosity

   ```python
   automl.fit(X, y, verbose=4)  # higher means more messages (default is 3)
   ```

2. Inspect the configuration history

   ```python
   print(automl.config_history)
   print(automl.best_config)
   print(automl.best_loss)
   ```

3. Inspect the model

   ```python
   print(f"Model type: {type(automl.model)}")
   print(f"Best estimator: {automl.best_estimator}")
   ```

4. Run step by step

   ```python
   # Step 1: check the data
   print(f"Shape: {X.shape}, NaN: {np.isnan(X).sum()}")

   # Step 2: test a single model
   automl.fit(X, y, estimator_list=['lgbm'], time_budget=10)

   # Step 3: expand gradually
   automl.fit(X, y, time_budget=30)
   ```

5. Save logs to a file

   ```python
   import logging
   logging.basicConfig(filename='flaml.log', level=logging.DEBUG)

   automl.fit(X, y, verbose=4)
   ```

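## FAQ

Q1: FLAML always selects the same model.
A1: Check the `seed` setting and the characteristics of your data, then try a different `estimator_list`.

Q2: How do I use a GPU?
A2: Install GPU-enabled builds of LightGBM and XGBoost. The older `pip install lightgbm --install-option=--gpu` form no longer works because pip 23.1 removed `--install-option`; recent LightGBM releases document `pip install lightgbm --config-settings=cmake.define.USE_GPU=ON` instead. Check the LightGBM installation guide for your version.

Q3: How do I add a custom model?
A3: Register the learner class with `automl.add_learner(learner_name=..., learner_class=...)` and include that name in `estimator_list`. `custom_hp` only adjusts the search space of a learner that is already registered.

Q4: How do I get reproducible results?
A4: Set the `seed` parameter: `automl.fit(..., seed=42)`. With a time-based budget the number of trials can still vary between runs, so consider fixing `max_iter` as well.

Q5: Is multi-output supported?
A5: Not natively. Wrap the model with scikit-learn's `MultiOutputClassifier`.

Q6: How are categorical features handled?
A6: LightGBM and CatBoost handle them automatically. Other learners need the features to be encoded.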

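## Summary

| Problem type | Common cause | Quick fix |
| --- | --- | --- |
| Installation failure | Dependency conflicts | Use a virtual environment |
| Memory error | Data size | Sampling, dtype optimization |
| Slow training | Time/model settings | Adjust `time_budget` |
| Low performance | Insufficient search | More time, more models |
| Prediction error | Feature mismatch | Check feature names and order |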

## Next Post

The next post covers the FLAML ecosystem and community, introducing the tools built around FLAML and the available community resources.

FLAML AutoML Master Series #098
