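013 Checking Training Results - best_estimator

Keywords: best_estimator, results

Overview

Once FLAML training is finished, it is important to check which model was selected and with which settings it was trained. This post covers the attributes and methods you can use to inspect the training results.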

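Hands-on Environment

Python version: 3.11 recommended
Required packages: flaml[automl], pandas, scikit-learn, matplotlib (for the feature importance plot)

pip install "flaml[automl]" pandas scikit-learn matplotlib

Preparing Example Data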

from flaml import AutoML
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load and split the data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

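# Train with FLAML
automl = AutoML()
automl.fit(
    X_train, y_train,
    task="classification",
    metric="accuracy",  # the multiclass default is log_loss; with accuracy, 1 - best_loss is the validation accuracy
    time_budget=60,     # search budget in seconds
    seed=42,
    verbose=0
)

print("Training complete!")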

Key Result Attributes

1. best_estimator

The name of the best estimator, as a string.

print(f"Best model: {automl.best_estimator}")

Output

Best model: lgbm

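2. model

The trained best model. automl.model is FLAML's estimator wrapper; the underlying trained model object (here a LightGBM LGBMClassifier) is stored in its estimator attribute.

# Inspect the model object
model = automl.model.estimator
print(f"Model type: {type(model)}")
print(f"Model object: {model}")

Output

Model type: <class 'lightgbm.sklearn.LGBMClassifier'>
Model object: LGBMClassifier(colsample_bytree=0.986, learning_rate=0.267, ...)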

3. best_config

A dictionary of the best hyperparameters.

print("Best hyperparameters:")
for key, value in automl.best_config.items():
    print(f"  {key}: {value}")

Output

Best hyperparameters:
  n_estimators: 63
  num_leaves: 4
  min_child_samples: 7
  learning_rate: 0.2677
  log_max_bin: 8
  colsample_bytree: 0.9867
  reg_alpha: 0.0009
  reg_lambda: 0.0087
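The log_max_bin entry above is not a native LightGBM parameter. Based on my reading of FLAML's LGBMEstimator (an assumption worth checking against your installed version), FLAML searches over log_max_bin and converts it to LightGBM's max_bin as 2**log_max_bin - 1 before training. A minimal sketch of that conversion, using a hypothetical helper name:

```python
def log_max_bin_to_max_bin(log_max_bin: int) -> int:
    """Hypothetical helper: map FLAML's log_max_bin search value to LightGBM's max_bin.

    Assumption: mirrors FLAML's LGBMEstimator, which sets max_bin = 2**log_max_bin - 1.
    """
    return (1 << log_max_bin) - 1


# Subset of the best_config output above
best_config = {"n_estimators": 63, "num_leaves": 4, "log_max_bin": 8}
print(log_max_bin_to_max_bin(best_config["log_max_bin"]))  # 255
```

With log_max_bin = 8 this gives max_bin = 255, which is also LightGBM's default value.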

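4. best_loss

The validation loss of the best model (lower is better). The loss is expressed in terms of the optimization metric: for metrics that are maximized, such as accuracy, FLAML reports 1 - score, while for metrics that are minimized, such as log_loss (the default for multiclass classification), the loss is the metric value itself. 1 - best_loss is therefore the validation accuracy only when metric="accuracy" is used.

# Loss value
print(f"Best loss: {automl.best_loss:.4f}")

# Convert to accuracy (valid only when metric="accuracy")
print(f"Validation accuracy: {1 - automl.best_loss:.4f}")

Output

Best loss: 0.0330
Validation accuracy: 0.9670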
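Because the meaning of best_loss depends on the metric, a small helper that turns it back into the metric value helps avoid mislabeling it. This is a hypothetical sketch: the metric sets below are a partial list chosen for illustration, built on the assumption that FLAML reports 1 - score for maximized metrics and the raw value for minimized ones.

```python
# Assumption: partial lists for illustration, not FLAML's full set of built-in metrics.
MAXIMIZED = {"accuracy", "roc_auc", "r2", "f1", "ap"}   # FLAML loss = 1 - score
MINIMIZED = {"log_loss", "mse", "rmse", "mae"}           # FLAML loss = metric value


def loss_to_metric_value(best_loss: float, metric: str) -> float:
    """Hypothetical helper: recover the metric value from FLAML's best_loss."""
    if metric in MAXIMIZED:
        return 1.0 - best_loss
    if metric in MINIMIZED:
        return best_loss
    raise ValueError(f"unsupported metric: {metric!r}")


print(f"{loss_to_metric_value(0.0330, 'accuracy'):.4f}")  # 0.9670
print(f"{loss_to_metric_value(0.0330, 'log_loss'):.4f}")  # 0.0330
```

Under the default multiclass metric, the same best_loss of 0.0330 would be a log loss, not 96.7% accuracy.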

5. best_config_train_time

The time, in seconds, spent training the best configuration.

print(f"Best model training time: {automl.best_config_train_time:.2f}s")

Additional Result Attributes

6. classes_

The class labels (classification only).

print(f"Classes: {automl.classes_}")

Output

Classes: [0 1 2]

7. feature_names_in_

The names of the features used for training.

# When trained on a DataFrame
# print(f"Feature names: {automl.feature_names_in_}")

8. n_features_in_

The number of features used for training.

print(f"Number of features: {automl.n_features_in_}")

Prediction Methods

predict()

Predicts class labels (classification) or values (regression).

# Predict
y_pred = automl.predict(X_test)
print(f"Predictions (first 10): {y_pred[:10]}")
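Accuracy on the test set is simply the fraction of positions where the predicted label matches the true label. A minimal sketch using toy labels (made up for illustration, not FLAML output) shows that scikit-learn's accuracy_score agrees with the manual computation; with the trained model you would pass y_test and automl.predict(X_test) instead.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Toy labels for illustration only
y_true = np.array([0, 1, 2, 2, 1])
y_pred = np.array([0, 1, 2, 1, 1])

manual_acc = np.mean(y_pred == y_true)   # 4 of 5 labels match
sk_acc = accuracy_score(y_true, y_pred)
print(f"manual: {manual_acc:.2f}, accuracy_score: {sk_acc:.2f}")  # manual: 0.80, accuracy_score: 0.80
```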

predict_proba()

Returns the probability of each class (classification only).

# Predict probabilities
y_prob = automl.predict_proba(X_test)
print(f"Probability shape: {y_prob.shape}")
print(f"First sample probabilities: {y_prob[0]}")

Output

Probability shape: (30, 3)
First sample probabilities: [0.001 0.012 0.987]
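The columns of predict_proba follow the order of classes_ (the usual scikit-learn convention, assumed here to hold for FLAML as well), so taking the argmax of each row and indexing into the class labels recovers what predict() returns. A minimal sketch in plain NumPy; the first row is the sample shown above and the second row is made up:

```python
import numpy as np

classes = np.array([0, 1, 2])                 # stands in for automl.classes_
proba = np.array([
    [0.001, 0.012, 0.987],                    # first test sample from the output above
    [0.900, 0.080, 0.020],                    # illustrative row, not real output
])

labels = classes[np.argmax(proba, axis=1)]    # most probable class per row
print(labels)  # [2 0]
```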

Checking the Training History

9. best_config_per_estimator

The best configuration found for each estimator.

print("Best configuration per estimator:")
for estimator, config in automl.best_config_per_estimator.items():
    if config:  # skip estimators with no recorded configuration
        print(f"\n{estimator}:")
        for key, value in config.items():
            print(f"  {key}: {value}")

10. best_loss_per_estimator

The best loss achieved by each estimator.

print("Best loss per estimator:")
for estimator, loss in automl.best_loss_per_estimator.items():
    if loss < float('inf'):  # skip estimators that produced no result
        print(f"  {estimator}: {loss:.4f} (accuracy: {1-loss:.4f})")

Output

Best loss per estimator:
  lgbm: 0.0330 (accuracy: 0.9670)
  xgboost: 0.0418 (accuracy: 0.9582)
  rf: 0.0440 (accuracy: 0.9560)
  extra_tree: 0.0462 (accuracy: 0.9538)
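The loop above prints estimators in dictionary order and only filters out those without a result; sorting by loss makes the comparison easier to read. A minimal sketch in plain Python, using the losses from the output above (listed in a shuffled order) plus one hypothetical untried estimator whose loss is infinite:

```python
import math

best_loss_per_estimator = {
    "xgboost": 0.0418,
    "lgbm": 0.0330,
    "extra_tree": 0.0462,
    "rf": 0.0440,
    "lrl1": math.inf,   # hypothetical estimator that produced no result
}


def rank_estimators(losses: dict) -> list:
    """Return (name, loss) pairs with a finite loss, sorted from best (lowest) to worst."""
    finite = [(name, loss) for name, loss in losses.items() if math.isfinite(loss)]
    return sorted(finite, key=lambda item: item[1])


for rank, (name, loss) in enumerate(rank_estimators(best_loss_per_estimator), start=1):
    print(f"{rank}. {name:12s} loss={loss:.4f}")
```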

Summary Printing Function

def print_automl_summary(automl, X_test=None, y_test=None):
    """Print a summary of FLAML training results."""
    print("="*60)
    print("FLAML Training Summary")
    print("="*60)

    # Basic info (1 - loss is the score for maximized metrics such as accuracy)
    print("\n[Basic info]")
    print(f"  Best model: {automl.best_estimator}")
    print(f"  Validation score: {1 - automl.best_loss:.4f}")
    print(f"  Training time: {automl.best_config_train_time:.2f}s")

    # Hyperparameters
    print("\n[Best hyperparameters]")
    for key, value in automl.best_config.items():
        if isinstance(value, float):
            print(f"  {key}: {value:.4f}")
        else:
            print(f"  {key}: {value}")

    # Per-estimator comparison
    print("\n[Per-estimator performance]")
    for estimator, loss in automl.best_loss_per_estimator.items():
        if loss < float('inf'):
            print(f"  {estimator:15s}: {1-loss:.4f}")

    # Test-set evaluation (optional)
    if X_test is not None and y_test is not None:
        from sklearn.metrics import accuracy_score
        y_pred = automl.predict(X_test)
        test_acc = accuracy_score(y_test, y_pred)
        print("\n[Test performance]")
        print(f"  Test accuracy: {test_acc:.4f}")

    print("="*60)

# Usage
print_automl_summary(automl, X_test, y_test)

Output

============================================================
FLAML Training Summary
============================================================

[Basic info]
  Best model: lgbm
  Validation score: 0.9670
  Training time: 0.12s

[Best hyperparameters]
  n_estimators: 63
  num_leaves: 4
  min_child_samples: 7
  learning_rate: 0.2677
  ...

[Per-estimator performance]
  lgbm           : 0.9670
  xgboost        : 0.9582
  rf             : 0.9560
  extra_tree     : 0.9538

[Test performance]
  Test accuracy: 1.0000
============================================================

Inspecting Model Internals

LightGBM Feature Importance

import matplotlib.pyplot as plt

# Extract feature importances from the underlying best model
if automl.best_estimator == 'lgbm':
    model = automl.model.estimator  # the trained LGBMClassifier
    importance = model.feature_importances_

    # Plot
    plt.figure(figsize=(10, 6))
    plt.barh(range(len(importance)), importance)
    plt.xlabel('Feature Importance')
    plt.ylabel('Feature Index')
    plt.title('LightGBM Feature Importance')
    plt.tight_layout()
    plt.show()
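The chart above labels bars only by feature index because the data was passed as a NumPy array. A minimal sketch for pairing importances with readable names and sorting them: to stay runnable without FLAML, it uses scikit-learn's RandomForestClassifier on iris as a stand-in for the FLAML-selected model, and rank_feature_importances is a hypothetical helper name. Any fitted model exposing feature_importances_ works the same way.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def rank_feature_importances(importances, feature_names):
    """Hypothetical helper: pair importances with names, sorted from most to least important."""
    order = np.argsort(importances)[::-1]
    return [(feature_names[i], float(importances[i])) for i in order]


iris = load_iris()
# Stand-in for automl.model.estimator: a RandomForestClassifier, not the FLAML-selected model
stand_in = RandomForestClassifier(n_estimators=50, random_state=42)
stand_in.fit(iris.data, iris.target)

ranking = rank_feature_importances(stand_in.feature_importances_, iris.feature_names)
for name, score in ranking:
    print(f"{name:20s} {score:.3f}")
```

In the FLAML workflow, the same helper would take automl.model.estimator.feature_importances_ together with the column names of your data.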

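Result Attributes Summary

Attribute	Type	Description
`best_estimator`	str	Name of the best estimator
`model`	object	FLAML wrapper of the best model (`model.estimator` is the trained model)
`best_config`	dict	Best hyperparameters
`best_loss`	float	Best validation loss
`best_config_train_time`	float	Training time of the best configuration (seconds)
`classes_`	array	Class labels (classification)
`n_features_in_`	int	Number of features
`best_config_per_estimator`	dict	Best configuration per estimator
`best_loss_per_estimator`	dict	Best loss per estimator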

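Summary

Use best_estimator to check the name of the best model.
Use model (and model.estimator) to access the actual model object.
Use best_config to check the best hyperparameters.
Use best_loss to check the validation loss (1 - loss = score for maximized metrics such as accuracy).
Use best_loss_per_estimator to compare performance across estimators.
Use predict() and predict_proba() to make predictions.

Coming Up Next

In the next post, we will look at analyzing training logs: how to record FLAML's training process in a log and analyze it.

FLAML AutoML Master Series #013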
