009 metric - Setting the Optimization Metric

Keywords: metric, evaluation metric

Overview

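In FLAML, the metric parameter defines the objective the search optimizes. The metric you choose can change the result considerably, because it decides which model and hyperparameter configuration is judged best.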

This post covers the evaluation metrics available in FLAML and how to choose one for a given situation.

Environment

Python version: 3.11 recommended
Required packages: flaml[automl], pandas, scikit-learn

pip install "flaml[automl]" pandas scikit-learn

Basic usage of the metric parameter

from flaml import AutoML

automl = AutoML()
automl.fit(
    X_train, y_train,       # your training features and labels
    task="classification",
    time_budget=60,         # search time in seconds
    metric="accuracy"       # metric to optimize
)
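Whatever metric you pick, FLAML searches by minimizing a loss. FLAML's documentation describes built-in score metrics such as accuracy, roc_auc, f1, ap and r2 as minimizing `1 - score`, while error metrics such as mse or mae are minimized directly; this is also the scale on which `automl.best_loss` is reported, and the custom-metric section later follows the same rule by returning `1 - f1`. The sketch below only restates that convention: `score_to_loss` and the two metric sets are hypothetical names, not FLAML code.

```python
# Hypothetical helper, for illustration only: it restates the documented
# "minimize a loss" convention and is not FLAML's internal code.
HIGHER_IS_BETTER = {"accuracy", "roc_auc", "roc_auc_ovr", "f1",
                    "micro_f1", "macro_f1", "ap", "r2"}
LOWER_IS_BETTER = {"log_loss", "mse", "rmse", "mae", "mape"}


def score_to_loss(metric: str, value: float) -> float:
    """Return the quantity a minimizer would use for a given metric value."""
    if metric in HIGHER_IS_BETTER:
        return 1.0 - value  # e.g. accuracy 0.97 -> loss of about 0.03
    if metric in LOWER_IS_BETTER:
        return value        # error metrics are already "lower is better"
    raise ValueError(f"unknown metric: {metric}")


print(score_to_loss("accuracy", 0.97))
print(score_to_loss("mse", 12.5))
```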

Classification metrics

Built-in metrics

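metric	Description	When to use
`"accuracy"`	Accuracy	Balanced data
`"log_loss"`	Log loss	When probability estimates matter
`"roc_auc"`	ROC AUC	Imbalanced data, binary classification
`"roc_auc_ovr"`	ROC AUC for multiclass (one-vs-rest)	Imbalanced data, multiclass
`"f1"`	F1 score	Imbalanced data (binary)
`"micro_f1"`	Micro F1	Multiclass
`"macro_f1"`	Macro F1	Multiclass, every class equally important
`"ap"`	Average Precision	Imbalanced data (binary)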
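The table's advice for imbalanced data becomes concrete once the formulas are written out. The pure-Python sketch below uses illustrative function names (in practice `sklearn.metrics.accuracy_score` and `f1_score` with `average="binary"`, `"micro"` or `"macro"` compute the same quantities). It shows a majority-class predictor scoring 0.95 accuracy but 0 F1, and how micro and macro F1 differ on a small multiclass example.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def _counts(y_true, y_pred, cls):
    """True positives, false positives and false negatives for one class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    return tp, fp, fn


def f1_for_class(y_true, y_pred, cls):
    """F1 = 2TP / (2TP + FP + FN), treating `cls` as the positive class."""
    tp, fp, fn = _counts(y_true, y_pred, cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


def micro_f1(y_true, y_pred):
    """F1 from TP/FP/FN pooled over all classes.

    For single-label data every error is one FP (predicted class) and one FN
    (true class), so micro F1 equals accuracy.
    """
    tp = fp = fn = 0
    for c in sorted(set(y_true) | set(y_pred)):
        c_tp, c_fp, c_fn = _counts(y_true, y_pred, c)
        tp, fp, fn = tp + c_tp, fp + c_fp, fn + c_fn
    return 2 * tp / (2 * tp + fp + fn)


# Binary, 95:5 imbalance, and a "model" that always predicts the majority class
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(accuracy(y_true, y_pred))         # 0.95
print(f1_for_class(y_true, y_pred, 1))  # 0.0: the minority class is never found

# Multiclass: micro F1 matches accuracy, macro F1 exposes the weaker classes
mc_true = [0, 0, 0, 0, 1, 1, 2, 2]
mc_pred = [0, 0, 0, 0, 1, 0, 2, 1]
print(accuracy(mc_true, mc_pred), micro_f1(mc_true, mc_pred), macro_f1(mc_true, mc_pred))
```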

Example: training with each metric

from flaml import AutoML
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score, f1_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

metrics_to_test = ["accuracy", "roc_auc", "f1", "log_loss"]

for metric in metrics_to_test:
    automl = AutoML()
    automl.fit(
        X_train, y_train,
        task="classification",
        time_budget=30,
        metric=metric,
        verbose=0
    )

    y_pred = automl.predict(X_test)
    y_prob = automl.predict_proba(X_test)[:, 1]  # probability of class 1

    print(f"\n=== metric='{metric}' ===")
    print(f"Best model: {automl.best_estimator}")
    print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
    print(f"ROC AUC:  {roc_auc_score(y_test, y_prob):.4f}")
    print(f"F1 Score: {f1_score(y_test, y_pred):.4f}")

Sample output (exact numbers vary with hardware, library versions and time budget)

=== metric='accuracy' ===
Best model: lgbm
Accuracy: 0.9737
ROC AUC:  0.9952
F1 Score: 0.9787

=== metric='roc_auc' ===
Best model: lgbm
Accuracy: 0.9649
ROC AUC:  0.9967
F1 Score: 0.9714

=== metric='f1' ===
Best model: lgbm
Accuracy: 0.9737
ROC AUC:  0.9945
F1 Score: 0.9787

=== metric='log_loss' ===
Best model: lgbm
Accuracy: 0.9649
ROC AUC:  0.9960
F1 Score: 0.9714
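The sample output shows the metrics disagreeing slightly, and the reason is what each one rewards. ROC AUC depends only on how the predicted probabilities rank positives against negatives, while log_loss also penalizes probabilities that are right but unconfident, or confident and wrong. A pure-Python sketch of both for binary labels (function names are illustrative; `sklearn.metrics.roc_auc_score` and `log_loss` are what you would use in practice):

```python
import math


def roc_auc(y_true, y_score):
    """Chance that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true label (binary, y_prob = P(class 1))."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total += -math.log(p) if t == 1 else -math.log(1 - p)
    return total / len(y_true)


y_true = [0, 0, 1, 1]
confident = [0.10, 0.40, 0.60, 0.90]  # same ranking ...
timid = [0.45, 0.48, 0.52, 0.55]      # ... but probabilities close to 0.5

for name, probs in [("confident", confident), ("timid", timid)]:
    print(name, roc_auc(y_true, probs), round(log_loss(y_true, probs), 4))
```

Both models rank every positive above every negative, so both get ROC AUC 1.0, but log_loss is about 0.31 for the confident model and about 0.63 for the timid one.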

Regression metrics

Built-in metrics

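metric	Description	Characteristics
`"r2"`	R² score	Share of variance explained (closer to 1 is better)
`"mse"`	Mean squared error	Sensitive to large errors
`"rmse"`	Root mean squared error	Square root of MSE, in the target's own units
`"mae"`	Mean absolute error	Robust to outliers
`"mape"`	Mean absolute percentage error	Relative error (unstable when targets are near zero)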
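To see the "sensitive to large errors" versus "robust to outliers" distinction in numbers, the sketch below computes all five metrics by hand and compares predictions that are each off by 1 with the same predictions plus one bad miss. `regression_metrics` is an illustrative name; `sklearn.metrics` provides `r2_score`, `mean_squared_error`, `mean_absolute_error` and `mean_absolute_percentage_error` for real use.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute the five regression metrics from the table for two equal-length lists."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    mse = sse / n
    # MAPE divides by the true value; this sketch returns NaN if any target is 0.
    if any(t == 0 for t in y_true):
        mape = float("nan")
    else:
        mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {
        "r2": 1 - sse / ss_tot,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": mape,
    }


y_true = [10, 20, 30, 40]
small_errors = [11, 19, 31, 39]  # every prediction off by 1
one_outlier = [11, 19, 31, 59]   # same, except one prediction off by 19

for name, pred in [("small_errors", small_errors), ("one_outlier", one_outlier)]:
    m = regression_metrics(y_true, pred)
    print(name, {k: round(v, 3) for k, v in m.items()})
```

With the single bad prediction, MAE rises from 1.0 to 5.5 while RMSE rises from 1.0 to about 9.54, because squaring lets the one large error dominate.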

Example: comparing regression metrics

from flaml import AutoML
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
import numpy as np

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

metrics_to_test = ["r2", "mse", "mae"]

for metric in metrics_to_test:
    automl = AutoML()
    automl.fit(
        X_train, y_train,
        task="regression",
        time_budget=30,
        metric=metric,
        verbose=0
    )

    y_pred = automl.predict(X_test)

    print(f"\n=== metric='{metric}' ===")
    print(f"Best model: {automl.best_estimator}")
    print(f"R²:   {r2_score(y_test, y_pred):.4f}")
    print(f"RMSE: {np.sqrt(mean_squared_error(y_test, y_pred)):.2f}")
    print(f"MAE:  {mean_absolute_error(y_test, y_pred):.2f}")

Choosing a metric by situation

Classification

Check class balance
    │
    ├─ Balanced (similar class proportions)
    │       │
    │       └─→ use "accuracy"
    │
    └─ Imbalanced (one class dominates)
            │
            ├─ Probability estimates matter ─→ "roc_auc" or "log_loss"
            │
            ├─ Positive class matters most ─→ "f1" or "ap"
            │
            └─ Multiclass ─→ "macro_f1" or "roc_auc_ovr"

Regression

How should errors be interpreted?
    │
    ├─ In the target's own units ─→ "rmse" or "mae"
    │
    ├─ Must be robust to outliers ─→ "mae"
    │
    ├─ Relative error matters ─→ "mape"
    │
    └─ Share of variance explained ─→ "r2"
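Both decision trees can be written down as a small function, which makes the choice explicit in a script. `suggest_metric`, its parameters and the 20% minority-share cut-off are hypothetical: the guide itself gives no numeric threshold for "imbalanced". The multiclass check runs before the binary branches because "f1" and "ap" are binary metrics.

```python
from collections import Counter

# Hypothetical mapping of the regression branches of the guide.
REGRESSION_GOALS = {
    "absolute": "rmse",   # error in the target's own units ("mae" also fits)
    "robust": "mae",      # outliers should not dominate
    "relative": "mape",   # percentage-style error
    "explained": "r2",    # share of variance explained
}


def suggest_metric(task, y=None, *, positive_focus=False, goal="absolute",
                   imbalance_share=0.2):
    """Hypothetical helper that walks the decision guide.

    imbalance_share is an assumed cut-off: classification data counts as
    imbalanced when its smallest class holds less than this share of samples.
    """
    if task == "classification":
        counts = Counter(y)
        minority_share = min(counts.values()) / sum(counts.values())
        if minority_share >= imbalance_share:
            return "accuracy"
        if len(counts) > 2:
            return "macro_f1"  # or "roc_auc_ovr"
        if positive_focus:
            return "f1"        # or "ap"
        return "roc_auc"       # or "log_loss" if calibrated probabilities matter
    if task == "regression":
        return REGRESSION_GOALS[goal]
    raise ValueError(f"unsupported task: {task}")


print(suggest_metric("classification", [0] * 50 + [1] * 50))
print(suggest_metric("classification", [0] * 95 + [1] * 5, positive_focus=True))
print(suggest_metric("regression", goal="robust"))
```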

Custom metrics

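Besides the built-in names, FLAML accepts a user-defined metric: pass a function as metric. FLAML calls it with the validation data, the fitted estimator, the labels and the training data, minimizes the first value it returns, and logs the second (a dict).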

Defining a custom metric

def custom_metric(
    X_val, y_val, estimator, labels,
    X_train, y_train, weight_val=None,
    weight_train=None, *args
):
    """
    Custom evaluation metric (weighted F1).

    Returns:
        float: loss value to minimize (lower is better)
        dict: extra metrics to log
    """
    from sklearn.metrics import f1_score

    y_pred = estimator.predict(X_val)
    f1 = f1_score(y_val, y_pred, average="weighted", sample_weight=weight_val)

    # FLAML minimizes the first return value, so return 1 - f1
    return 1 - f1, {"f1_weighted": f1}
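Because FLAML simply calls the function with the arguments in the signature above and minimizes the first returned value, a custom metric can be unit-tested without running a search. The sketch below does this for a different, hypothetical objective, an asymmetric misclassification cost, using a stub estimator. `FixedEstimator`, `cost_metric`, the 0/1 label encoding and the 5:1 cost ratio are all assumptions for illustration.

```python
class FixedEstimator:
    """Hypothetical stand-in for a fitted model: returns preset predictions."""

    def __init__(self, predictions):
        self.predictions = list(predictions)

    def predict(self, X):
        return self.predictions


def cost_metric(X_val, y_val, estimator, labels,
                X_train, y_train, weight_val=None,
                weight_train=None, *args):
    """Hypothetical objective: average misclassification cost per sample.

    Assumes binary labels 0/1 and assumed costs: a missed positive (FN)
    costs 5, a false alarm (FP) costs 1.
    """
    fn_cost, fp_cost = 5.0, 1.0
    y_pred = estimator.predict(X_val)
    fn = sum(t == 1 and p == 0 for t, p in zip(y_val, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_val, y_pred))
    cost = (fn_cost * fn + fp_cost * fp) / len(y_val)
    # The first value is minimized; the dict is only logged.
    return cost, {"fn": fn, "fp": fp}


X_val = [[0.0]] * 5                       # features are ignored by the stub
y_val = [0, 0, 0, 1, 1]
misses = FixedEstimator([0, 0, 0, 0, 0])  # never flags a positive
alarms = FixedEstimator([1, 1, 0, 1, 1])  # catches both positives, 2 false alarms

for name, est in [("misses", misses), ("alarms", alarms)]:
    print(name, cost_metric(X_val, y_val, est, [0, 1], None, None))
```

Under this cost, the model with two false alarms (loss 0.4) beats the one that misses both positives (loss 2.0), even though both make two mistakes.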

Using a custom metric

automl = AutoML()
automl.fit(
    X_train, y_train,
    task="classification",
    time_budget=60,
    metric=custom_metric  # pass the function itself
)

Example: optimizing specificity

def specificity_metric(X_val, y_val, estimator, labels,
                       X_train, y_train, weight_val=None,
                       weight_train=None, *args):
    """Optimize specificity (true negative rate) for binary labels."""
    from sklearn.metrics import confusion_matrix

    y_pred = estimator.predict(X_val)
    # Passing labels keeps the matrix 2x2 even if one class is never predicted
    tn, fp, fn, tp = confusion_matrix(y_val, y_pred, labels=labels).ravel()

    specificity = tn / (tn + fp) if (tn + fp) > 0 else 0

    # Convert to a loss (1 - specificity)
    return 1 - specificity, {"specificity": specificity}

# Usage
automl = AutoML()
automl.fit(
    X_train, y_train,
    task="classification",
    time_budget=60,
    metric=specificity_metric
)
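One caveat: specificity alone is easy to game. A model that predicts the negative class for every row has no false positives, so it reaches specificity 1.0 and a loss of 0, and the search may well prefer it. If the goal is few false alarms without ignoring positives, a common remedy is to combine specificity with sensitivity (recall); their average is balanced accuracy. The sketch below compares the two objectives with a stub estimator; `balanced_specificity_metric`, `FixedEstimator` and the 0/1 label encoding are assumptions for illustration.

```python
class FixedEstimator:
    """Hypothetical stand-in for a fitted model: returns preset predictions."""

    def __init__(self, predictions):
        self.predictions = list(predictions)

    def predict(self, X):
        return self.predictions


def specificity_sensitivity(y_true, y_pred):
    """Specificity (TN rate) and sensitivity (TP rate), assuming 0 = negative, 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    spec = tn / (tn + fp) if tn + fp else 0.0
    sens = tp / (tp + fn) if tp + fn else 0.0
    return spec, sens


def balanced_specificity_metric(X_val, y_val, estimator, labels,
                                X_train, y_train, weight_val=None,
                                weight_train=None, *args):
    """Hypothetical variant: mean of specificity and sensitivity, returned as a loss."""
    spec, sens = specificity_sensitivity(y_val, estimator.predict(X_val))
    return 1 - (spec + sens) / 2, {"specificity": spec, "sensitivity": sens}


X_val = [[0.0]] * 5
y_val = [0, 0, 0, 1, 1]
all_negative = FixedEstimator([0, 0, 0, 0, 0])
useful = FixedEstimator([0, 0, 1, 1, 1])

for name, est in [("all_negative", all_negative), ("useful", useful)]:
    spec, _ = specificity_sensitivity(y_val, est.predict(X_val))
    balanced_loss, _ = balanced_specificity_metric(X_val, y_val, est, [0, 1], None, None)
    print(name, "specificity loss:", round(1 - spec, 3), "balanced loss:", round(balanced_loss, 3))
```

The pure specificity loss prefers the useless all-negative model (0.0 versus 0.333), while the balanced loss prefers the model that actually finds the positives (0.167 versus 0.5).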

Practical example: imbalanced data

from flaml import AutoML
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Generate imbalanced data (95:5 ratio)
X, y = make_classification(
    n_samples=10000,
    n_features=20,
    n_classes=2,
    weights=[0.95, 0.05],  # class imbalance
    random_state=42
)

# stratify=y keeps the same class ratio in the train and test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("Class distribution:")
print(f"  Train - Class 0: {(y_train == 0).sum()}, Class 1: {(y_train == 1).sum()}")
print(f"  Test  - Class 0: {(y_test == 0).sum()}, Class 1: {(y_test == 1).sum()}")

# Train with accuracy
automl_acc = AutoML()
automl_acc.fit(X_train, y_train, task="classification",
               time_budget=30, metric="accuracy", verbose=0)

# Train with roc_auc
automl_auc = AutoML()
automl_auc.fit(X_train, y_train, task="classification",
               time_budget=30, metric="roc_auc", verbose=0)

# Compare per-class precision, recall and F1 on the test set
print("\n=== optimized for accuracy ===")
print(classification_report(y_test, automl_acc.predict(X_test)))

print("\n=== optimized for roc_auc ===")
print(classification_report(y_test, automl_auc.predict(X_test)))

Analysis of the results

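On imbalanced data:

accuracy can be biased toward the majority class: a model that always predicts class 0 already scores about 95% here.
roc_auc measures how well the model ranks positives above negatives across all thresholds, so the majority class cannot dominate it the way it dominates accuracy.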
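A caution when reading the report: roc_auc scores the ranking of predicted probabilities, but `predict()` still turns probabilities into labels with a fixed cut-off (for binary classifiers, typically 0.5), so minority-class recall can stay low even when AUC is high. A common follow-up is to choose the threshold from `predict_proba` output on a validation set. The minimal pure-Python sketch below assumes binary 0/1 labels and uses F1 as the selection criterion; `f1_at_threshold` and `best_threshold` are hypothetical helpers. Pick the threshold on validation data, not on the test set, so the test score stays an honest estimate.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """Binary F1 after labelling every probability >= threshold as class 1."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, y_prob):
    """Try each observed probability as a cut-off and keep the one with the highest F1."""
    candidates = sorted(set(y_prob))
    return max(candidates, key=lambda th: f1_at_threshold(y_true, y_prob, th))


# Validation labels and P(class 1), e.g. from automl.predict_proba(X_val)[:, 1]
y_val = [0, 0, 0, 0, 0, 0, 1, 1]
p_val = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.45]

print(f1_at_threshold(y_val, p_val, 0.5))  # 0.0: no positive reaches 0.5
th = best_threshold(y_val, p_val)
print(th, f1_at_threshold(y_val, p_val, th))
```

Here the ranking is perfect (ROC AUC would be 1.0), yet the 0.5 cut-off finds no positives; lowering it to 0.4 recovers both.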

Summary

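metric defines the objective FLAML optimizes.
Classification: accuracy, roc_auc, f1, log_loss, and others
Regression: r2, mse, rmse, mae, mape, and others
On imbalanced data, roc_auc or f1 is recommended over accuracy.
Custom metrics can be defined as functions and passed directly.
Choosing a metric that matches the problem is essential.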

Coming up next

In the next post, we look at estimator_list - specifying which models to search, covering how to restrict or extend the set of models FLAML explores.

FLAML AutoML Master Series #009
