1. Oversampling with SMOTE
Usage:
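from sklearn.model_selection import train_test_split

# Hold out 15% of the data for validation; only the training part is oversampled
X_train, X_val, y_train, y_val = train_test_split(
    train_df[predictors], train_df[target], test_size=0.15, random_state=1234)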
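import numpy as np
from imblearn.over_sampling import SMOTE

# Newer imbalanced-learn releases renamed `ratio` to `sampling_strategy` and
# `fit_sample` to `fit_resample`; `kind` and `m_neighbors` now belong to
# BorderlineSMOTE/SVMSMOTE, and `n_jobs` has been removed from SMOTE.
oversampler = SMOTE(sampling_strategy='auto',
                    random_state=np.random.randint(100),  # changes every run; fix it for reproducibility
                    k_neighbors=5)
os_X_train, os_y_train = oversampler.fit_resample(X_train, y_train)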
from collections import Counter

# Samples per class after resampling; with 'auto' both classes end up the same size
print('Resampled dataset shape {}'.format(Counter(os_y_train)))
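The post does not explain what SMOTE actually does. The core idea is to create synthetic minority samples by interpolating between a minority sample and one of its k nearest minority-class neighbours: x_new = x + λ·(x_nn − x) with λ drawn from [0, 1). The sketch below illustrates that idea with NumPy only; `smote_sketch` is a hypothetical helper, not imbalanced-learn's implementation, and it uses brute-force distances, so it is only meant for small arrays.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Hypothetical helper: generate n_new synthetic rows by SMOTE-style interpolation.

    X_min must contain only the minority-class rows (2-D array).
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # cannot use more neighbours than other minority points

    # Brute-force pairwise Euclidean distances between minority samples
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]  # indices of the k nearest neighbours

    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        base = rng.integers(n)                   # pick a random minority sample
        nb = neighbours[base, rng.integers(k)]   # pick one of its k neighbours
        lam = rng.random()                       # interpolation factor in [0, 1)
        synthetic[i] = X_min[base] + lam * (X_min[nb] - X_min[base])
    return synthetic


X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_rows = smote_sketch(X_min, n_new=6, k=2)
print(new_rows.shape)  # (6, 2)
```

Because every synthetic point lies on a segment between two existing minority points, the new rows stay inside the region the minority class already occupies; this is also why SMOTE is applied only to the training split, leaving the validation set untouched.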
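Note: older imbalanced-learn versions return plain NumPy arrays from the resampler, so the DataFrame column names are lost after oversampling. The validation data must then also be passed as an array (X_val.values), otherwise XGBoost can complain that the feature names do not match.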
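Instead of stripping the names from the validation set, another option is to put the original column names back on the resampled training data, so that both sides are DataFrames with identical columns. A minimal sketch, assuming the resampler returned arrays whose columns are in the same order as `X_train.columns`; `restore_frame` is a hypothetical helper, not part of imbalanced-learn.

```python
import numpy as np
import pandas as pd


def restore_frame(X_resampled, y_resampled, columns, target_name):
    """Hypothetical helper: wrap resampled array output back into named pandas objects."""
    X_df = pd.DataFrame(np.asarray(X_resampled), columns=list(columns))
    y_s = pd.Series(np.asarray(y_resampled), name=target_name)
    return X_df, y_s


# Toy arrays standing in for os_X_train / os_y_train
X_df, y_s = restore_frame(np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([0, 1]),
                          columns=['f1', 'f2'], target_name='label')
print(list(X_df.columns))  # ['f1', 'f2']
```

With the real data this would be called as `restore_frame(os_X_train, os_y_train, X_train.columns, target)`, after which `X_val` can be passed to `eval_set` unchanged.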
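import numpy as np
from xgboost import XGBClassifier

model = XGBClassifier(
    learning_rate=0.1,
    n_estimators=1000,
    max_depth=5,
    min_child_weight=1,
    gamma=0,
    subsample=0.8,
    colsample_bytree=0.8,
    objective='binary:logistic',
    n_jobs=-1,                # replaces the deprecated `nthread`
    scale_pos_weight=1,       # classes are balanced after SMOTE
    random_state=27,          # replaces the deprecated `seed`
    eval_metric='auc',        # newer XGBoost expects these two in the constructor,
    early_stopping_rounds=3,  # not in fit()
)
# Pass arrays on both sides so the feature names are consistent
model.fit(
    np.asarray(os_X_train), np.asarray(os_y_train),
    eval_set=[(X_val.values, y_val.values)],
    verbose=True,
)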
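The example sets `scale_pos_weight=1`. The XGBoost documentation suggests sum(negative instances) / sum(positive instances) as a typical starting value for this parameter, and after SMOTE with `sampling_strategy='auto'` the minority class is brought up to the size of the majority class, so that ratio is 1. A small sketch of the calculation, assuming binary labels with 1 as the positive class; `suggested_scale_pos_weight` is a hypothetical helper.

```python
from collections import Counter


def suggested_scale_pos_weight(y, positive=1):
    """Hypothetical helper: ratio of negative to positive labels."""
    counts = Counter(y)
    n_pos = counts[positive]
    n_neg = sum(c for label, c in counts.items() if label != positive)
    if n_pos == 0:
        raise ValueError("no positive samples")
    return n_neg / n_pos


print(suggested_scale_pos_weight([0] * 90 + [1] * 10))  # 9.0  (imbalanced, before resampling)
print(suggested_scale_pos_weight([0] * 90 + [1] * 90))  # 1.0  (balanced, after SMOTE)
```

If you train on the original, un-resampled data instead, a value near the first result would be the usual starting point; combining SMOTE with a large `scale_pos_weight` would compensate for the imbalance twice.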
2. Undersampling (also called downsampling)
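import pandas as pd

def down_sample(df):
    """Undersample: keep every row with acc_now_delinq == 1 and a random 10% of rows with acc_now_delinq == 0."""
    df1 = df[df['acc_now_delinq'] == 1]
    df2 = df[df['acc_now_delinq'] == 0]
    df3 = df2.sample(frac=0.1)
    return pd.concat([df1, df3], ignore_index=True)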
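`down_sample` hard-codes the label column, the fraction and the random seed. A generalized sketch of the same technique follows; `down_sample_by` and its parameters are hypothetical. It also adds two choices the original does not make: a `random_state` for reproducible sampling, and a final shuffle so the kept minority rows are not all grouped at the top of the result.

```python
import pandas as pd


def down_sample_by(df, label_col, majority_label=0, frac=0.1, random_state=None):
    """Hypothetical helper: keep all non-majority rows and a random fraction of majority rows."""
    majority = df[df[label_col] == majority_label]
    others = df[df[label_col] != majority_label]
    kept = majority.sample(frac=frac, random_state=random_state)
    # Shuffle the combined rows, then renumber the index from 0
    combined = pd.concat([others, kept])
    return combined.sample(frac=1, random_state=random_state).reset_index(drop=True)


# Toy data: 100 negatives, 5 positives
toy = pd.DataFrame({'acc_now_delinq': [0] * 100 + [1] * 5, 'x': range(105)})
out = down_sample_by(toy, 'acc_now_delinq', majority_label=0, frac=0.1, random_state=0)
print(out['acc_now_delinq'].value_counts().sort_index().tolist())  # [10, 5]
```

Unlike SMOTE, undersampling discards data, so it is usually reserved for cases where the majority class is large enough that losing most of it still leaves a representative sample.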
Publisher: 全栈程序员-站长 (Full-stack Programmer, site admin). When reposting, please credit the source: https://javaforall.net/220194.html. Original link: https://javaforall.net
