sklearn KFold()

最近实践过程中遇到需要KFold()记录一下,以便日后查阅KFold()在sklearn中属于model_slection模块fromsklearn.model_selectionimportKFoldKFold(n_splits=’warn’,shuffle=False,random_state=None)参数:n_splits表示划分为几块(至少是2)shuffle…

大家好,又见面了,我是你们的朋友全栈君。

最近实践过程中遇到需要KFold()
记录一下,以便日后查阅
KFold()在sklearn中属于model_slection模块

from sklearn.model_selection import KFold

KFold(n_splits=’warn’, shuffle=False, random_state=None)
参数:
n_splits 表示划分为几块(至少是2)

shuffle 表示是否打乱划分,默认False,即不打乱

random_state 表示是否固定随机起点,Used when shuffle == True.

方法
1,get_n_splits([X, y, groups]) 返回分的块数
2,split(X[,Y,groups]) 返回分类后数据集的index

例子:
1, get_n_splits()

from sklearn.model_selection import KFold
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4])
kf = KFold(n_splits=2)
print( kf.get_n_splits(X))

输出2

2, split()

for train_index, test_index in kf.split(X):
	print("TRAIN:", train_index, "TEST:", test_index)
	X_train, X_test = X.iloc[train_index], X.iloc[test_index]
	y_train, y_test = y.iloc[train_index], y.iloc[test_index]

TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]

for k,(train,test) in enumerate(kf.split(x,y)):
	print (k,(train,test))
	x_train=X.iloc[train]
	x_test=X.iloc[test]
	y_train=Y.iloc[train]
	y_test=Y.iloc[tes]

用KFold()优化逻辑回归参数C
参数C为正则化项的系数gama的倒数(C=1/gama)

def best_C_param (x,y):
    kf=KFold(n_splits=7,shuffle=True,random_state=0)
    C_param=[0.001,0.01,0.1,1,10,100]
    result=[]
    for c_parm in C_param:
        print('C_param',c_parm)
        recall_score_lr_kf=[]
        for k,(train,test) in enumerate(kf.split(x,y)):
            #print(y.iloc[train])
            model = LogisticRegression(C=c_parm,penalty='l2')
            lr_kf=model.fit(x.iloc[train],y.iloc[train].values.ravel())
            pred_lr_kf=lr_kf.predict(x.iloc[test])
            recall_score_lrkf=recall_score(y.iloc[test],pred_lr_kf)
            recall_score_lr_kf.append(recall_score_lrkf)
            print('iteration',k,'recall score',recall_score_lrkf)
        result.append(np.mean(recall_score_lr_kf))
        print(c_parm,np.mean(recall_score_lr_kf))
        print('bets mean recall score',max(result))


print('using whole datasets X and Y')
best_C_param(X,Y)

print('-------------------------')
print('using under sampling data X and Y')
best_C_param(X_underSampling,Y_underSampling)

一点小小的感触:
数据是非平衡数据结构,正样本1在总体数据集中只占有0.17%
欠采样处理后,二分类比例达到1:1
欠采样处理后的数据KFold寻找LR的最佳C:

方法1:不打乱划分,即shuffle=False (默认),其他同上

kf=KFold(n_splits=7)

每个C参数,出现recall score为0 (7块中至少2块为0),导致平均下来每个c_parm的recall均分只有0.5左右。
原因:不打乱的时候,分块中有些没分到正样本

方法2:打乱划分,固定随机种子

kf=KFold(n_splits=7,shuffle=True,random_state=0)

输出:结果对欠采样处理后的数据表现较好

using whole datasets X and Y before under sampling

C_param 0.001

iteration 0 recall score 0.5797101449275363

iteration 1 recall score 0.5223880597014925

iteration 2 recall score 0.5

iteration 3 recall score 0.5571428571428572

iteration 4 recall score 0.6774193548387096

iteration 5 recall score 0.5692307692307692

iteration 6 recall score 0.4567901234567901

0.001 0.5518116156140221

bets mean recall score 0.5518116156140221

C_param 0.01

iteration 0 recall score 0.5942028985507246

iteration 1 recall score 0.5671641791044776

iteration 2 recall score 0.5512820512820513

iteration 3 recall score 0.6142857142857143

iteration 4 recall score 0.6774193548387096

iteration 5 recall score 0.6

iteration 6 recall score 0.5185185185185185

0.01 0.5889818166543136

bets mean recall score 0.5889818166543136

C_param 0.1

iteration 0 recall score 0.6231884057971014

iteration 1 recall score 0.6119402985074627

iteration 2 recall score 0.5512820512820513

iteration 3 recall score 0.6142857142857143

iteration 4 recall score 0.6935483870967742

iteration 5 recall score 0.6153846153846154

iteration 6 recall score 0.5555555555555556

0.1 0.6093121468441821

bets mean recall score 0.6093121468441821

C_param 1

iteration 0 recall score 0.6521739130434783

iteration 1 recall score 0.6119402985074627

iteration 2 recall score 0.5512820512820513

iteration 3 recall score 0.6428571428571429

iteration 4 recall score 0.7096774193548387

iteration 5 recall score 0.6153846153846154

iteration 6 recall score 0.5802469135802469

1 0.6233660505728338

bets mean recall score 0.6233660505728338

C_param 10

iteration 0 recall score 0.6521739130434783

iteration 1 recall score 0.6119402985074627

iteration 2 recall score 0.5512820512820513

iteration 3 recall score 0.6428571428571429

iteration 4 recall score 0.7096774193548387

iteration 5 recall score 0.6153846153846154

iteration 6 recall score 0.5802469135802469

10 0.6233660505728338

bets mean recall score 0.6233660505728338

C_param 100

iteration 0 recall score 0.6521739130434783

iteration 1 recall score 0.6119402985074627

iteration 2 recall score 0.5512820512820513

iteration 3 recall score 0.6428571428571429

iteration 4 recall score 0.7096774193548387

iteration 5 recall score 0.6153846153846154

iteration 6 recall score 0.5802469135802469

100 0.6233660505728338

bets mean recall score 0.6233660505728338

 

-------------------------

using under sampling data X and Y

C_param 0.001

iteration 0 recall score 0.9558823529411765

iteration 1 recall score 0.9861111111111112

iteration 2 recall score 0.9473684210526315

iteration 3 recall score 0.9466666666666667

iteration 4 recall score 0.9661016949152542

iteration 5 recall score 0.96

iteration 6 recall score 0.9701492537313433

0.001 0.9617542143454548

bets mean recall score 0.9617542143454548

C_param 0.01

iteration 0 recall score 0.9558823529411765

iteration 1 recall score 0.8888888888888888

iteration 2 recall score 0.881578947368421

iteration 3 recall score 0.8666666666666667

iteration 4 recall score 0.9661016949152542

iteration 5 recall score 0.9466666666666667

iteration 6 recall score 0.9253731343283582

0.01 0.9187369073964903

bets mean recall score 0.9617542143454548

C_param 0.1

iteration 0 recall score 0.9264705882352942

iteration 1 recall score 0.8888888888888888

iteration 2 recall score 0.868421052631579

iteration 3 recall score 0.8666666666666667

iteration 4 recall score 0.9661016949152542

iteration 5 recall score 0.9466666666666667

iteration 6 recall score 0.8955223880597015

0.1 0.9083911351520072

bets mean recall score 0.9617542143454548

C_param 1

iteration 0 recall score 0.9264705882352942

iteration 1 recall score 0.9027777777777778

iteration 2 recall score 0.881578947368421

iteration 3 recall score 0.8933333333333333

iteration 4 recall score 0.9491525423728814

iteration 5 recall score 0.9466666666666667

iteration 6 recall score 0.8955223880597015

1 0.9136431776877252

bets mean recall score 0.9617542143454548

C_param 10

iteration 0 recall score 0.9264705882352942

iteration 1 recall score 0.8888888888888888

iteration 2 recall score 0.8947368421052632

iteration 3 recall score 0.9066666666666666

iteration 4 recall score 0.9491525423728814

iteration 5 recall score 0.9466666666666667

iteration 6 recall score 0.8955223880597015

10 0.9154435118564803

bets mean recall score 0.9617542143454548

C_param 100

iteration 0 recall score 0.9264705882352942

iteration 1 recall score 0.9027777777777778

iteration 2 recall score 0.881578947368421

iteration 3 recall score 0.9066666666666666

iteration 4 recall score 0.9661016949152542

iteration 5 recall score 0.96

iteration 6 recall score 0.8805970149253731

100 0.9177418128412552

bets mean recall score 0.9617542143454548

 

Process finished with exit code 0


版权声明:本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请联系我们举报,一经查实,本站将立刻删除。

发布者:全栈程序员-站长,转载请注明出处:https://javaforall.net/125349.html原文链接:https://javaforall.net

(0)
全栈程序员-站长的头像全栈程序员-站长


相关推荐

  • 4096!——化简的2048游戏[通俗易懂]

    4096!——化简的2048游戏

    2022年2月1日
    129
  • CSDN Chrome插件来了。助开发者提升开发效率,远离996

    插件定位帮助开发者提升开发效率,远离996特点以搜索框为入口,集成开发者常用工具,提升开发效率主要功能如下:支持本地书签、tab页、历史记录搜索集成CSDN搜索结果,本地内容和远程结果无缝集成所有操作都支持快捷键,提升搜索效率他是一个时间转换工具他是一个计算器他是。。。,更多功能正在添加中安装下载安装包浏览器输入地址“chrome://extensions/”进入扩展程序页面,开启开发者模式以下操作任选其一:zip文件安装:点击“加载已解压的扩展程序”按钮,选择已解压

    2022年4月8日
    57
  • python读取txt文件内容(python怎么读取excel)

    python读取txt文件的方法:首先打开文件,代码为【f=open(‘/tmp/test.txt’)】;然后进行读取,代码为【本教程操作环境:windows7系统、python3.9版,该方法适用于所有品牌电脑。python读取txt文件的方法:一、文件的打开和创建>>>f=open(‘/tmp/test.txt’)>>>f.read()’hell…

    2022年4月14日
    50
  • CubieBoard 简单入门

    CubieBoard 简单入门大约一个月之前折腾的部分记录,当时没有完全完成,就着手其他事情了,这是存在LiveWriter中的草稿,先发出来吧,后来花了一段时间移植Qt,一直遇到了点问题,并没有完全跑通,后续估计也没有时间再继

    2022年7月4日
    20
  • 【Python】解决使用 plt.savefig 保存图片时一片空白

    【Python】解决使用 plt.savefig 保存图片时一片空白问题当使用如下代码保存使用plt.savefig保存生成的图片时,结果打开生成的图片确实一片空白。importmatplotlib.pyplotasplt”””一些画图代码”””plt.show()plt.savefig(“filename.png”)原因其实产生这个现象的原因很简单:在plt.show()后调用了plt.savefig(),在plt.show()后实际上已经创建

    2022年5月27日
    35
  • SVM-SVM概述

    SVM-SVM概述

    2021年12月17日
    39

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注

关注全栈程序员社区公众号