Credit Card Fraud Detection in Python: Fool a Programmer? Not a Chance

Published: 2019-10-13 08:02:53  Category: Tutorials  Source: 一枚程序媛呀

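Introduction: This article looks at model selection on a large dataset (284,807 records). I consulted a number of references, but most were not clear enough, so I put a lot of effort into writing this up and hope it helps. The article takes the viewpoint of a data analyst who has just been handed the data: how to look for patterns in it, and which model to choose for building an anti-fraud model, with a business-oriented focus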

6.2 Random Forest Model

from sklearn import metrics  # used by the evaluation steps below
from sklearn.ensemble import RandomForestClassifier

# Fit a random forest with default parameters
# (x_train, y_train, x_test, y_test are assumed to come from the earlier train/test split)
rfmodel = RandomForestClassifier()
rfmodel.fit(x_train, y_train)
# Inspect the model
print('rfmodel')
rfmodel
rfmodel
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
 max_depth=None, max_features='auto', max_leaf_nodes=None,
 min_impurity_decrease=0.0, min_impurity_split=None,
 min_samples_leaf=1, min_samples_split=2,
 min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
 oob_score=False, random_state=None, verbose=0,
 warm_start=False)
# Confusion matrix (rows: actual class, columns: predicted class)
ypred_rf = rfmodel.predict(x_test)
print('confusion_matrix')
print(metrics.confusion_matrix(y_test, ypred_rf))
confusion_matrix
[[85291     4]
 [   34   114]]
# Classification report
print('classification_report')
print(metrics.classification_report(y_test, ypred_rf))
classification_report
             precision    recall  f1-score   support
          0       1.00      1.00      1.00     85295
          1       0.97      0.77      0.86       148
avg / total       1.00      1.00      1.00     85443
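# Accuracy and area under the ROC curve (AUC).
# roc_auc_score receives hard 0/1 predictions here rather than probabilities.
# random_state is not set, so results vary between runs: the figures below do not
# exactly match the confusion matrix above (which implies an accuracy of 0.999555).
print('Accuracy:%f' % (metrics.accuracy_score(y_test, ypred_rf)))
print('Area under the curve:%f' % (metrics.roc_auc_score(y_test, ypred_rf)))
Accuracy:0.999625
Area under the curve:0.902009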
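As a sanity check, the class 1 row of the classification report can be recomputed by hand from the confusion matrix. The sketch below is illustrative only; the helper name `fraud_class_scores` is hypothetical and does not come from the original article.

```python
def fraud_class_scores(tn, fp, fn, tp):
    """Return precision, recall and F1 for the positive (fraud) class."""
    precision = tp / (tp + fp)   # share of flagged transactions that are really fraud
    recall = tp / (tp + fn)      # share of real frauds that were flagged
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Counts read from the random forest confusion matrix: [[TN, FP], [FN, TP]]
precision, recall, f1 = fraud_class_scores(tn=85291, fp=4, fn=34, tp=114)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.97 recall=0.77 f1=0.86
```

The recall of 0.77 is the figure to watch: 34 of the 148 fraudulent transactions in the test set still slip through.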

6.3 Support Vector Machine (SVM)

# SVM classification
from sklearn.svm import SVC

svcmodel = SVC(kernel='sigmoid')
svcmodel.fit(x_train, y_train)
# Inspect the model
print('svcmodel')
svcmodel
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
 decision_function_shape='ovr', degree=3, gamma='auto', kernel='sigmoid',
 max_iter=-1, probability=False, random_state=None, shrinking=True,
 tol=0.001, verbose=False)
# Confusion matrix (rows: actual class, columns: predicted class)
ypred_svc = svcmodel.predict(x_test)
print('confusion_matrix')
print(metrics.confusion_matrix(y_test, ypred_svc))
confusion_matrix
[[85197    98]
 [  142     6]]
# Classification report
print('classification_report')
print(metrics.classification_report(y_test, ypred_svc))
classification_report
             precision    recall  f1-score   support
          0       1.00      1.00      1.00     85295
          1       0.06      0.04      0.05       148
avg / total       1.00      1.00      1.00     85443
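# Accuracy and area under the ROC curve (AUC).
# roc_auc_score receives hard 0/1 predictions here rather than probabilities.
print('Accuracy:%f' % (metrics.accuracy_score(y_test, ypred_svc)))
print('Area under the curve:%f' % (metrics.roc_auc_score(y_test, ypred_svc)))
Accuracy:0.997191
Area under the curve:0.519696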
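The gap between 99.7% accuracy and an AUC of about 0.52 is easier to see when both are recomputed from the confusion matrix. When `roc_auc_score` is given hard 0/1 predictions, as in the call above, the ROC curve has a single operating point and the area reduces to the average of the recall on each class. A minimal sketch, using hypothetical helper names that are not part of the original code:

```python
def accuracy(tn, fp, fn, tp):
    """Fraction of all samples classified correctly."""
    return (tn + tp) / (tn + fp + fn + tp)

def auc_from_hard_labels(tn, fp, fn, tp):
    """AUC of a ROC curve with one operating point: mean of the per-class recalls."""
    tpr = tp / (tp + fn)   # recall on the fraud class
    tnr = tn / (tn + fp)   # recall on the legitimate class
    return (tpr + tnr) / 2

# Counts read from the SVM confusion matrix: [[TN, FP], [FN, TP]]
svm_counts = dict(tn=85197, fp=98, fn=142, tp=6)
svm_accuracy = accuracy(**svm_counts)
svm_auc = auc_from_hard_labels(**svm_counts)
print('Accuracy:%f' % svm_accuracy)            # Accuracy:0.997191
print('Area under the curve:%f' % svm_auc)     # Area under the curve:0.519696
```

Accuracy is dominated by the 85,295 legitimate transactions, while the AUC exposes that only 6 of the 148 frauds were caught.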

7. Summary

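Comparing the three models, the random forest has the lowest false-positive rate, that is, it wrongly flags the fewest legitimate transactions as fraud;
Do not look at accuracy alone. High accuracy does not necessarily mean a model is good, especially with severely imbalanced data like in this project. For example, suppose we have a dataset of 1,000 patients, of whom 990 are healthy and 10 have cancer, and we want a model that finds those 10 cancer patients. If a model correctly predicts all 990 healthy people but finds none of the 10 patients, its accuracy is still 99%, yet the model is useless because it fails our goal of finding the patients;
When modeling a highly imbalanced dataset like this one, techniques such as undersampling or oversampling should be used to balance the data, so that the predictions are meaningful. The next article will make this improvement;
Models and algorithms are not inherently better or worse than one another; they simply perform differently under different conditions, and they should be judged with that in mind.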
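The patient example and the idea of undersampling can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the labels are synthetic (990 zeros and 10 ones, as in the example above), the function name `undersample` is hypothetical, and random undersampling is just one simple way to balance the classes, not necessarily the method used in the follow-up article.

```python
import random

# Synthetic labels from the example: 990 healthy (0) and 10 cancer (1) patients
y_true = [0] * 990 + [1] * 10

# A useless "model" that predicts healthy for everyone
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(y_true)
print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
# prints: accuracy=0.99 recall=0.00

def undersample(labels, seed=0):
    """Keep every minority (1) row plus an equally sized random sample of majority (0) rows.

    Returns the sorted indices of the rows to keep.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    kept_majority = rng.sample(majority, len(minority))
    return sorted(minority + kept_majority)

balanced_idx = undersample(y_true)
balanced_labels = [y_true[i] for i in balanced_idx]
print(len(balanced_labels), sum(balanced_labels))
# prints: 20 10
```

In practice the resampling would be applied to the training split only, so that the test set keeps the real class proportions.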
