6.2 Random Forest Model
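- from sklearn import metrics  # needed for the metrics.* calls below (skip if already imported)
- from sklearn.ensemble import RandomForestClassifier
- rfmodel = RandomForestClassifier()
- rfmodel.fit(x_train, y_train)
- # Inspect the model
- print('rfmodel')
- rfmodel
- rfmodel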
- RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
- max_depth=None, max_features='auto', max_leaf_nodes=None,
- min_impurity_decrease=0.0, min_impurity_split=None,
- min_samples_leaf=1, min_samples_split=2,
- min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
- oob_score=False, random_state=None, verbose=0,
- warm_start=False)
- # Inspect the confusion matrix
- ypred_rf = rfmodel.predict(x_test)
- print('confusion_matrix')
- print(metrics.confusion_matrix(y_test, ypred_rf))
- confusion_matrix
- [[85291 4]
- [ 34 114]]
- # Inspect the classification report
- print('classification_report')
- print(metrics.classification_report(y_test, ypred_rf))
- classification_report
-               precision    recall  f1-score   support
-            0       1.00      1.00      1.00     85295
-            1       0.97      0.77      0.86       148
-  avg / total       1.00      1.00      1.00     85443
- # Check prediction accuracy and area under the ROC curve (AUC)
- print('Accuracy:%f' % (metrics.accuracy_score(y_test, ypred_rf)))
- print('Area under the curve:%f' % (metrics.roc_auc_score(y_test, ypred_rf)))
- Accuracy:0.999625
- Area under the curve:0.902009
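As a cross-check, both summary figures can be recomputed from a 2x2 confusion matrix alone. The sketch below is plain Python; the helper names `accuracy_from_cm` and `balanced_accuracy_from_cm` are hypothetical and not part of scikit-learn. It relies on the fact that, for binary hard 0/1 predictions, the area under the ROC curve reduces to the mean of the recall on the two classes. Applied to the random forest matrix printed above, it gives roughly 0.99956 and 0.8851, slightly different from the printed 0.999625 and 0.902009. One plausible explanation (an assumption, not something the article states) is that the model was refit between cells: with `random_state=None`, every fit of a random forest can differ.

```python
def accuracy_from_cm(cm):
    """Accuracy from a 2x2 confusion matrix laid out as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    return (tn + tp) / (tn + fp + fn + tp)


def balanced_accuracy_from_cm(cm):
    """Mean of per-class recall; equals ROC AUC when predictions are hard 0/1 labels."""
    (tn, fp), (fn, tp) = cm
    recall_negative = tn / (tn + fp)  # share of class 0 predicted as 0
    recall_positive = tp / (tp + fn)  # share of class 1 predicted as 1
    return (recall_negative + recall_positive) / 2


# Random forest confusion matrix copied from the output above
rf_cm = [[85291, 4], [34, 114]]

rf_accuracy = accuracy_from_cm(rf_cm)
rf_balanced = balanced_accuracy_from_cm(rf_cm)
print('Accuracy from matrix: %f' % rf_accuracy)
print('Balanced accuracy from matrix: %f' % rf_balanced)
```

Passing a fixed `random_state` when constructing the classifier would make the matrix, the report, and the summary figures agree from run to run.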
6.3 Support Vector Machine (SVM)
- # SVM classification
- from sklearn.svm import SVC
- svcmodel = SVC(kernel='sigmoid')
- svcmodel.fit(x_train, y_train)
- # Inspect the model
- print('svcmodel')
- svcmodel
- SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
- decision_function_shape='ovr', degree=3, gamma='auto', kernel='sigmoid',
- max_iter=-1, probability=False, random_state=None, shrinking=True,
- tol=0.001, verbose=False)
- # Inspect the confusion matrix
- ypred_svc = svcmodel.predict(x_test)
- print('confusion_matrix')
- print(metrics.confusion_matrix(y_test, ypred_svc))
- confusion_matrix
- [[85197 98]
- [ 142 6]]
- # Inspect the classification report
- print('classification_report')
- print(metrics.classification_report(y_test, ypred_svc))
- classification_report
-               precision    recall  f1-score   support
-            0       1.00      1.00      1.00     85295
-            1       0.06      0.04      0.05       148
-  avg / total       1.00      1.00      1.00     85443
- # Check prediction accuracy and area under the ROC curve (AUC)
- print('Accuracy:%f' % (metrics.accuracy_score(y_test, ypred_svc)))
- print('Area under the curve:%f' % (metrics.roc_auc_score(y_test, ypred_svc)))
- Accuracy:0.997191
- Area under the curve:0.519696
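The AUC above is computed from hard 0/1 predictions, so it describes a single operating point rather than the model's full ranking ability. AUC is more commonly computed from continuous scores; for an `SVC`, `decision_function` returns such scores. The sketch below uses pure Python and made-up toy scores to show how the two numbers can differ. `pairwise_auc` is a hypothetical helper that implements the pairwise (Mann-Whitney) definition of AUC; it is not a scikit-learn function.

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError('need at least one positive and one negative sample')
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and decision scores, invented purely for illustration
y_true = [0, 0, 0, 0, 1, 1]
scores = [-2.0, -1.5, -0.5, 0.3, -0.2, 1.2]

# Thresholding at 0 turns the scores into hard 0/1 predictions
hard = [1 if s > 0 else 0 for s in scores]

auc_from_scores = pairwise_auc(y_true, scores)
auc_from_hard = pairwise_auc(y_true, hard)
print('AUC from scores: %f' % auc_from_scores)
print('AUC from hard labels: %f' % auc_from_hard)
```

On this toy data the score-based AUC is higher than the label-based one, because thresholding throws away the ordering among samples on the same side of the cut.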
7. Summary
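- Comparing the three models, the random forest has the lowest false-positive rate, meaning it wrongly flags the fewest normal samples;
- Do not look at accuracy alone. High accuracy does not by itself mean a model is good, especially when the data are as severely imbalanced as in this project. For example, suppose we have a dataset of 1000 patients, of whom 990 are healthy and 10 have cancer, and the goal is to find those 10 cancer patients. A model that correctly predicts all 990 healthy people but finds none of the 10 patients still has an accuracy of 99%, yet it is useless because it fails at the one thing we built it for;
- When modeling a highly imbalanced dataset like this one, techniques such as undersampling or oversampling should be applied to balance the data; only then are the predictions meaningful. The next article will improve on this point;
- No model or algorithm is inherently superior or inferior to another. They simply perform differently under different conditions, and this should be kept in perspective.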
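The 1000-patient example and the undersampling remedy from the list above can be sketched in a few lines of plain Python. Everything here is illustrative: the labels are synthetic, and `random_undersample` is a hypothetical helper showing one simple way to balance classes (randomly discarding majority-class samples), not necessarily the method the follow-up article uses.

```python
import random

# 1000 patients: 990 healthy (label 0) and 10 with cancer (label 1)
y_true = [0] * 990 + [1] * 10
# A useless model that predicts "healthy" for everyone
y_pred = [0] * 1000

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = true_positives / sum(y_true)  # share of cancer patients actually found
print('Accuracy: %.2f, recall on cancer patients: %.2f' % (accuracy, recall))


def random_undersample(labels, seed=0):
    """Return sorted indices with the majority class randomly cut to the minority size."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = sorted((pos_idx, neg_idx), key=len)
    kept = rng.sample(majority, len(minority))
    return sorted(minority + kept)


balanced_idx = random_undersample(y_true)
balanced_labels = [y_true[i] for i in balanced_idx]
print('Balanced class counts:', balanced_labels.count(0), balanced_labels.count(1))
```

Resampling of this kind is normally applied to the training split only, so that the test set keeps the real class proportions.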
(Editor: 湖南网)