Dataset: 699 breast tumor records from the University of Wisconsin Hospitals. Each record contains the following fields:
- Sample code number: id number
- Clump Thickness: 1 - 10
- Uniformity of Cell Size: 1 - 10
- Uniformity of Cell Shape: 1 - 10
- Marginal Adhesion: 1 - 10
- Single Epithelial Cell Size: 1 - 10
- Bare Nuclei: 1 - 10
- Bland Chromatin: 1 - 10
- Normal Nucleoli: 1 - 10
- Mitoses: 1 - 10
- Class: 2 for benign, 4 for malignant
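The raw `breast-cancer-wisconsin.data` file distributed by the UCI Machine Learning Repository has no header row. The code in this post, however, refers to columns named `id` and `class`. Below is a minimal sketch for loading the raw file under that assumption. The nine feature names in `COLUMNS` are hypothetical snake_case versions of the fields listed above; only `id` and `class` need to match the later code. `load_wisconsin` and `summarize` are illustrative helper names.

```python
import pandas as pd

# Hypothetical column names for the header-less raw file; only 'id' and
# 'class' have to match the code used later in this post.
COLUMNS = [
    'id', 'clump_thickness', 'unif_cell_size', 'unif_cell_shape',
    'marg_adhesion', 'single_epith_cell_size', 'bare_nuclei',
    'bland_chrom', 'norm_nucleoli', 'mitoses', 'class',
]

# Class codes as documented in the field list above
LABELS = {2: 'benign', 4: 'malignant'}


def load_wisconsin(source):
    """Read the raw comma-separated file; '?' marks a missing value."""
    return pd.read_csv(source, header=None, names=COLUMNS, na_values='?')


def summarize(df):
    """Return (missing values per column, number of records per class name)."""
    missing = df.isna().sum()
    counts = df['class'].map(LABELS).value_counts()
    return missing, counts
```

For example, `missing, counts = summarize(load_wisconsin('breast-cancer-wisconsin.data'))` shows which columns contain `?` entries and how the records split between benign and malignant.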
 
 
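Program description: predicts whether a tumor is benign or malignant using the KNN algorithm and a support vector machine (SVM), implemented in Python.

For the theory behind the algorithms, see: KNN algorithm, support vector machine (SVM)

Ipynb demo file: Ipynb file

Python code: Python code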
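As a refresher on how KNN classifies a record, here is a minimal pure-Python sketch. It computes the Euclidean distance from the query to every training sample, keeps the `k` closest, and returns the majority label. `knn_predict` is a hypothetical helper for illustration, not scikit-learn's implementation. The default `k=5` mirrors the default `n_neighbors` of `KNeighborsClassifier`, which also measures Euclidean distance by default.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training sample
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Sort by distance so the first k entries are the nearest neighbours
    dists.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in dists[:k])
    # most_common(1) returns [(label, count)] for the top label
    return votes.most_common(1)[0][0]
```

Ties in the vote go to the label `Counter` encountered first, which is the label of the closer neighbour. An odd `k` avoids ties altogether in a two-class problem like this one.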
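```python
'''Load and preprocess the raw dataset'''
import pandas as pd

# Assumes the file has a header row naming the first column 'id' and the last
# column 'class'; '?' marks a missing value and is read as NaN
df = pd.read_csv('breast-cancer-wisconsin.data.txt', na_values='?')

# Fill missing values with an extreme sentinel instead of dropping the rows
df.fillna(-99999, inplace=True)

# The sample id carries no diagnostic information, so drop it
df.drop(columns=['id'], inplace=True)
df.head()  # shows the first five rows in a notebook
```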
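Why a sentinel such as -99999 is harmless for KNN: every real feature lies between 1 and 10, so two complete records can be at most sqrt(9) × 9 = 27 apart. A record whose missing value was replaced by -99999 is at least 100000 away from any complete record on that one feature alone. A filled-in record is therefore never among the `k` nearest neighbours of a complete record, as long as the training set holds at least `k` complete records. The sketch below checks this arithmetic. The helper names and the example record are hypothetical.

```python
import numpy as np

SENTINEL = -99999  # value written in place of '?' during preprocessing


def max_complete_distance(n_features=9, low=1, high=10):
    """Largest Euclidean distance between two records whose values all lie in [low, high]."""
    return float(np.sqrt(n_features) * (high - low))


def distance_to_sentinel_row(row, missing_col):
    """Distance between `row` and a copy of it whose `missing_col` holds the sentinel."""
    complete = np.asarray(row, dtype=float)
    filled = complete.copy()
    filled[missing_col] = SENTINEL
    return float(np.linalg.norm(complete - filled))
```

For the hypothetical record `[5, 1, 1, 1, 2, 1, 3, 1, 1]`, `distance_to_sentinel_row(record, 5)` gives 100000.0, against a ceiling of 27.0 from `max_complete_distance()` for two complete records.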
 
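```python
'''Split the data into a training set and a test set'''
import numpy as np
from sklearn.model_selection import train_test_split

# Features: every column except the label; label: 'class' (2 = benign, 4 = malignant)
X = np.array(df.drop(columns=['class']))
y = np.array(df['class'])

# Hold out 30% of the records for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
```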
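With the 699 records above and `test_size=0.3`, scikit-learn rounds the test share up: ceil(0.3 × 699) = 210 test rows and 489 training rows. The sketch below checks that arithmetic and shows a variant of the split. `stratify=y` and `random_state` are additions that are not in the original code. They keep the benign/malignant ratio the same in both parts and make the split reproducible. The helper names are hypothetical.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split


def expected_split_sizes(n_samples, test_size):
    """(n_train, n_test) for a float test_size: the test share is rounded up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


def stratified_split(X, y, test_size=0.3, seed=0):
    """Like the split above, but class-balanced and reproducible."""
    return train_test_split(X, y, test_size=test_size, stratify=y, random_state=seed)
```

To try it without the data file, build placeholder labels with the dataset's documented class distribution (`y = np.array([2] * 458 + [4] * 241)`, `X = np.zeros((len(y), 9))`). Then `stratified_split(X, y)` yields 489 training and 210 test rows, with about 34.5% malignant records in each part.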
 
```python
'''Choose, train, and test the algorithms'''
import pickle

from sklearn import neighbors, svm

cls_dict = {
    'SVM-SVC': svm.SVC(),
    'KNN': neighbors.KNeighborsClassifier(),
}

for name, cls in cls_dict.items():
    try:
        # Reuse a model pickled by an earlier run, if there is one. Note: that
        # model was trained on a different random split, so it may already have
        # seen some of the current test records, which inflates the accuracy
        with open('%s.pickle' % name, 'rb') as f:
            cls = pickle.load(f)
    except Exception as e:
        # No usable pickle yet: train on this split and cache the model
        cls.fit(X_train, y_train)
        print(e)
        with open('%s.pickle' % name, 'wb') as f:
            pickle.dump(cls, f)

    print('%s Algorithm Accuracy: %s' % (name, cls.score(X_test, y_test)))

    # Two example records with the 9 feature values in column order
    samples = np.array([[4, 2, 1, 1, 1, 2, 3, 2, 1],
                        [4, 2, 1, 2, 2, 2, 3, 2, 1]])
    samples = samples.reshape(len(samples), -1)  # shape (n_samples, n_features)
    prediction = cls.predict(samples)
    print('%s Algorithm prediction: %s\n' % (name, prediction))
```
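`cls.score` reports plain accuracy. For a tumor classifier it can also be worth checking how many malignant cases were missed. Below is a minimal sketch using `sklearn.metrics.confusion_matrix`, treating label 4 (malignant) as the positive class. `malignant_recall` is a hypothetical helper name.

```python
from sklearn.metrics import confusion_matrix


def malignant_recall(y_true, y_pred):
    """Share of truly malignant (label 4) tumors that were predicted malignant."""
    # labels=[2, 4] fixes the order: rows are true benign/malignant,
    # columns are predicted benign/malignant
    cm = confusion_matrix(y_true, y_pred, labels=[2, 4])
    tn, fp, fn, tp = cm.ravel()
    return tp / (tp + fn)
```

Inside the loop above, `malignant_recall(y_test, cls.predict(X_test))` would report this figure next to the accuracy for each classifier.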