Tuesday, March 1, 2016


OpenCV comes with a data file, letter-recognition.data, in the opencv/samples/cpp/ folder. If you open it, you will see 20000 lines which may, at first sight, look like garbage. Actually, in each row, the first column is a letter, which is our label, and the 16 numbers following it are its features. These features were obtained from the UCI Machine Learning Repository; you can find their details on that page.
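As a quick illustration of that row format, here is a minimal sketch of parsing one line (the sample values below are made up for illustration; the real file uses the same comma-separated layout of one letter plus 16 integers):

```python
# A sample row in the letter-recognition.data format:
# one letter label, then 16 comma-separated integer features.
row = "T,2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8"

fields = row.split(',')
label = fields[0]                         # the letter, used as the class label
features = [int(v) for v in fields[1:]]   # the 16 numeric features

print(label, len(features))   # T 16
```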

It first reads letter-recognition.data in, then splits it vertically into two halves: the top half is used as the training patterns, and the bottom half as the test patterns.

The first column is the label; the 16 columns after it are the feature points.

These two lines do exactly that:
responses, trainData = np.hsplit(train,[1])
labels, testData = np.hsplit(test,[1])
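To see what np.vsplit and np.hsplit actually do here, a minimal sketch on a toy 4x4 array (values made up) follows. vsplit cuts rows into the two halves; hsplit at index 1 peels off column 0 (the label) from the remaining columns (the features):

```python
import numpy as np

# Toy "dataset": 4 rows, first column = label, next 3 columns = features.
data = np.array([[0, 10, 11, 12],
                 [1, 20, 21, 22],
                 [2, 30, 31, 32],
                 [3, 40, 41, 42]], dtype=np.float32)

# Split rows in half: top for training, bottom for testing.
train, test = np.vsplit(data, 2)

# Split columns at index 1: label column vs. feature columns.
responses, trainData = np.hsplit(train, [1])

print(responses.shape)   # (2, 1)  -> the labels, as a column vector
print(trainData.shape)   # (2, 3)  -> the features
```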

This line trains on the patterns: trainData holds the features, and responses holds the labels to recognize.
knn.train(trainData, cv2.ml.ROW_SAMPLE, responses)

This line tests the patterns, putting the recognized labels into result:
ret, result, neighbours, dist = knn.findNearest(testData, k=5)

This line does the comparison:
correct = np.count_nonzero(result == labels)
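A minimal sketch of how that comparison works, on made-up labels: `result == labels` is an elementwise comparison that yields a boolean array, and np.count_nonzero counts the True entries, i.e. the correct predictions.

```python
import numpy as np

# Toy predicted labels and ground-truth labels, both column vectors,
# matching the shapes that findNearest and hsplit produce.
result = np.array([[0], [1], [2], [2]], dtype=np.float32)
labels = np.array([[0], [1], [2], [3]], dtype=np.float32)

correct = np.count_nonzero(result == labels)   # 3 matches out of 4
accuracy = correct * 100.0 / len(labels)
print(accuracy)   # 75.0
```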

Source code as follows:

import sys
sys.path.append('/usr/local/lib/python2.7/site-packages')


import cv2
import numpy as np
import matplotlib.pyplot as plt

# Load the data, converters convert the letter to a number
data= np.loadtxt('letter-recognition.data', dtype= 'float32', delimiter = ',',
                    converters= {0: lambda ch: ord(ch)-ord('A')})

# split the data to two, 10000 each for train and test
train, test = np.vsplit(data,2)

# split trainData and testData to features and responses
responses, trainData = np.hsplit(train,[1])
labels, testData = np.hsplit(test,[1])


# Initiate the kNN, classify, measure accuracy.
#knn = cv2.ml.KNearest()
knn = cv2.ml.KNearest_create()
knn.train(trainData, cv2.ml.ROW_SAMPLE, responses)
ret, result, neighbours, dist = knn.findNearest(testData, k=5)


correct = np.count_nonzero(result == labels)
accuracy = correct*100.0/10000
print(accuracy)


Reference : http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_ml/py_knn/py_knn_opencv/py_knn_opencv.html#knn-opencv
