Classification is the process of finding a model or function that describes and distinguishes data classes or concepts, so that the model can be used to predict the class of objects whose class label is unknown. Many researchers have applied various algorithms to help health care professionals improve the accuracy of breast cancer diagnosis.5 Most of these papers use common algorithms such as decision tree, naive Bayes, KNN, neural network and so on.

A decision tree is a structure that includes a root node, branches, and leaf nodes. Each internal node denotes a test on an attribute, each branch denotes an outcome of that test, and each leaf node holds a class label. Different algorithms are used to build the tree, including ID3, C4.5 and C5.0. The C5.0 algorithm builds a decision tree by recursively separating observations into branches, with the aim of improving prediction accuracy.9
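
The node, branch and leaf roles described above can be sketched in a few lines of Python. This is only an illustration of the structure, not an implementation of ID3, C4.5 or C5.0. The attribute names, outcomes and class labels in the toy tree are hypothetical and do not come from any dataset used in the cited work.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # class label held by the leaf


@dataclass
class Node:
    attribute: str  # attribute tested at this node
    # each branch maps one outcome of the test to a subtree
    branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def classify(tree, record):
    """Follow the branch matching each test outcome until a leaf is reached."""
    while isinstance(tree, Node):
        outcome = record[tree.attribute]
        tree = tree.branches[outcome]
    return tree.label


# Hypothetical hand-built tree (not learned from data).
toy_tree = Node("clump_thickness", {
    "low": Leaf("benign"),
    "high": Node("cell_size_uniform", {
        "yes": Leaf("benign"),
        "no": Leaf("malignant"),
    }),
})

prediction = classify(
    toy_tree, {"clump_thickness": "high", "cell_size_uniform": "no"}
)
print(prediction)
```

A learning algorithm such as ID3 or C5.0 would choose the attribute tested at each node from the training data. Here the tree is written by hand to keep the example short.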

The classifier is first tested on unseen data, and the resulting decision tree is used for this purpose. The C4.5 algorithm follows the rules of the ID3 algorithm. Similarly, the C5 algorithm follows the rules of the C4.5 algorithm. The C5 algorithm has many features, such as:

· A large decision tree can be viewed as a set of rules, which is easy to understand.
· The C5 algorithm accounts for noise and missing data.
· The C5 algorithm addresses the problems of overfitting and error pruning.
· In classification, the C5 classifier can predict which attributes are relevant and which are not.10
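
The first point, that a tree can be read as a set of rules, can be illustrated with a short Python sketch in which each root-to-leaf path becomes one IF ... THEN rule. This is a generic path enumeration under simplifying assumptions and is not the rule generation procedure of C5 itself. The tuple-based tree encoding, the attribute names and the labels are all hypothetical.

```python
def tree_to_rules(tree, conditions=()):
    """Return one (conditions, label) rule per root-to-leaf path."""
    if isinstance(tree, str):
        # a leaf is represented by its class label
        return [(list(conditions), tree)]
    # an internal node is represented as (attribute, {outcome: subtree})
    attribute, branches = tree
    rules = []
    for outcome, subtree in branches.items():
        rules.extend(
            tree_to_rules(subtree, conditions + ((attribute, outcome),))
        )
    return rules


# Hypothetical hand-built tree (not learned from data).
toy_tree = ("clump_thickness", {
    "low": "benign",
    "high": ("cell_size_uniform", {"yes": "benign", "no": "malignant"}),
})

rules = tree_to_rules(toy_tree)
for conds, label in rules:
    tests = " AND ".join(f"{attr} = {value}" for attr, value in conds)
    print(f"IF {tests} THEN class = {label}")
```

The toy tree has three leaves, so the sketch produces three rules, one per path.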

In KNN (K Nearest Neighbor), an object is classified by a majority vote of its neighbors: the object is assigned to the class most common among its K nearest neighbors.
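
A minimal sketch of this majority vote is shown below. It assumes Euclidean distance, which the text does not specify, and the function name `knn_predict`, the feature values and the labels are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (feature_vector, label) pairs
    query: feature vector with the same length as the training vectors
    """
    # sort training points by Euclidean distance to the query (an assumption)
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    # most_common(1) returns the label with the highest vote count
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set.
train = [
    ((1.0, 1.0), "benign"),
    ((1.5, 2.0), "benign"),
    ((2.0, 1.0), "benign"),
    ((6.0, 6.5), "malignant"),
    ((7.0, 6.0), "malignant"),
]

prediction = knn_predict(train, (6.5, 6.0), k=3)
print(prediction)
```

With K = 3, two of the three nearest neighbors of the query are labelled malignant, so the vote returns that class. An odd K is a common way to reduce ties when there are two classes.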

Naïve Bayes is a quick method for creating statistical predictive models. NB is based on Bayes' theorem. This classification technique analyses the relationship between each attribute and the class for each instance to derive a conditional probability for the relationship between the attribute values and the class. The probability of each class is computed by counting how many times it occurs in the dataset, as a fraction of all instances. This is called the “prior probability” P(C=c).
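
The counting described above can be sketched as follows. The example estimates the prior P(C=c) and the per-attribute conditional probabilities from raw frequencies only. It applies no smoothing and does not perform the final prediction step. The function name `fit_counts` and the toy records are hypothetical.

```python
from collections import Counter, defaultdict


def fit_counts(records, labels):
    """Estimate priors and conditional probabilities by simple counting."""
    class_counts = Counter(labels)
    n = len(labels)
    # prior P(C=c): share of instances that belong to class c
    priors = {c: count / n for c, count in class_counts.items()}

    # cond_counts[c][(attribute, value)]: how often the value occurs in class c
    cond_counts = defaultdict(Counter)
    for record, c in zip(records, labels):
        for attribute, value in record.items():
            cond_counts[c][(attribute, value)] += 1

    # conditional P(attribute=value | C=c), with no smoothing applied
    conditionals = {
        c: {key: count / class_counts[c] for key, count in counts.items()}
        for c, counts in cond_counts.items()
    }
    return priors, conditionals


# Hypothetical toy data with two categorical attributes.
records = [
    {"clump": "low", "uniform": "yes"},
    {"clump": "low", "uniform": "yes"},
    {"clump": "high", "uniform": "no"},
    {"clump": "high", "uniform": "yes"},
]
labels = ["benign", "benign", "malignant", "malignant"]

priors, conditionals = fit_counts(records, labels)
print(priors)
print(conditionals["malignant"][("uniform", "no")])
```

Each class appears in two of the four records, so both priors are 0.5. Practical implementations usually add a smoothing term so that attribute values never seen with a class do not get a probability of zero.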