is the process of finding a model or function that describes and distinguishes
data classes or concepts, so that the model can be used to predict the class of
objects whose class label is unknown. Many researchers have applied various
algorithms to help health care professionals diagnose breast cancer more
accurately.5 Most published studies use common algorithms such as decision
trees, naive Bayes, K-nearest neighbors (KNN), and neural networks.
Decision tree: a decision tree is a structure made up of a root node, branches,
and leaf nodes. Each internal node denotes a test on an attribute, each branch
denotes an outcome of the test, and each leaf node holds a class label. Several
algorithms are used to build decision trees, including ID3, C4.5 and C5.0. The
C5.0 algorithm builds a decision tree by recursively splitting observations into
branches, with the aim of improving prediction accuracy.9
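Recursive splitting needs a rule for choosing which attribute to test at each node. The sketch below uses ID3-style information gain (the reduction in entropy after a split) as that rule; C4.5 refines it with the gain ratio, and C5.0's further refinements are not shown. The function names, the attributes `shape` and `size`, and the four toy rows are hypothetical and do not come from any study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on `attribute`.

    `rows` is a list of dicts mapping attribute names to categorical values,
    and `labels[i]` is the class label of `rows[i]`.
    """
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted entropy of the subsets produced by the split
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """ID3-style choice: the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data, not taken from any breast cancer dataset
rows = [
    {"shape": "irregular", "size": "large"},
    {"shape": "irregular", "size": "small"},
    {"shape": "round", "size": "large"},
    {"shape": "round", "size": "small"},
]
labels = ["malignant", "malignant", "benign", "benign"]

print(best_attribute(rows, labels, ["shape", "size"]))  # → shape
```

Here splitting on `shape` yields two pure subsets (gain of 1 bit), while `size` leaves both subsets mixed (gain of 0). A full builder would split on the winning attribute and recurse on each subset until its labels are pure.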
The classifier is first tested, and the resulting decision tree is then used to
classify unseen data. The C4.5 algorithm extends the ID3 algorithm, and the C5
algorithm in turn extends C4.5. The C5 algorithm has several useful features:
- A large decision tree can be viewed as a set of rules, which is easy to understand.
- The C5 algorithm handles noise and missing data.
- The C5 algorithm addresses the problem of overfitting through error-based pruning.
- The C5 classifier can determine which attributes are relevant to the
  classification and which are not.10
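The tree structure (test nodes, outcome branches, labelled leaves) and the "set of rules" view can be illustrated with a small hand-built tree. This is a minimal sketch, assuming categorical attributes; the `Node`/`Leaf` classes, the attributes `shape` and `size`, and the tree itself are hypothetical and are not output produced by C5.0.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # class label held by this leaf


@dataclass
class Node:
    attribute: str  # attribute tested at this node
    # One branch per test outcome: attribute value -> subtree
    branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def classify(tree, example):
    """Follow the branch matching each test outcome until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[example[tree.attribute]]
    return tree.label


def to_rules(tree, conditions=()):
    """Turn every root-to-leaf path into an (IF conditions, THEN label) rule."""
    if isinstance(tree, Leaf):
        return [(list(conditions), tree.label)]
    rules = []
    for outcome, subtree in tree.branches.items():
        rules.extend(to_rules(subtree, conditions + ((tree.attribute, outcome),)))
    return rules


# Hypothetical hand-built tree; attributes and values are illustrative only
tree = Node("shape", {
    "irregular": Node("size", {"large": Leaf("malignant"), "small": Leaf("benign")}),
    "round": Leaf("benign"),
})

print(classify(tree, {"shape": "irregular", "size": "large"}))  # → malignant
for conds, label in to_rules(tree):
    print("IF " + " AND ".join(f"{a} = {v}" for a, v in conds) + f" THEN class = {label}")
```

Each printed line corresponds to one root-to-leaf path, which is why even a large tree can be read as a flat list of IF-THEN rules.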
KNN (K-nearest neighbors): an object is classified by a majority vote of its
neighbors, being assigned to the class most common among its K nearest neighbors.
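The majority vote can be sketched in a few lines. This sketch assumes numeric feature vectors and Euclidean distance (the text does not specify a distance measure); the function `knn_classify` and the six toy points are hypothetical. An odd `k` is used so that a two-class vote cannot tie.

```python
import math
from collections import Counter


def knn_classify(training, query, k=3):
    """Assign `query` the most common class among its k nearest training points.

    `training` is a list of (feature_vector, label) pairs; distance is Euclidean.
    Ties in the vote go to the label that appears first among the sorted neighbours.
    """
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature points, for illustration only
training = [
    ((1.0, 1.0), "benign"),
    ((1.2, 0.8), "benign"),
    ((0.9, 1.1), "benign"),
    ((5.0, 5.0), "malignant"),
    ((5.2, 4.9), "malignant"),
    ((4.8, 5.1), "malignant"),
]

print(knn_classify(training, (1.1, 1.0)))  # → benign
print(knn_classify(training, (4.9, 5.0)))  # → malignant
```

KNN builds no explicit model: all work happens at query time, when distances to every stored training point are computed.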
Naive Bayes: naive Bayes (NB) is a fast method for building statistical
predictive models and is based on Bayes' theorem. For each instance, the
technique analyzes the relationship between each attribute and the class to
derive a conditional probability linking the attribute values to the class. The
probability of each class is computed by counting how many times it occurs in
the dataset; this is called the prior probability, P(C = c).
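The counting described above can be sketched as follows, assuming categorical attributes. The prior P(C = c) is the class frequency, as in the text; the conditionals P(A = v | C = c) are estimated here by relative frequency within each class, and a prediction multiplies them together, which relies on the "naive" assumption that attributes are conditionally independent given the class. No smoothing is applied (practical implementations commonly add Laplace smoothing). The function names, attributes, and toy rows are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Estimate priors P(C=c) and conditionals P(A=v | C=c) by counting."""
    n = len(labels)
    class_counts = Counter(labels)
    # Prior: how often each class occurs in the dataset
    priors = {c: count / n for c, count in class_counts.items()}
    # value_counts[c][(attribute, value)] = occurrences of that value within class c
    value_counts = defaultdict(Counter)
    for row, c in zip(rows, labels):
        for attribute, value in row.items():
            value_counts[c][(attribute, value)] += 1
    conditionals = {
        c: {key: count / class_counts[c] for key, count in counts.items()}
        for c, counts in value_counts.items()
    }
    return priors, conditionals


def predict(priors, conditionals, row):
    """Return the class maximising P(C=c) * prod_i P(A_i = v_i | C=c)."""
    def score(c):
        p = priors[c]
        for attribute, value in row.items():
            # Unseen (attribute, value) pairs get probability 0 (no smoothing)
            p *= conditionals[c].get((attribute, value), 0.0)
        return p
    return max(priors, key=score)


# Hypothetical toy data, not taken from any breast cancer dataset
rows = [
    {"shape": "irregular", "size": "large"},
    {"shape": "irregular", "size": "small"},
    {"shape": "round", "size": "small"},
    {"shape": "round", "size": "small"},
    {"shape": "round", "size": "large"},
]
labels = ["malignant", "malignant", "benign", "benign", "benign"]

priors, conditionals = train_naive_bayes(rows, labels)
print(priors)  # → {'malignant': 0.4, 'benign': 0.6}
print(predict(priors, conditionals, {"shape": "round", "size": "large"}))  # → benign
```

With two of five rows labelled malignant, the prior is 0.4 for that class and 0.6 for benign, exactly the counting estimate P(C = c) described above.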