A cluster is a group of objects that belong to the same class. In other words, similar objects are grouped in one cluster and dissimilar objects are grouped in another cluster.
- A cluster of data objects can be treated as one group.
- Clustering is a process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters.
- When doing cluster analysis, we first partition the set of data into groups based on data similarity and then assign labels to the groups.
- The main advantage of clustering over classification is that it is adaptable to changes and helps single out useful features that distinguish different groups.
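The partition-then-label process described above can be illustrated with a minimal k-means sketch. k-means is only one of many clustering methods and is not named in the text; the function name `kmeans`, the sample points, and the choice of the first `k` points as starting centroids are assumptions made purely for illustration.

```python
import math

def kmeans(points, k, iterations=20):
    """Partition points into k groups by Euclidean similarity (plain k-means)."""
    # Assumption: start from the first k points; real code would pick a better init.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The returned labels are arbitrary integers; giving each group a meaningful name is the separate, later step of assigning labels to the groups.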
Requirements for Clustering in Data Mining
Scalability − We need highly scalable clustering algorithms to deal with large databases.
Ability to deal with different kinds of attributes − Algorithms should be applicable to any kind of data, such as interval-based (numerical), categorical, and binary data.
Discovery of clusters with arbitrary shape − The clustering algorithm should be capable of detecting clusters of arbitrary shape. It should not be bound to distance measures that tend to find only small spherical clusters.
High dimensionality − The clustering algorithm should be able to handle not only low-dimensional data but also high-dimensional spaces.
Ability to deal with noisy data − Databases contain noisy, missing, or erroneous data. Some algorithms are sensitive to such data and may produce poor-quality clusters.
Interpretability − The clustering results should be interpretable, comprehensible, and usable.
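Two of these requirements, arbitrary cluster shape and tolerance of noise, can be illustrated with a density-based approach. The sketch below is a simplified DBSCAN-style procedure, chosen here only as an example and not taken from the text; the function name `dbscan`, the parameters `eps` and `min_pts`, and the sample data are all assumptions.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or NOISE if it lies in no dense region."""
    def neighbors(i):
        # All points (including i itself) within distance eps of point i.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep expanding

    return labels

# Hypothetical data: two elongated (non-spherical) chains plus one far outlier.
chain_a = [(x * 0.5, 0.0) for x in range(10)]
chain_b = [(x * 0.5, 5.0) for x in range(10)]
points = chain_a + chain_b + [(20.0, 20.0)]
labels = dbscan(points, eps=0.6, min_pts=3)
print(labels)
```

Because clusters grow by following dense neighbourhoods rather than by distance to a single centre, each chain is recovered as one cluster regardless of its shape, and the isolated point is marked as noise instead of being forced into a group.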
Applications of Clustering
- Economic science (especially market research)
- Document classification
- Clustering weblog data to discover groups of similar access patterns
- Pattern recognition
- Spatial data analysis: creating thematic maps in GIS by clustering feature spaces
- Image processing