python - What are noisy samples in Scikit's DBSCAN clustering algorithm?
If I apply scikit's DBSCAN (http://scikit-learn.org/stable/modules/generated/sklearn.cluster.dbscan.html) to a similarity matrix, I get a series of labels back. Some of these labels are -1. The documentation calls them noisy samples.
What are these? Do they all belong to a single cluster, or does each belong to its own cluster since they're noisy?
Thank you.
These are not part of any cluster. They are simply points that do not belong to any of the clusters and can be "ignored" to some extent.
Remember, DBSCAN stands for "Density-Based Spatial Clustering of Applications with Noise." DBSCAN checks that a point has enough neighbors within a specified range before it classifies points into clusters.
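To make the neighbor check concrete, here is a rough sketch of the counting step done by hand with NumPy. The data and the `eps` and `min_samples` values are made up for illustration, and this only mimics the density test, not the full algorithm. Note that scikit-learn counts the point itself toward `min_samples`, so the sketch does the same.

```python
import numpy as np

# Made-up toy data: three points close together and one far away.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [3.0, 3.0]])

# Illustrative values for the two DBSCAN parameters.
eps = 0.5          # radius of the neighborhood
min_samples = 3    # neighbors required (the point itself counts)

# Pairwise Euclidean distances between all points.
dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# Number of points within eps of each point, including the point itself.
neighbor_counts = (dists <= eps).sum(axis=1)

# A point is dense enough to seed a cluster if it has enough neighbors.
is_core = neighbor_counts >= min_samples

print(neighbor_counts)
print(is_core)
```

The first three points each see three points within the radius (themselves plus two others), so they pass the check. The isolated fourth point sees only itself and fails it.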
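But what happens to the points that do not meet the criteria for falling into any of the main clusters? What if a point does not have enough neighbors within the specified radius, and is not within that radius of a point that does, so it cannot be considered part of a cluster? These points are given the cluster label -1 and are considered noise.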
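A small sketch of what this looks like in practice, using made-up 2-D points and arbitrarily chosen `eps` and `min_samples` values. The two far-away points have no neighbors within the radius, so they should come back labeled -1 while the two tight groups get labels 0 and 1.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Made-up data: two tight groups of four points plus two isolated points.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],   # group A
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],   # group B
    [20.0, 20.0], [-15.0, 30.0],                      # isolated points
])

# eps and min_samples are illustrative choices, not recommendations.
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)

# -1 is not a cluster id; it only marks "no cluster", so exclude it
# when counting how many clusters were found.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))

print(labels)
print(n_clusters, n_noise)
```

Note that the two noise points share the label -1 even though they are nowhere near each other, which is why -1 should not be read as "one cluster of noisy points".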
So what?
Well, if you are analyzing data points and are only interested in the general clusters, you can reduce the size of the data and cut out the noise. Or, if you are using cluster analysis to classify data, in some cases it is possible to discard the noise as outliers.
In anomaly detection, the points that do not fit into any category are the significant ones, since they can represent a problem or a rare event.
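Both uses come down to a boolean mask on the labels. Here is a minimal sketch with made-up data and illustrative parameter values; the variable names (`X_clean`, `anomalies`, and so on) are just placeholders for this example.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Made-up data: two groups of four points and one point far from both.
X = np.array([
    [1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.2],
    [8.0, 8.0], [8.1, 7.9], [7.9, 8.2], [8.2, 8.1],
    [4.5, 15.0],
])

# Illustrative parameters; real data usually needs tuning.
labels = DBSCAN(eps=0.6, min_samples=3).fit(X).labels_

is_noise = labels == -1

# Use 1: drop the noise and keep only the clustered points.
X_clean = X[~is_noise]
labels_clean = labels[~is_noise]

# Use 2: keep only the noise and inspect it as candidate anomalies.
anomalies = X[is_noise]
anomaly_idx = np.flatnonzero(is_noise)

print(X_clean.shape)
print(anomaly_idx, anomalies)
```

Whether the noise is thrown away or examined depends entirely on the task; DBSCAN itself only tells you which points did not end up in any dense region.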