Semi-supervised constrained clustering: an in-depth overview, ranked taxonomy and future research directions

Gonzalez-Almagro, German; Peralta, Daniel; De Poorter, Eli; Cano, Jose-Ramon; Garcia, Salvador

doi:10.1007/s10462-024-11103-8


dc.contributor.author	Gonzalez-Almagro, German
dc.contributor.author	Peralta, Daniel
dc.contributor.author	De Poorter, Eli
dc.contributor.author	Cano, Jose-Ramon
dc.contributor.author	Garcia, Salvador
dc.contributor.imecauthor	Peralta, Daniel
dc.contributor.imecauthor	De Poorter, Eli
dc.contributor.orcidimec	Peralta, Daniel::0000-0002-7544-8411
dc.contributor.orcidimec	De Poorter, Eli::0000-0002-0214-5751
dc.date.accessioned	2025-03-14T10:18:27Z
dc.date.available	2025-03-13T18:38:20Z
dc.date.available	2025-03-14T10:18:27Z
dc.date.issued	2025
dc.description.abstract	Clustering is a well-known unsupervised machine learning approach capable of automatically grouping discrete sets of instances with similar characteristics. Constrained clustering is a semi-supervised extension to this process that can be used when expert knowledge is available to indicate constraints that can be exploited. Well-known examples of such constraints are must-link (indicating that two instances belong to the same group) and cannot-link (indicating that two instances do not belong to the same group). The research area of constrained clustering has grown significantly over the years, with a large variety of new algorithms and more advanced types of constraints being proposed. However, no unifying overview is available to easily understand the wide variety of available methods, constraints and benchmarks. To remedy this, this study presents in detail the background of constrained clustering and provides a novel ranked taxonomy of the types of constraints that can be used in constrained clustering. In addition, it focuses on instance-level pairwise constraints and gives an overview of their applications and historical context. It then presents a statistical analysis covering 315 constrained clustering methods, categorizes them according to their features, and provides a ranking score indicating which methods have the most potential based on their popularity and validation quality. Finally, based on this analysis, potential pitfalls and future research directions are provided.
dc.description.wosFundingText	This study has been funded by the research projects PID2020-119478GB-I00, A-TIC-434-UGR20 and PREDOC_01648.
dc.identifier.doi	10.1007/s10462-024-11103-8
dc.identifier.issn	0269-2821
dc.identifier.uri	https://imec-publications.be/handle/20.500.12860/45387
dc.publisher	SPRINGER
dc.source.beginpage	157
dc.source.issue	5
dc.source.journal	ARTIFICIAL INTELLIGENCE REVIEW
dc.source.numberofpages	127
dc.source.volume	58
dc.subject.keywords	NONNEGATIVE MATRIX FACTORIZATION
dc.subject.keywords	INSTANCE-LEVEL CONSTRAINTS
dc.subject.keywords	PAIRWISE CONSTRAINTS
dc.subject.keywords	GEOMETRICAL STRUCTURE
dc.subject.keywords	SIDE INFORMATION
dc.subject.keywords	ALGORITHM
dc.subject.keywords	MODEL
dc.subject.keywords	IMAGE
dc.subject.keywords	FRAMEWORK
dc.subject.keywords	SIMILARITY
dc.title	Semi-supervised constrained clustering: an in-depth overview, ranked taxonomy and future research directions
dc.type	Journal article
dspace.entity.type	Publication
Files	Original bundle: 8769.pdf (14.63 MB, Adobe Portable Document Format, published version)
Publication available in collections:	Articles
