Publication:

Semi-supervised constrained clustering: an in-depth overview, ranked taxonomy and future research directions

dc.contributor.author: Gonzalez-Almagro, German
dc.contributor.author: Peralta, Daniel
dc.contributor.author: De Poorter, Eli
dc.contributor.author: Cano, Jose-Ramon
dc.contributor.author: Garcia, Salvador
dc.contributor.imecauthor: Peralta, Daniel
dc.contributor.imecauthor: De Poorter, Eli
dc.contributor.orcidimec: Peralta, Daniel::0000-0002-7544-8411
dc.contributor.orcidimec: De Poorter, Eli::0000-0002-0214-5751
dc.date.accessioned: 2025-03-14T10:18:27Z
dc.date.available: 2025-03-13T18:38:20Z
dc.date.available: 2025-03-14T10:18:27Z
dc.date.issued: 2025
dc.description.abstract: Clustering is a well-known unsupervised machine learning approach capable of automatically grouping discrete sets of instances with similar characteristics. Constrained clustering is a semi-supervised extension of this process that can be used when expert knowledge is available in the form of constraints that guide the grouping. Well-known examples of such constraints are must-link (indicating that two instances belong to the same group) and cannot-link (indicating that two instances must not be placed in the same group). The research area of constrained clustering has grown significantly over the years, with a large variety of new algorithms and more advanced types of constraints being proposed. However, no unifying overview is available to easily understand the wide variety of available methods, constraints and benchmarks. To remedy this, this study presents the background of constrained clustering in detail and provides a novel ranked taxonomy of the types of constraints that can be used in constrained clustering. In addition, it focuses on instance-level pairwise constraints and gives an overview of their applications and historical context. It also presents a statistical analysis covering 315 constrained clustering methods, categorizes them according to their features, and provides a ranking score indicating which methods have the most potential based on their popularity and validation quality. Finally, based on this analysis, potential pitfalls and future research directions are identified.
dc.description.wosFundingText: This study has been funded by the research projects PID2020-119478GB-I00, A-TIC-434-UGR20 and PREDOC_01648.
dc.identifier.doi: 10.1007/s10462-024-11103-8
dc.identifier.issn: 0269-2821
dc.identifier.uri: https://imec-publications.be/handle/20.500.12860/45387
dc.publisher: SPRINGER
dc.source.beginpage: 157
dc.source.issue: 5
dc.source.journal: ARTIFICIAL INTELLIGENCE REVIEW
dc.source.numberofpages: 127
dc.source.volume: 58
dc.subject.keywords: NONNEGATIVE MATRIX FACTORIZATION
dc.subject.keywords: INSTANCE-LEVEL CONSTRAINTS
dc.subject.keywords: PAIRWISE CONSTRAINTS
dc.subject.keywords: GEOMETRICAL STRUCTURE
dc.subject.keywords: SIDE INFORMATION
dc.subject.keywords: ALGORITHM
dc.subject.keywords: MODEL
dc.subject.keywords: IMAGE
dc.subject.keywords: FRAMEWORK
dc.subject.keywords: SIMILARITY
dc.title: Semi-supervised constrained clustering: an in-depth overview, ranked taxonomy and future research directions
dc.type: Journal article
dspace.entity.type: Publication
Files

Original bundle

Name: 8769.pdf
Size: 14.63 MB
Format: Adobe Portable Document Format
Description: Published