Publication:

Run-length compressed metagenomic read classification with SMEM-finding and tagging

 
cris.virtual.orcid: 0000-0001-8517-0479
cris.virtual.orcid: 0000-0002-9994-8269
cris.virtualsource.department: c4cd9ad3-10fc-4ea8-b7f3-19c50b10d7a7
cris.virtualsource.department: 4a97cd9e-e619-4718-8c1e-9168bc19ef13
cris.virtualsource.orcid: c4cd9ad3-10fc-4ea8-b7f3-19c50b10d7a7
cris.virtualsource.orcid: 4a97cd9e-e619-4718-8c1e-9168bc19ef13
dc.contributor.author: Depuydt, Lore
dc.contributor.author: Ahmed, Omar Y.
dc.contributor.author: Fostier, Jan
dc.contributor.author: Langmead, Ben
dc.contributor.author: Gagie, Travis
dc.date.accessioned: 2026-06-15T12:02:13Z
dc.date.available: 2026-06-15T12:02:13Z
dc.date.createdwos: 2025-12-28
dc.date.issued: 2025
dc.description.abstract: Metagenomic read classification is a fundamental task in computational biology but remains challenging due to the scale and diversity of sequencing data. We present a run-length compressed BWT-based index using the move structure for efficient multi-class classification. Our method finds all super-maximal exact matches (SMEMs) of length ≥ L between a read and a reference and associates each SMEM with one class identifier using a sampled tag array. A consensus algorithm then compacts these SMEMs and their class identifiers into a single classification. We are the first to perform run-length compressed read classification using full rather than semi-SMEMs. We evaluated on long and short reads across two datasets: a large bacterial pan-genome with few classes and a smaller 16S rRNA gene database spanning thousands of genera. Our method outperforms SPUMONI 2 in accuracy and runtime while maintaining run-length compressed memory complexity and surpasses Cliffy in memory efficiency with comparable accuracy.
dc.description.wosFundingText: Lore Depuydt was funded by a PhD Fellowship FR (1117322N), Research Foundation-Flanders (FWO). Travis Gagie was funded by NSERC grant 07185-2020. Ben Langmead and Omar Ahmed were funded by NSF grant DBI-2029552 and NIH grant R01HG011392.
dc.identifier.doi: 10.1016/j.isci.2025.114029
dc.identifier.issn: 2589-0042
dc.identifier.pmid: MEDLINE:41497396
dc.identifier.uri: https://imec-publications.be/handle/20.500.12860/59691
dc.language.iso: eng
dc.provenance.editstepuser: greet.vanhoof@imec.be
dc.publisher: CELL PRESS
dc.source.beginpage: 114029
dc.source.issue: 12
dc.source.journal: ISCIENCE
dc.source.numberofpages: 16
dc.source.volume: 28
dc.title: Run-length compressed metagenomic read classification with SMEM-finding and tagging
dc.type: Journal article
dspace.entity.type: Publication
imec.internal.crawledAt: 2026-04-07
imec.internal.source: crawler
imec.internal.wosCreatedAt: 2026-04-07
Files

Original bundle

Name: 1-s2.0-S2589004225022904-main.pdf
Size: 2.8 MB
Format: Adobe Portable Document Format
Description: Published