Publication:
Run-length compressed metagenomic read classification with SMEM-finding and tagging
| cris.virtual.orcid | 0000-0001-8517-0479 | |
| cris.virtual.orcid | 0000-0002-9994-8269 | |
| cris.virtualsource.department | c4cd9ad3-10fc-4ea8-b7f3-19c50b10d7a7 | |
| cris.virtualsource.department | 4a97cd9e-e619-4718-8c1e-9168bc19ef13 | |
| cris.virtualsource.orcid | c4cd9ad3-10fc-4ea8-b7f3-19c50b10d7a7 | |
| cris.virtualsource.orcid | 4a97cd9e-e619-4718-8c1e-9168bc19ef13 | |
| dc.contributor.author | Depuydt, Lore | |
| dc.contributor.author | Ahmed, Omar Y. | |
| dc.contributor.author | Fostier, Jan | |
| dc.contributor.author | Langmead, Ben | |
| dc.contributor.author | Gagie, Travis | |
| dc.date.accessioned | 2026-06-15T12:02:13Z | |
| dc.date.available | 2026-06-15T12:02:13Z | |
| dc.date.createdwos | 2025-12-28 | |
| dc.date.issued | 2025 | |
| dc.description.abstract | Metagenomic read classification is a fundamental task in computational biology but remains challenging due to the scale and diversity of sequencing data. We present a run-length compressed BWT-based index using the move structure for efficient multi-class classification. Our method finds all super-maximal exact matches (SMEMs) of length ≥ L between a read and a reference and associates each SMEM with one class identifier using a sampled tag array. A consensus algorithm then compacts these SMEMs and their class identifiers into a single classification. We are the first to perform run-length compressed read classification using full rather than semi-SMEMs. We evaluated on long and short reads across two datasets: a large bacterial pan-genome with few classes and a smaller 16S rRNA gene database spanning thousands of genera. Our method outperforms SPUMONI 2 in accuracy and runtime while maintaining run-length compressed memory complexity and surpasses Cliffy in memory efficiency with comparable accuracy. | |
| dc.description.wosFundingText | Lore Depuydt was funded by a PhD Fellowship FR (1117322N), Research Foundation-Flanders (FWO). Travis Gagie was funded by NSERC grant 07185-2020. Ben Langmead and Omar Ahmed were funded by NSF grant DBI-2029552 and NIH grant R01HG011392. | |
| dc.identifier.doi | 10.1016/j.isci.2025.114029 | |
| dc.identifier.issn | 2589-0042 | |
| dc.identifier.pmid | MEDLINE:41497396 | |
| dc.identifier.uri | https://imec-publications.be/handle/20.500.12860/59691 | |
| dc.language.iso | eng | |
| dc.provenance.editstepuser | greet.vanhoof@imec.be | |
| dc.publisher | CELL PRESS | |
| dc.source.beginpage | 114029 | |
| dc.source.issue | 12 | |
| dc.source.journal | ISCIENCE | |
| dc.source.numberofpages | 16 | |
| dc.source.volume | 28 | |
| dc.title | Run-length compressed metagenomic read classification with SMEM-finding and tagging | |
| dc.type | Journal article | |
| dspace.entity.type | Publication | |
| imec.internal.crawledAt | 2026-04-07 | |
| imec.internal.source | crawler | |
| imec.internal.wosCreatedAt | 2026-04-07 | |