Publication:

b-move: faster lossless approximate pattern matching in a run-length compressed index

Date

 
cris.virtual.department#PLACEHOLDER_PARENT_METADATA_VALUE#
cris.virtual.department#PLACEHOLDER_PARENT_METADATA_VALUE#
cris.virtual.department#PLACEHOLDER_PARENT_METADATA_VALUE#
cris.virtual.orcid0000-0001-8517-0479
cris.virtual.orcid0000-0002-2244-1427
cris.virtual.orcid0000-0002-9994-8269
cris.virtualsource.departmentc4cd9ad3-10fc-4ea8-b7f3-19c50b10d7a7
cris.virtualsource.departmentcafc39a2-8610-45ab-befa-b3f04ef3481d
cris.virtualsource.department4a97cd9e-e619-4718-8c1e-9168bc19ef13
cris.virtualsource.orcidc4cd9ad3-10fc-4ea8-b7f3-19c50b10d7a7
cris.virtualsource.orcidcafc39a2-8610-45ab-befa-b3f04ef3481d
cris.virtualsource.orcid4a97cd9e-e619-4718-8c1e-9168bc19ef13
dc.contributor.authorDepuydt, Lore
dc.contributor.authorRenders, Luca
dc.contributor.authorVan de Vyver, Simon
dc.contributor.authorVeys, Lennart
dc.contributor.authorGagie, Travis
dc.contributor.authorFostier, Jan
dc.contributor.imecauthorDepuydt, Lore
dc.contributor.imecauthorRenders, Luca
dc.contributor.imecauthorFostier, Jan
dc.contributor.orcidimecDepuydt, Lore::0000-0001-8517-0479
dc.contributor.orcidimecRenders, Luca::0000-0002-2244-1427
dc.contributor.orcidimecFostier, Jan::0000-0002-9994-8269
dc.date.accessioned2025-08-23T03:59:02Z
dc.date.available2025-08-23T03:59:02Z
dc.date.issued2025-AUG 12
dc.description.abstractBackground Due to the increasing availability of high-quality genome sequences, pan-genomes are gradually replacing single consensus reference genomes in many bioinformatics pipelines to better capture genetic diversity. Traditional bioinformatics tools using the FM-index face memory limitations with such large genome collections. Recent advancements in run-length compressed indices like Gagie et al.’s r-index and Nishimoto and Tabei’s move structure, alleviate memory constraints but focus primarily on backward search for MEM-finding. Arakawa et al.’s br-index initiates complete approximate pattern matching using bidirectional search in run-length compressed space, but with significant computational overhead due to complex memory access patterns. Results We introduce b-move, a novel bidirectional extension of the move structure, enabling fast, cache-efficient, lossless approximate pattern matching in run-length compressed space. It achieves bidirectional character extensions up to 7 times faster than the br-index, closing the performance gap with FM-index-based alternatives. For locating occurrences, b-move performs and operations up to 7 times faster than the br-index. At the same time, it maintains the favorable memory characteristics of the br-index, for example, all available complete E. coli genomes on NCBI’s RefSeq collection can be compiled into a b-move index that fits into the RAM of a typical laptop. Conclusions b-move proves practical and scalable for pan-genome indexing and querying. We provide a C++ implementation of b-move, supporting efficient lossless approximate pattern matching including locate functionality, available at https://github.com/biointec/b-move under the AGPL-3.0 license
dc.description.wosFundingTextThe authors thank Ben Langmead, Nathaniel Brown, and Mohsen Zakeri for their helpful feedback and suggestions.
dc.identifier.doi10.1186/s13015-025-00281-x
dc.identifier.issn1748-7188
dc.identifier.pmidMEDLINE:40796877
dc.identifier.urihttps://imec-publications.be/handle/20.500.12860/46099
dc.publisherBMC
dc.source.beginpage15
dc.source.issue1
dc.source.journalALGORITHMS FOR MOLECULAR BIOLOGY
dc.source.numberofpages19
dc.source.volume20
dc.subject.keywordsREAD ALIGNMENT
dc.title

b-move: faster lossless approximate pattern matching in a run-length compressed index

dc.typeJournal article
dspace.entity.typePublication
Files

Original bundle

Name:
s13015-025-00281-x.pdf
Size:
2.46 MB
Format:
Adobe Portable Document Format
Description:
Published
Publication available in collections: