Columba: fast approximate pattern matching with optimized search schemes

Renders, Luca; Depuydt, Lore; Gagie, Travis; Fostier, Jan

doi:10.1093/bioinformatics/btaf652


cris.virtual.orcid	0000-0001-8517-0479
cris.virtual.orcid	0000-0002-2244-1427
cris.virtual.orcid	0000-0002-9994-8269
cris.virtualsource.department	c4cd9ad3-10fc-4ea8-b7f3-19c50b10d7a7
cris.virtualsource.department	cafc39a2-8610-45ab-befa-b3f04ef3481d
cris.virtualsource.department	4a97cd9e-e619-4718-8c1e-9168bc19ef13
cris.virtualsource.orcid	c4cd9ad3-10fc-4ea8-b7f3-19c50b10d7a7
cris.virtualsource.orcid	cafc39a2-8610-45ab-befa-b3f04ef3481d
cris.virtualsource.orcid	4a97cd9e-e619-4718-8c1e-9168bc19ef13
dc.contributor.author	Renders, Luca
dc.contributor.author	Depuydt, Lore
dc.contributor.author	Gagie, Travis
dc.contributor.author	Fostier, Jan
dc.contributor.orcidext	0000-0002-2244-1427
dc.contributor.orcidext	0000-0003-3689-327X
dc.contributor.orcidext	0000-0002-9994-8269
dc.date.accessioned	2026-04-01T07:19:21Z
dc.date.accessioned	2026-05-28T13:04:54Z
dc.date.available	2026-04-01T07:19:21Z
dc.date.createdwos	2025-12-28
dc.date.issued	2025
dc.description.abstract	Motivation: Aligning sequencing reads to reference genomes is a fundamental task in bioinformatics. Aligners can be classified as lossy or lossless: lossy aligners prioritize speed by reporting only one or a few high-scoring alignments, whereas lossless aligners output all optimal alignments, ensuring completeness and sensitivity. Results: This paper introduces Columba, a high-performance lossless aligner tailored to Illumina sequencing data. Columba processes single-end or paired-end reads in FASTQ format and outputs alignments in SAM format. By combining optimized search schemes with bit-parallel alignment, Columba achieves high speed. Columba is available in two variants. The first, based on the bidirectional FM-index, prioritizes speed. The second, Columba RLC, applies run-length compression through a bidirectional move structure, significantly reducing memory usage for large, repetitive datasets such as pan-genomes. Benchmarks on the human genome, as well as on bacterial and human pan-genome datasets, show that Columba is much faster than existing lossless aligners and even competitive with lossy tools. We integrated Columba into the OptiType HLA genotyping pipeline, where it substantially reduced computational time while maintaining accuracy. These results position Columba as a versatile, state-of-the-art tool for high-sensitivity genomic analyses. Availability and implementation: The source code of Columba is available at https://github.com/biointec/columba under the AGPL license. Scripts to reproduce the benchmarks and analyses are available at https://doi.org/10.5281/zenodo.15849246.
dc.description.wosFundingText	This work was supported by the Research Foundation-Flanders (FWO) through a PhD Fellowship SB [1SE7822N to L.R.] and a PhD Fellowship FR [1117322N to L.D.].
dc.identifier.doi	10.1093/bioinformatics/btaf652
dc.identifier.eissn	1367-4811
dc.identifier.issn	1367-4803
dc.identifier.pmid	MEDLINE:41349000
dc.identifier.uri	https://imec-publications.be/handle/20.500.12860/58991
dc.language.iso	eng
dc.provenance.editstepuser	greet.vanhoof@imec.be
dc.publisher	OXFORD UNIV PRESS
dc.relation.ispartof	BIOINFORMATICS
dc.relation.ispartofseries	BIOINFORMATICS
dc.source.beginpage	btaf652
dc.source.issue	12
dc.source.journal	BIOINFORMATICS
dc.source.numberofpages	8
dc.source.volume	41
dc.subject	READ ALIGNMENT
dc.subject	ACCURATE
dc.subject	Science & Technology
dc.subject	Life Sciences & Biomedicine
dc.subject	Technology
dc.subject	Physical Sciences
dc.subject.keywords	READ ALIGNMENT
dc.subject.keywords	ACCURATE
dc.title	Columba: fast approximate pattern matching with optimized search schemes
dc.type	Journal article
dspace.entity.type	Publication
imec.internal.source	crawler
imec.internal.wosCreatedAt	2026-04-07
oaire.citation.edition	WOS.SCI
oaire.citation.issue	12
oaire.citation.volume	41
Files	Original bundle: btaf652.pdf (1.21 MB, Adobe Portable Document Format), published version
Publication available in collections:	Articles
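
The abstract states that the first Columba variant is built on the bidirectional FM-index. As a minimal sketch of the underlying idea only, the Python below builds a toy unidirectional FM-index (naive suffix array, BWT, C table and occurrence counts) and runs classic backward search for exact matches. This is not Columba's code: a bidirectional index additionally maintains an interval for the reversed text so that a partial match can also be extended to the right, which search schemes rely on. The function names build_fm_index and backward_search are hypothetical.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM-index for `text`; a '$' sentinel is appended."""
    text += "$"
    # Naive suffix array: quadratic, for illustration only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT character preceding each sorted suffix (text[-1] is '$' when i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    # C table: number of characters in the text that are smaller than ch.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i] = number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, c_table, occ


def backward_search(pattern, c_table, occ, n):
    """Return the half-open suffix-array interval [lo, hi) of exact matches."""
    lo, hi = 0, n
    # Process the pattern right to left, narrowing the interval via LF-mapping.
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0, 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi
```

For example, indexing ACGTACGTAC and searching for ACG yields an interval whose suffix-array entries are the match positions 0 and 4. A production index would replace the naive suffix sorting and the dense occurrence table with compact, sampled structures.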

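The abstract credits part of Columba's speed to bit-parallel alignment but does not name the specific algorithm. One well-known technique of this kind is Myers' (1999) bit-vector algorithm, sketched below in Hyyrö's formulation for approximate search. Each text character updates the vertical delta bit-vectors of one dynamic-programming column with a constant number of word operations when the pattern fits in a machine word (Python's arbitrary-precision integers lift that limit at some cost). The function reports every text end position whose edit distance to some substring ending there is at most k. The name myers_search is hypothetical, and the sketch does not claim to reproduce Columba's implementation.

```python
def myers_search(pattern, text, k):
    """Report (end_position, edit_distance) for every text position where
    the pattern matches a substring ending there with at most k edits."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    # Peq[c]: bitmask of pattern positions holding character c.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)
    mask = (1 << m) - 1   # keep all vectors within m bits
    high = 1 << (m - 1)   # bit for the last pattern row
    pv, mv = mask, 0      # vertical +1 / -1 deltas of the current column
    score = m             # distance in the last row of the initial column
    hits = []
    for j, ch in enumerate(text):
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = (((eq & pv) + pv) ^ pv) | eq
        ph = mv | (~(xh | pv) & mask)  # horizontal +1 deltas
        mh = pv & xh                   # horizontal -1 deltas
        # Track the bottom-row value via the horizontal delta of the last row.
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # Shift in 0 (not 1): a match may start anywhere in the text.
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = mh | (~(xv | ph) & mask)
        mv = ph & xv
        if score <= k:
            hits.append((j, score))
    return hits
```

For instance, myers_search("ACGT", "TTACGTTT", 1) reports the exact match ending at position 5 together with the one-edit matches ending at positions 4 and 6. Shifting a 1 into the horizontal +1 vector instead would turn the same loop into a global edit-distance computation.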