Htslib c++ library faidx fetch region
0
0
Entering edit mode
3.5 years ago
rah ▴ 20

I'm coding in c++ and reading in different reference genomes to examine regions across the chromosomes a few hundred base pairs at a time.

To do it in c++ I use the library htslib and the command faidx_fetch_seq() which is similar to read specific regions using samtools faidx.

The problem is when I'm reading in the reference genomes, there are of course multiple regions consisting only of NNNNN (and so on). Since I only need the actual DNA sequence I wish to read in the entire file but removing the regions with NNNN.

I have tried reading in the regions in c++ and then just creating an if-statement to check the actual bases consist of N. This works but is incredibly slow.

So my question is: Does anyone know how to only read in the actual DNA sequences, removing the NNN regions? Either with samtools or with the c++ function faidx_fetch_seq().

samtools cpp faidx htslib • 765 views
ADD COMMENT

Login before adding your answer.

Traffic: 1974 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6