Extract positions covered by a given read from Bam file
3
0
Entering edit mode
6.7 years ago

Hi!

How can I extract positions, which are covered by a given read? What I mean, is that If I use samtools view "bamfile_name" "chr_name":"pos"-"pos", I can extract all reads, which cover position "pos". But I need to the opposite - I need to get all positions, covered by a specified read, How can I do that?

From bam file I can extract left-most position of the aligned read, its length, specified in SEQ field and CIGAR string. Is it possible to say that covered positions for a given read are [let-most position:let-most position+SEQ-1]?

alignment • 4.3k views
ADD COMMENT
1
Entering edit mode
6.7 years ago

Parse the CIGAR string to get a list of the covered positions. Here's an example in C.

ADD COMMENT
0
Entering edit mode
6.7 years ago

use sam2tsv : http://lindenb.github.io/jvarkit/Sam2Tsv.html

$ java -jar dist/sam2tsv.jar -A  \
    -r samtools-0.1.18/examples/toy.fa 
      samtools-0.1.18/examples/toy.sam
r001    163 ref 0   T   .   7   T   M
r001    163 ref 1   T   .   8   T   M
r001    163 ref 2   A   .   9   A   M
r001    163 ref 3   G   .   10  G   M
r001    163 ref 4   A   .   11  A   M
r001    163 ref 5   T   .   12  T   M
r001    163 ref 6   A   .   13  A   M
r001    163 ref 7   A   .   14  A   M
ADD COMMENT
0
Entering edit mode

Thank you! But as far as I see, this will produce a too large tab-delimited file (approx Number of reads * average length of read), I would prefer to parse reads at once, updating every position in covers. For my purposes this can be faster. But I will use this to compare and verify my results, thank you

ADD REPLY
0
Entering edit mode

this will produce a too large tab-delimited fil

you can pipe the output:

samtools view -h your.bam  "chr1:123-456" | grep -E '(^@|YOURREADNAME)' | java -jar sam2tsv.jar
ADD REPLY
0
Entering edit mode
6.7 years ago

Okay, so its turned out that for my purposes it is better to used pysam Smth like

import pysam
bamFP = pysam.Samfile('file.bam', "rb");
for read in bamFP.fetch:
 read.get_reference_positions()

I compared that with my parser and with Tablet output and it seems like that it return correct output - including indels, clipping, etc

ADD COMMENT

Login before adding your answer.

Traffic: 2897 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6