Question

Extract positions covered by a given read from Bam file

0

Entering edit mode

8.0 years ago

kolchenkosergey • 0

Hi!

How can I extract positions, which are covered by a given read? What I mean, is that If I use samtools view "bamfile_name" "chr_name":"pos"-"pos", I can extract all reads, which cover position "pos". But I need to the opposite - I need to get all positions, covered by a specified read, How can I do that?

From bam file I can extract left-most position of the aligned read, its length, specified in SEQ field and CIGAR string. Is it possible to say that covered positions for a given read are [let-most position:let-most position+SEQ-1]?

alignment • 5.0k views

ADD COMMENT • link 8.0 years ago by kolchenkosergey • 0

score 1 · Answer 1 · 2017-07-27

1

Entering edit mode

8.0 years ago

Devon Ryan 105k

Parse the CIGAR string to get a list of the covered positions. Here's an example in C.

ADD COMMENT • link 8.0 years ago by Devon Ryan 105k

score 0 · Answer 2 · 2017-07-27

0

Entering edit mode

8.0 years ago

Pierre Lindenbaum 166k

use sam2tsv : http://lindenb.github.io/jvarkit/Sam2Tsv.html

$ java -jar dist/sam2tsv.jar -A  \
    -r samtools-0.1.18/examples/toy.fa 
      samtools-0.1.18/examples/toy.sam
r001    163 ref 0   T   .   7   T   M
r001    163 ref 1   T   .   8   T   M
r001    163 ref 2   A   .   9   A   M
r001    163 ref 3   G   .   10  G   M
r001    163 ref 4   A   .   11  A   M
r001    163 ref 5   T   .   12  T   M
r001    163 ref 6   A   .   13  A   M
r001    163 ref 7   A   .   14  A   M

ADD COMMENT • link 8.0 years ago by Pierre Lindenbaum 166k

0

Entering edit mode

Thank you! But as far as I see, this will produce a too large tab-delimited file (approx Number of reads * average length of read), I would prefer to parse reads at once, updating every position in covers. For my purposes this can be faster. But I will use this to compare and verify my results, thank you

ADD REPLY • link 8.0 years ago by kolchenkosergey • 0

0

Entering edit mode

this will produce a too large tab-delimited fil

you can pipe the output:

samtools view -h your.bam  "chr1:123-456" | grep -E '(^@|YOURREADNAME)' | java -jar sam2tsv.jar

ADD REPLY • link 8.0 years ago by Pierre Lindenbaum 166k

score 0 · Answer 3 · 2017-07-27

Okay, so its turned out that for my purposes it is better to used pysam Smth like

import pysam
bamFP = pysam.Samfile('file.bam', "rb");
for read in bamFP.fetch:
 read.get_reference_positions()

I compared that with my parser and with Tablet output and it seems like that it return correct output - including indels, clipping, etc