Querying Bam Files With Samtools/Awk
2
2
Entering edit mode
13.0 years ago
Jneary1978 ▴ 60

Hello,

I am trying to generate four column files for methylation analysis from my NGS alignments against the human genome (single and paired-end). The file needs to contain the following information for any read that aligns:

1 - chr#
2 - start position of alignment 
3 - stop position of alignment 
4 - strand (+/-)

And nothing else.

Currently I am trying to awk the bam files to obtain the info, and it seems to work well enough except for strand info. I am either obtaining all positive or all negative strand reads, even though I can see I have alignments to both. I don't know if this is a problem with my awk command or what. If I recall correctly, the PE reads all came out positive strand and the single reads in another run all came out negative. (possibly it was the reverse pattern)

So I have two questions. First, is my awk command roughly correct? Do you have better suggestions or any idea what I am doing wrong. Second, can you think of any other way to generate this 4 column file? (comma separated preferably)

Here is the awk for the PE reads:

samtools view -f 0x0002 -X yourbamfilehere | gawk '{ printf "%s,%s,%s,%s\n", $3, $4, $4 + length($10), (index($2,"r") == 0) }' > yourcsvfilenamehere

Thanks so much! JN

awk samtools illumina bam • 6.9k views
ADD COMMENT
2
Entering edit mode
13.0 years ago
brentp 24k

You can simplify (or change) your awk to something like:

samtools view -f 0x02  /home/brentp/work/samtools-snp-test/data/F3-R3-Paired.bam \
  | awk '{ if ( and($2, 16) == 0 ) { strand="+" } 
            else { strand="-" }; 
         print $3,$4,$4 + length($10),strand 
  }'

since awk supports and as a bitwise operation. Also, if you pre-filter your BAM to only contain proper pairs, then you can use bedtools bamToBed for this.

ADD COMMENT
0
Entering edit mode

Thank you so much!

ADD REPLY
1
Entering edit mode
13.0 years ago
Farhat ★ 2.9k

The awk looks alright except for (index($2,"r") == 0). You could try changing that to and($2, 0x10) to get whether the read is reverse complemented or not.

ADD COMMENT
0
Entering edit mode

Thank you so much!

ADD REPLY

Login before adding your answer.

Traffic: 1799 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6