Extracting Paired-End Reads Sitting In Different Chromosomes
3
5
Entering edit mode
9.6 years ago
Ian 5.7k

I have been mapping paired-end reads (SOLiD) and would like to extract mapped pairs that are located on different chromosomes from the SAM/BAM file. `

samtools flagstat actually reports the number of pairs on different chromosomes. Are there any simple methods of extracting them?

Thank you!

samtools paired • 14k views
ADD COMMENT
11
Entering edit mode
9.6 years ago

export your bam as SAM and use awk to filter the result (something like

awk '($3!=$7 && $7!="=")'

). The SAM specification ( samtools.sourceforge.net/SAM1.pdf ) says that the column $3 is the CHROMOSOME and the $7 is the chromosome for the mate

ADD COMMENT
0
Entering edit mode

Thanks - i played around with awk and regex to get what i wanted.

ADD REPLY
4
Entering edit mode
9.6 years ago

Here is a simple method:

samtools view -F 14 infile.bam | grep -v " = " > outfile.sam

The -F 14 gives you reads that are

  • are mapped
  • have a mate that is mapped
  • but are not mapped in a proper pair

Note that there are TAB characters left and right of the = sign. Try Ctrl-V [HTML] to enter them at the terminal.

ADD COMMENT
1
Entering edit mode

Caveat - this will include INTRA-chromosomal discordant pairs as well.

ADD REPLY
0
Entering edit mode

No, since identical RNAME and RNEXT are excluded. Or am I missing something?

ADD REPLY
0
Entering edit mode

I noticed -F, but if this relates to the second column of the SAM file, then it does not tally with the flags i have, e.g. 97 and 145 for the 50bp and 35bp reads respectively.

ADD REPLY
0
Entering edit mode

-F 14 will not remove reads with flag 97 or 145. Have a look at http://picard.sourceforge.net/explain-flags.html

ADD REPLY
0
Entering edit mode

A version that allows the reversion to bam includes the header:

samtools view -F 14 -h infile.bam | grep -v " = " > outfile.sam

reversion to bam:

samtools view -b outfile.sam > finalout.bam
ADD REPLY
2
Entering edit mode
3.2 years ago

can be shortened to (including mapQ>5):

samtools view -F 14 -q 5 -h input.sam \
    | awk '$7 !~ /=/' \
    | samtools view -Sb - \
    > map_2_diff_chroms.bam
ADD COMMENT

Login before adding your answer.

Traffic: 1653 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6