Calculate the number of paired-end reads their middle size is greater than # KB
1
0
Entering edit mode
8.6 years ago

Dear All,

I have a BAM file of paired-end sequencing reads. I want to calculate how many paired-end reads that mapped to the same chromosomes their middle size (the number of base pairs) between the two ends of paired-end reads is greater than 100 kb, and how many paired-end reads that mapped to the different chromosomes their middle size (the number of base pairs) between the two ends of paired-end reads is greater than 100 kb. Can anybody help me get this done? Thank you very much in advance.

next-gen • 2.3k views
ADD COMMENT
2
Entering edit mode
8.6 years ago
James Ashmore ★ 3.4k

You can use a program called BEDTools to do this analysis. First convert your BAM to a BAMPE file, then use AWK to get the number of bases between the two pairs with a cutoff. For example:

$ bedtools bamtobed -i reads.bam -bedpe > reads.bedpe
chr7   118965072   118965122   chr7   118970079   118970129 TUPAC_0001:3:1:0:1452#0 37     +     -
$ awk -v CUTOFF=500 -v OFS="\t" '$5 - $2 >= CUTOFF {print $1, $5 - $2}' reads.bedpe > distances.txt
chr7 5007
ADD COMMENT
0
Entering edit mode

Thank you so much for your help. I will try your script.

How can I calculate the numbers for all of chromosomes efficiently? and how about that mapped to different chromosomes? Thank you again.

ADD REPLY
0
Entering edit mode

You can cut the chromosome field out from the distances.txt file and then count how often each chromosome identifier appears. For example:

cut -f1 distances.txt | sort | uniq -c
ADD REPLY
0
Entering edit mode

You can easily discover the number of reads whose mate mapped to a different chromosome with

samtools file.bam flagstat
ADD REPLY

Login before adding your answer.

Traffic: 2507 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6