Question: Calculate the number of paired-end reads their middle size is greater than # KB
0
3.7 years ago by
United States

Dear All,

I have a BAM file of paired-end sequencing reads. I want to calculate how many paired-end reads that mapped to the same chromosomes their middle size (the number of base pairs) between the two ends of paired-end reads is greater than 100 kb, and how many paired-end reads that mapped to the different chromosomes their middle size (the number of base pairs) between the two ends of paired-end reads is greater than 100 kb. Can anybody help me get this done? Thank you very much in advance.

myposts next-gen • 1.3k views
modified 3.7 years ago by James Ashmore2.6k • written 3.7 years ago by lisadavic660
2
3.7 years ago by
James Ashmore2.6k
UK/Edinburgh/MRC Centre for Regenerative Medicine
James Ashmore2.6k wrote:

You can use a program called BEDTools to do this analysis. First convert your BAM to a BAMPE file, then use AWK to get the number of bases between the two pairs with a cutoff. For example:

```\$ bedtools bamtobed -i reads.bam -bedpe > reads.bedpe
chr7   118965072   118965122   chr7   118970079   118970129 TUPAC_0001:3:1:0:1452#0 37     +     -
\$ awk -v CUTOFF=500 -v OFS="\t" '\$5 - \$2 >= CUTOFF {print \$1, \$5 - \$2}' reads.bedpe > distances.txt
chr7 5007
```

Thank you so much for your help. I will try your script.

How can I calculate the numbers for all of chromosomes efficiently? and how about that mapped to different chromosomes? Thank you again.

You can cut the chromosome field out from the distances.txt file and then count how often each chromosome identifier appears. For example:

`cut -f1 distances.txt | sort | uniq -c`

You can easily discover the number of reads whose mate mapped to a different chromosome with

`samtools file.bam flagstat`