Question: No proper pairs for some files upon filtering using Bamtools
4.2 years ago, dolores (0) wrote:

Hi everyone, I'm confused by some observations in my current data set (my second ChIP-Seq data set ever). After paired-end sequencing, I got 32 fastq.gz files for 16 samples. I used Trimmomatic in paired-end mode to trim Illumina adapters, then ran BWA-MEM with default parameters on the paired reads from the Trimmomatic output. I then filtered for mapping quality > 5 and "IsProperPair" using Bamtools. Here's the problem: 5 out of 16 samples returned extremely small files. I ran samtools stats and found that while these files had lots of mapped reads, there were 0 proper pairs. I triple-checked the input files, and they were all matching paired reads.
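For reference, the filter described above can be sketched in plain Python on SAM text records. This is only an illustration of the logic, not what Bamtools does internally; the function name `passes_filter` and the two example records are made up, while the flag bit 0x2 ("each segment properly aligned") and the column layout come from the SAM format.

```python
# SAM flag bit for "read mapped in proper pair" (SAM specification).
FLAG_PROPER_PAIR = 0x2


def passes_filter(sam_line, min_mapq_exclusive=5):
    """Return True if a SAM record is a proper pair with MAPQ above the threshold.

    Hypothetical helper: mirrors "mapping quality > 5 and IsProperPair".
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: MAPQ
    return bool(flag & FLAG_PROPER_PAIR) and mapq > min_mapq_exclusive


# Made-up records for illustration only.
proper = "r1\t99\tchr1\t100\t60\t50M\t=\t250\t200\t*\t*"      # flag 99 has bit 0x2 set
improper = "r2\t65\tchr1\t100\t60\t50M\tchr2\t250\t0\t*\t*"  # flag 65 lacks bit 0x2
low_mapq = "r3\t99\tchr1\t100\t3\t50M\t=\t250\t200\t*\t*"    # proper pair, MAPQ 3

kept = [passes_filter(rec) for rec in (proper, improper, low_mapq)]
```

If the aligner never sets bit 0x2 for a sample, every record fails this test regardless of MAPQ, which would be consistent with the tiny output files.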

Since my experience in ChIP-seq analysis is very limited, it'd be very helpful if someone can enlighten me on the cause of this problem and whether I can still use the alignment files without filtering for proper pairs. Thank you very much!

chip-seq alignment • 1.1k views
modified 4.2 years ago • written 4.2 years ago by dolores (0)

It's important to find out why these weren't marked as proper pairs. Namely, was it due to having the wrong relative orientation, or wrong fragment size, or being on different chromosomes, or something else? You can likely discern this by quickly looking at a few of the alignments and guesstimating from that. If it's just a matter of insert size and the observed insert size isn't too out there, then I'd say forget worrying about proper pairs. If, however, there's a different underlying reason behind the metrics then you have cause for concern.
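A rough way to do that kind of spot check is to classify a handful of SAM records by the reasons listed above. The sketch below is a simplification under stated assumptions: `classify_pair` and the `max_insert` cutoff of 1000 bp are hypothetical, and "inward" is approximated as the forward-strand mate starting at or before the reverse-strand mate.

```python
def classify_pair(sam_line, max_insert=1000):
    """Give a coarse reason why a paired read might not count as a proper pair.

    Hypothetical helper; max_insert is an arbitrary illustrative cutoff.
    """
    f = sam_line.rstrip("\n").split("\t")
    flag, rname, pos = int(f[1]), f[2], int(f[3])
    rnext, pnext, tlen = f[6], int(f[7]), int(f[8])

    if flag & 0x4 or flag & 0x8:          # read or mate unmapped
        return "read or mate unmapped"
    if rnext not in ("=", rname):         # mate on another reference sequence
        return "different chromosomes"

    reverse = bool(flag & 0x10)           # this read on the reverse strand
    mate_reverse = bool(flag & 0x20)      # mate on the reverse strand
    if reverse == mate_reverse:
        return "same strand"

    # Identify which mate is on the forward strand and compare start positions.
    fwd_pos, rev_pos = (pnext, pos) if reverse else (pos, pnext)
    if fwd_pos > rev_pos:
        return "outward"
    if abs(tlen) > max_insert:
        return "insert too large"
    return "inward, plausible insert size"


# Made-up records for illustration only.
examples = {
    "inward": "r1\t99\tchr1\t100\t60\t50M\t=\t250\t200\t*\t*",
    "same": "r2\t65\tchr1\t100\t60\t50M\t=\t250\t200\t*\t*",
    "diff": "r3\t97\tchr1\t100\t60\t50M\tchr2\t250\t0\t*\t*",
    "outward": "r4\t81\tchr1\t100\t60\t50M\t=\t250\t200\t*\t*",
}
labels = {name: classify_pair(rec) for name, rec in examples.items()}
```

Tallying these labels over a few thousand records from one good and one bad sample should show quickly which category dominates.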

written 4.2 years ago by Devon Ryan (98k)

Hi! Thank you for the advice. I just looked at the samtools stats and found that the reads indeed had wrong orientations, and were mapped to different chromosomes.

Example of stats output:
 - SN   average length: 99
 - SN   maximum length: 126
 - SN   average quality: 36.7
 - SN   insert size average: 199.2
 - SN   insert size standard deviation: 1211.6
 - SN   inward oriented pairs: 13300
 - SN   outward oriented pairs: 15181
 - SN   pairs with other orientation: 6673578
 - SN   pairs on different chromosomes: 252295
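To put those counts in proportion, here is a small sketch that parses tab-separated `SN` lines (the layout samtools stats normally writes) and reports each category as a fraction of the total. The helper names are hypothetical, and treating the four categories as non-overlapping is an assumption. As far as I understand, "other orientation" covers pairs whose mates map to the same strand.

```python
def parse_sn_counts(stats_text):
    """Collect integer-valued 'SN' summary lines into a dict keyed by label."""
    counts = {}
    for line in stats_text.splitlines():
        parts = line.split("\t")
        if len(parts) < 3 or parts[0] != "SN":
            continue
        label = parts[1].rstrip(":")
        try:
            counts[label] = int(parts[2])
        except ValueError:
            continue  # skip non-integer values such as averages
    return counts


PAIR_KEYS = [
    "inward oriented pairs",
    "outward oriented pairs",
    "pairs with other orientation",
    "pairs on different chromosomes",
]


def orientation_fractions(counts):
    """Return each pair category as a fraction of the four categories combined."""
    total = sum(counts[k] for k in PAIR_KEYS)
    return {k: counts[k] / total for k in PAIR_KEYS}


# Counts copied from the stats output quoted above.
STATS = (
    "SN\tinsert size average:\t199.2\n"
    "SN\tinward oriented pairs:\t13300\n"
    "SN\toutward oriented pairs:\t15181\n"
    "SN\tpairs with other orientation:\t6673578\n"
    "SN\tpairs on different chromosomes:\t252295\n"
)
fractions = orientation_fractions(parse_sn_counts(STATS))
```

With these numbers, the "other orientation" category accounts for roughly 96% of the pairs and inward-oriented pairs for well under 1%.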

So I guess the reads were no good? What might have caused this to happen to a few files out of the bunch?

modified 4.2 years ago by Devon Ryan (98k) • written 4.2 years ago by dolores (0)

I tried Bowtie 2 and got the same bad result.

7266050 (100.00%) were paired; of these:
  7259466 (99.91%) aligned concordantly 0 times
  3742 (0.05%) aligned concordantly exactly 1 time
  2842 (0.04%) aligned concordantly >1 times
  ----
  7259466 pairs aligned concordantly 0 times; of these:
    12170 (0.17%) aligned discordantly 1 time
  ----
  7247296 pairs aligned 0 times concordantly or discordantly; of these:
    14494592 mates make up the pairs; of these:
      7368806 (50.84%) aligned 0 times
      5886422 (40.61%) aligned exactly 1 time
      1239364 (8.55%) aligned >1 times
49.29% overall alignment rate
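As a sanity check on that summary, the numbers can be recombined by hand. The sketch below counts two aligned reads for every concordant or discordant pair, adds the mates that aligned on their own, and divides by the total number of reads. This way of combining the figures is my reading of the report, not a quote from the Bowtie 2 documentation, but it does reproduce the reported overall rate.

```python
# Values copied from the Bowtie 2 summary above.
pairs = 7266050
concordant_once = 3742
concordant_multi = 2842
discordant = 12170
mates_once = 5886422
mates_multi = 1239364


def overall_alignment_rate():
    """Fraction of all reads (2 per pair) that aligned in any way."""
    aligned_as_pairs = 2 * (concordant_once + concordant_multi + discordant)
    aligned_as_mates = mates_once + mates_multi
    return (aligned_as_pairs + aligned_as_mates) / (2 * pairs)


def concordant_rate():
    """Fraction of pairs that aligned concordantly at least once."""
    return (concordant_once + concordant_multi) / pairs


overall_pct = round(overall_alignment_rate() * 100, 2)   # 49.29
concordant_pct = round(concordant_rate() * 100, 2)       # 0.09
```

So about half of the individual reads align, yet only about 0.09% of pairs align concordantly: the reads map individually, but almost never as a valid pair.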

When I visualized the BigWig files for these samples in IGV, the signal was exactly where it should be, consistent across samples, and showed the expected pattern of enrichment.

The problem is that when I used MACS2, it complained that "No common chromosome names can be found from treatment and control! Check your input files! MACS will quit..."

Running samtools idxstats showed that the chromosome names look identical to the ones in the files that are working.
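Identical names do not by themselves show that both files still contain usable reads on those chromosomes. A minimal sketch for comparing two idxstats outputs is below; it relies on the standard four tab-separated columns (name, length, mapped, unmapped). The function names and the example data are hypothetical, and whether empty chromosomes explain the MACS2 message in this case is an assumption, not something confirmed in the thread.

```python
def parse_idxstats(text):
    """Map reference name -> number of mapped reads from idxstats text."""
    mapped = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _length, n_mapped, _n_unmapped = line.split("\t")
        if name == "*":          # trailing line for reads without a reference
            continue
        mapped[name] = int(n_mapped)
    return mapped


def shared_chromosomes(treatment_text, control_text, require_reads=True):
    """Return names present in both files, optionally only those with reads in both."""
    t, c = parse_idxstats(treatment_text), parse_idxstats(control_text)
    shared = set(t) & set(c)
    if require_reads:
        shared = {name for name in shared if t[name] > 0 and c[name] > 0}
    return sorted(shared)


# Made-up idxstats output for illustration only.
TREATMENT = "chr1\t1000\t0\t0\nchr2\t800\t0\t0\n*\t0\t0\t500\n"
CONTROL = "chr1\t1000\t120\t3\nchr2\t800\t95\t1\n*\t0\t0\t10\n"

names_only = shared_chromosomes(TREATMENT, CONTROL, require_reads=False)
with_reads = shared_chromosomes(TREATMENT, CONTROL)
```

In this made-up case the names match (`names_only` lists both chromosomes), but `with_reads` is empty because the treatment file has no mapped reads left on either one.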

I guess I'll try aligning them as single end reads and combine later. I'm just very puzzled by the observation that 5 files in a row in a group of 16 samples are giving me this problem. :S

written 4.2 years ago by dolores (0)