Question

Weird fragments when aligning PE reads to duplicated genes.

6.8 years ago

bsksln • 0

Hi, I did paired-end ChIP-seq and aligned the reads to hg19 using Bowtie2. One of the regions I am particularly interested in contains two duplicated genes. Their promoter regions are nearly identical, and the genes are about 5 kb apart. I set the parameter to --sensitive so that I could preserve the "double aligned" reads within the duplicated regions. After aligning, I checked the .bam files and found many weird fragments.

Those fragments are ~5 kb in length and will definitely be discarded. I checked the sequences, and I suspect they come from incorrect alignment: Bowtie2 aligned the two reads of each pair separately. While one read (R1) is mapped to duplicate gene 1, its mate (R2) is mapped to duplicate gene 2 (both reads could be mapped perfectly to either duplicated region).

One way to solve this would be, after aligning R1, to try to align R2 near the coordinate of R1. I tried but could not find out how to do this.

Does anyone know how to do it? Or do you have any other solution to the problem? Thanks.
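
The idea of placing R2 near R1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not something Bowtie2 does in this form: it assumes you already have every candidate start position for each mate (for example from a multi-alignment run such as Bowtie2's -k mode), the function name pick_proper_pair and the 500 bp limit are hypothetical, and the fragment length is approximated as the distance between the two start positions, ignoring read length and strand.

```python
from itertools import product

def pick_proper_pair(r1_hits, r2_hits, max_fragment=500):
    """Pick the (R1, R2) placement with the shortest implied fragment.

    r1_hits / r2_hits: candidate start coordinates on the same chromosome.
    Returns None when no combination is within max_fragment.
    Simplification: fragment length = distance between start positions.
    """
    best = None
    for p1, p2 in product(r1_hits, r2_hits):
        frag = abs(p2 - p1)
        # Keep the first combination with the smallest fragment length.
        if frag <= max_fragment and (best is None or frag < best[0]):
            best = (frag, p1, p2)
    return None if best is None else (best[1], best[2])

# Hypothetical coordinates: two duplicated promoters about 5 kb apart.
r1_candidates = [10_000, 15_000]
r2_candidates = [10_200, 15_200]
chosen = pick_proper_pair(r1_candidates, r2_candidates)
```

Note that when both copies are identical, both placements are equally valid, so the sketch simply keeps the first one it finds; the pair is rescued, but its assignment to gene 1 or gene 2 stays arbitrary.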

ChIP-Seq alignment • 1.4k views

ADD COMMENT • link updated 6.8 years ago by colindaven 6.4k • written 6.8 years ago by bsksln • 0

What is your bowtie2 command-line?

ADD REPLY • link 6.8 years ago by h.mon 35k

bowtie2 -p 1 --phred33 -x $bt2idx/hg19 -1 $trimdir/"$base"_1.paired.fastq.gz -2 $trimdir/"$base"_2.paired.fastq.gz -S $aligndir/"$base"_aligned_reads.sam

ADD REPLY • link 6.8 years ago by bsksln • 0

score 0 · Answer 1 · 2017-07-20

6.8 years ago

colindaven 6.4k

Can't you just exclude "improper pairs" with an insert size longer than expected (300 bp?)?

bamtools filter (easy to install via bioconda) should be able to do this.

Also check your insert size distribution for this region statistically after filtering.
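
For illustration, the size filter and the follow-up check could be sketched in plain Python on SAM text instead of with bamtools. This is a rough sketch under assumptions: the function names are hypothetical, the 300 bp cutoff is just the guess from above, and the insert size is taken from the TLEN field (column 9 of a SAM record).

```python
import statistics

def template_length(sam_line):
    """Return the absolute TLEN (SAM column 9) of one alignment record."""
    fields = sam_line.rstrip("\n").split("\t")
    return abs(int(fields[8]))

def filter_by_insert(sam_lines, max_insert=300):
    """Keep alignment records whose template length is 1..max_insert bp.

    Header lines (starting with '@') are skipped, and records with
    TLEN == 0 (no usable pair information) are dropped.
    """
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            continue
        if 0 < template_length(line) <= max_insert:
            kept.append(line)
    return kept

def insert_size_median(sam_lines):
    """Median template length of the given records, for a quick sanity check."""
    return statistics.median(template_length(line) for line in sam_lines)

# Made-up records: one normal pair, one ~5 kb pair spanning both gene copies.
records = [
    "@HD\tVN:1.0",
    "r1\t99\tchr1\t10000\t42\t50M\t=\t10150\t200\t*\t*",
    "r2\t97\tchr1\t10000\t1\t50M\t=\t15000\t5050\t*\t*",
    "r3\t163\tchr1\t10020\t42\t50M\t=\t10150\t180\t*\t*",
]
kept = filter_by_insert(records)
```

Comparing insert_size_median(kept) with the value for a non-duplicated region would show whether the filter leaves a plausible fragment-size distribution.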

ADD COMMENT • link 6.8 years ago by colindaven 6.4k

Colindaven, that's what we did. But it wasted 10-30% of the reads, so I want to find a way to save them.

ADD REPLY • link 6.8 years ago by bsksln • 0