Weird fragments when aligning PE reads to duplicated genes.
3.9 years ago
bsksln • 0

Hi, I did paired-end ChIP-seq and aligned the reads to hg19 using Bowtie2. One of the regions I am particularly interested in contains two duplicated genes. Their promoter regions are nearly identical, and the genes are about 5 kb apart. I set the preset to --sensitive so that I could preserve the "double-aligned" reads within the duplicated regions. After aligning, I checked the .bam files and found many weird fragments.

Those fragments are ~5 kb long and will definitely be discarded. I checked the sequences and suspect that they come from incorrect pairing during alignment: Bowtie2 seems to align the two reads of a pair separately. While one read (R1) is mapped to duplicate gene 1, its mate (R2) is mapped to duplicate gene 2 (both reads could map perfectly to either duplicated region).

One way to solve this would be, after aligning R1, to try to align R2 near the coordinates of R1. I tried but could not find out how to do this.

Does anyone know how to do it? Or do you have any other solution to the problem? Thanks.

ChIP-Seq alignment • 970 views

bowtie2 -p 1 --phred33 -x $bt2idx/hg19 -1 $trimdir/"$base"_1.paired.fastq.gz -2 $trimdir/"$base"_2.paired.fastq.gz -S $aligndir/"$base"_aligned_reads.sam

3.9 years ago
colindaven ★ 2.9k

Can't you just exclude "improper pairs" with an insert size longer than expected (300 bp?)?

bamtools filter (easy to install via bioconda) should be able to do this.

Also check the insert size distribution for this region statistically after filtering.
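
For illustration, here is a rough sketch of that kind of insert-size filter. It uses plain awk on SAM text rather than bamtools, and relies only on the SAM convention that column 9 (TLEN) holds the signed template length. The function name `filter_by_tlen` and the 300 bp default cutoff are assumptions for this example, not values from the thread.

```shell
# Keep SAM header lines plus records whose absolute template length
# (SAM column 9, TLEN) is non-zero and no larger than a cutoff.
# The default cutoff of 300 bp is an assumption; adjust it to your library.
filter_by_tlen() {
    max_len="${1:-300}"
    awk -F '\t' -v max="$max_len" '
        /^@/ { print; next }                  # pass header lines through unchanged
        {
            tlen = ($9 < 0) ? -$9 : $9        # TLEN is negative for the rightmost mate
            if (tlen > 0 && tlen <= max) print
        }'
}
```

Assuming samtools is installed and `in.bam` is a placeholder file name, this could be used as `samtools view -h in.bam | filter_by_tlen 300 | samtools view -b -o filtered.bam -`. Records with TLEN 0 (for example, pairs with an unmapped mate) are dropped as well, so this only approximates a proper-pair filter.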

Colindaven, that's what we did. But it wasted 10-30% of the reads, so I want to find a way to rescue them.