Weird fragments when aligning PE reads to duplicated genes.
6.8 years ago
bsksln • 0

Hi, I did paired-end ChIP-seq and aligned the reads to hg19 using Bowtie2. One of the regions I am particularly interested in contains two duplicated genes. They are nearly identical in their promoter regions and lie about 5 kb apart. I used the --sensitive preset so that I could preserve reads that align to both copies of the duplicated region. After aligning, I checked the .bam files and found many weird fragments.

Those fragments are ~5 kb long and would certainly be discarded. I checked the sequences and suspect they come from inappropriate alignment: Bowtie2 aligned the two reads of a pair independently, so while one read (R1) maps to duplicate gene 1, its mate (R2) maps to duplicate gene 2 (both reads map perfectly to either copy).

One way to solve this would be, after aligning R1, to try to align R2 near R1's coordinate. I tried but could not find out how to do this.

Does anyone know how to do it? Or do you have any other solution to the problem? Thanks.

ChIP-Seq alignment

What is your bowtie2 command line?


bowtie2 -p 1 --phred33 -x $bt2idx/hg19 -1 $trimdir/"$base"_1.paired.fastq.gz -2 $trimdir/"$base"_2.paired.fastq.gz -S $aligndir/"$base"_aligned_reads.sam
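Those ~5 kb pairs are most likely discordant alignments: with Bowtie2's default fragment-length cap (-X 500) a cross-copy R1/R2 pair cannot align concordantly, but Bowtie2 still reports a discordant pairing when no concordant one is found. A possible variant of the command above that suppresses this at alignment time (a sketch using Bowtie2's documented -X/--maxins, --no-discordant and --no-mixed options; the $bt2idx/$trimdir/$aligndir variables are placeholders as before):

```shell
# Sketch: same alignment, but make the concordant fragment-length cap
# explicit (500 bp) and suppress discordant and unpaired-mate reporting,
# so cross-copy pairings are not emitted as ~5 kb fragments.
# Flags are from the Bowtie2 manual; paths are placeholders.
bowtie2 -p 1 --phred33 --sensitive \
    -X 500 --no-discordant --no-mixed \
    -x $bt2idx/hg19 \
    -1 $trimdir/"$base"_1.paired.fastq.gz \
    -2 $trimdir/"$base"_2.paired.fastq.gz \
    -S $aligndir/"$base"_aligned_reads.sam
```

Note that this discards such pairs rather than rescuing them, and that --sensitive is already Bowtie2's default end-to-end preset.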

6.8 years ago

Can't you just exclude "improper pairs" with an insert size longer than expected (~300 bp)?

bamtools filter (easy to install via bioconda) should be able to do this.
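The same idea can be shown on plain SAM text without extra tools: drop records whose template length (TLEN, SAM column 9) exceeds the expected insert size. A minimal sketch on a two-record toy SAM (with samtools installed, `samtools view -b -f 0x2 in.bam` keeps only proper pairs and achieves the same end):

```shell
# Toy SAM: r1 has a 250 bp insert (kept), r2 a ~5 kb cross-copy insert (dropped).
printf '@HD\tVN:1.6\nr1\t99\tchr1\t100\t42\t50M\t=\t300\t250\tACGT\tIIII\nr2\t97\tchr1\t100\t42\t50M\t=\t5100\t5050\tACGT\tIIII\n' > toy.sam

# Keep header lines plus records with |TLEN| <= 500 (SAM field 9).
awk -F'\t' '/^@/ || ($9 <= 500 && $9 >= -500)' toy.sam > filtered.sam
```

The TLEN of the downstream mate is negative, hence the two-sided comparison.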

Also check the insert-size distribution for this region after filtering.
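One way to do that check, assuming samtools is installed and the filtered BAM is coordinate-sorted and indexed (file name and region coordinates are placeholders):

```shell
# The 'IS' rows of samtools stats are the insert-size histogram
# (size, count per pair orientation) for the given region.
samtools stats filtered.bam chr1:1000000-1010000 | grep '^IS'
```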


Colindaven, that's what we did, but it wasted 10-30% of the reads, so I want to find a way to rescue them.
