Problem mapping paired-end Illumina reads
5.3 years ago
biostart ▴ 370

Hello, could you please advise on the following:

We have ChIP-seq data with paired-end Illumina reads. For some of the samples, only about 11% of reads could be mapped with Bowtie. When remapping these samples with Bowtie2, up to 85% of reads could be mapped, but the pairing has been lost: for most mapped reads there is no mapped mate available. What could have gone wrong, and how can we fix it? Thanks!

ChIP-Seq alignment paired-end • 2.0k views

Try BWA with just one file, or even a subset of your reads. BWA will estimate the insert size from the mapping, and its output may help you understand what went wrong with your Bowtie mapping. If you want to stick with Bowtie / Bowtie2, you can then use the BWA-estimated mean and SD of the insert size as input for Bowtie.
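If you already have a SAM file from any aligner, a rough version of the same insert-size estimate can be sketched in Python. This is only an illustration, not what BWA does internally: it assumes uncompressed SAM text, uses the TLEN column (column 9) of reads flagged as properly paired (flag bit 0x2), and the `max_size` cutoff is an arbitrary assumption to ignore extreme outliers.

```python
import statistics

def insert_size_stats(sam_lines, max_size=2000):
    """Return (mean, sd) of positive TLEN values for properly paired reads.

    Only positive TLEN values are used so that each pair is counted once.
    Returns None if fewer than two usable pairs are found.
    """
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:               # not a complete alignment record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])
        if flag & 0x2 and 0 < tlen <= max_size:
            sizes.append(tlen)
    if len(sizes) < 2:
        return None
    return statistics.mean(sizes), statistics.stdev(sizes)

# Usage with a hypothetical file name:
# with open("subset.sam") as handle:
#     print(insert_size_stats(handle))
```

If the estimate lands far outside the insert-size range you gave the aligner, that points to the insert-size settings; if it looks reasonable, the cause is more likely somewhere else.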


I can estimate the average DNA fragment length as ~150 bp, based on the reads that aligned successfully.

5.3 years ago

In my limited experience with Bowtie, the default settings allow only a very narrow range of acceptable insert sizes. Try making this range more generous on your command line.
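For illustration, here is a small Python sketch that builds a Bowtie2 paired-end command with a wider insert-size window. The `-I`/`-X` options set the minimum and maximum fragment length for a valid pair; the index and file names are hypothetical placeholders, and the values 0 and 1000 are only example settings, not a recommendation.

```python
import shlex

def bowtie2_paired_command(index, r1, r2, out_sam, min_insert=0, max_insert=1000):
    """Build (but do not run) a bowtie2 paired-end command as an argument list."""
    return [
        "bowtie2",
        "-I", str(min_insert),   # minimum fragment length for a valid pair
        "-X", str(max_insert),   # maximum fragment length for a valid pair
        "-x", index,             # basename of the bowtie2 index
        "-1", r1,                # mate 1 FASTQ
        "-2", r2,                # mate 2 FASTQ
        "-S", out_sam,           # SAM output file
    ]

# Hypothetical file names, for illustration only.
cmd = bowtie2_paired_command(
    "genome_index", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample.sam"
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```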


I tried changing it to "-X 1000", but it did not help.


The other possibility is that the reads in your two FASTQ files are out of sync. Do both files have exactly the same number of lines?
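Equal line counts do not guarantee that the mates are in the same order, so it can help to compare read names as well. Below is a minimal sketch, assuming standard 4-line FASTQ records and read names that match between the files once an optional "/1" or "/2" suffix and anything after the first whitespace are removed. The function names and file names are made up for this example.

```python
import gzip
from itertools import zip_longest

def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")

def read_names(lines):
    """Yield the read name from every 4th line (assumes 4-line records)."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            tokens = line.split()
            name = tokens[0][1:] if tokens else ""   # drop the leading '@'
            if name.endswith(("/1", "/2")):          # drop old-style mate suffix
                name = name[:-2]
            yield name

def first_mismatch(r1_lines, r2_lines):
    """Return the 1-based record number where the files first disagree.

    A file that ends early also counts as a mismatch. Returns None if
    every record name matches.
    """
    pairs = zip_longest(read_names(r1_lines), read_names(r2_lines))
    for n, (name1, name2) in enumerate(pairs, start=1):
        if name1 != name2:
            return n
    return None

# Usage with hypothetical file names:
# with open_fastq("sample_R1.fastq.gz") as f1, open_fastq("sample_R2.fastq.gz") as f2:
#     print(first_mismatch(f1, f2))
```

A result of `None` means the names line up record by record; any number tells you where to start looking.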


The number of reads is the same, but their quality seems to differ. I mapped each of the two paired FASTQ files separately with Bowtie: for the first file, 50% of reads had at least one reported alignment, whereas for the second file only 31% did. I guess this explains why I end up with an even smaller percentage of aligned pairs in paired-end mode. So this is unrelated to the insert size... But how to fix it is the question.
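As a back-of-the-envelope check on these numbers, here is a small Python sketch. It assumes, purely for illustration, that the two mates fail to align independently of each other, which is unlikely to hold exactly in real data, and it ignores the additional orientation and insert-size constraints of paired-end mode.

```python
def pair_rate_bounds(rate_r1, rate_r2):
    """Return (upper_bound, independent_estimate) for the fraction of pairs
    in which both mates align, given the single-end alignment rates."""
    upper = min(rate_r1, rate_r2)      # both mates cannot align more often than the worse mate
    independent = rate_r1 * rate_r2    # estimate if mate failures were independent
    return upper, independent

upper, independent = pair_rate_bounds(0.50, 0.31)
print(f"upper bound: {upper:.1%}, independence estimate: {independent:.1%}")
```

With 50% and 31%, the upper bound is 31% and the independence estimate is 15.5%, which is in the same range as the roughly 11% of reads mapped in paired-end mode.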


Have a look at the quality of read 1 and read 2 with FastQC or a similar tool (e.g. fastp). Do the quality values of the second read drop off markedly along the read? If so, try trimming, or try BWA as suggested above. A bad R2 is pretty common, especially with >150 bp reads from some Illumina sequencers.
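To make the "drop off along the read" check concrete, here is a minimal Python sketch that computes the mean base quality per read position and trims low-quality bases from the 3' end. It assumes 4-line FASTQ records with Phred+33 quality encoding; the threshold of 20 is an arbitrary example value, and this simple trailing trim is not the algorithm used by FastQC, fastp or any particular trimming tool.

```python
def mean_quality_by_position(fastq_lines, offset=33):
    """Return a list with the mean Phred quality at each read position."""
    totals, counts = [], []
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:                    # quality string is the 4th line of each record
            continue
        for pos, ch in enumerate(line.rstrip("\n")):
            if pos == len(totals):        # first time this position is seen
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]

def trim_3prime(seq, qual, min_q=20, offset=33):
    """Remove trailing bases whose quality is below min_q."""
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]

# Usage with a hypothetical file name:
# with open("sample_R2.fastq") as handle:
#     for pos, q in enumerate(mean_quality_by_position(handle), start=1):
#         print(pos, round(q, 1))
```

Running the first function on R1 and R2 separately shows whether R2 falls off earlier than R1, which is the pattern described above.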


The FastQC reports for both read 1 and read 2 of the problematic samples look similarly problematic; I am posting the images below. Any idea how to correct this?

FastQC quality score


per base sequence content


GC content


k-mer content
