Question: Problem mapping paired-end Illumina reads
21 months ago, biostart (350, Germany) wrote:

Hello, could you please advise on the following:

We have ChIP-seq data with paired-end Illumina reads. For some of the samples, only about 11% of reads could be mapped with Bowtie. When remapping these samples with Bowtie2, up to 85% of reads could be mapped, but the pairing was lost: most mapped reads have no mapped mate. What could have gone wrong, and how can we fix it? Thanks!

chip-seq alignment paired-end • 653 views
modified 21 months ago by swbarnes2 (8.9k) • written 21 months ago by biostart (350)

Try BWA with just one file, or even a subset of your reads. BWA will estimate the insert size from the mapping, and its output may help you understand what went wrong with your Bowtie mapping. If you want to stick with Bowtie / Bowtie2, you can then use the mean and SD of the insert size estimated by BWA as input for Bowtie.

written 21 months ago by h.mon (31k)

I can estimate the average DNA fragment length as ~150 bp, based on the reads that aligned successfully.

written 21 months ago by biostart (350)
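
To make the insert-size estimate discussed above concrete, here is a minimal sketch that computes the mean and standard deviation of the template length (TLEN, column 9 of SAM) over properly paired reads. It is not BWA's own estimator, only a simplified check that works on any aligner's SAM text output. The function name `insert_size_stats` and the example records are made up for illustration.

```python
import statistics


def insert_size_stats(sam_lines):
    """Return (mean, sd, n_pairs) of TLEN over properly paired reads.

    `sam_lines` is any iterable of SAM text lines, e.g. an open file.
    """
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])  # SAM column 9: signed template length
        # Flag bit 0x2 marks a properly paired read. Keeping only the mate
        # with a positive TLEN counts each pair exactly once.
        if flag & 0x2 and tlen > 0:
            sizes.append(tlen)
    if len(sizes) < 2:
        raise ValueError("need at least two properly paired reads")
    return statistics.mean(sizes), statistics.stdev(sizes), len(sizes)


if __name__ == "__main__":
    # Synthetic records, for illustration only.
    example = [
        "@HD\tVN:1.6",
        "p1\t99\tchr1\t100\t42\t50M\t=\t200\t150\t*\t*",
        "p1\t147\tchr1\t200\t42\t50M\t=\t100\t-150\t*\t*",
        "p2\t99\tchr1\t500\t42\t50M\t=\t620\t170\t*\t*",
        "p2\t147\tchr1\t620\t42\t50M\t=\t500\t-170\t*\t*",
    ]
    mean, sd, n = insert_size_stats(example)
    print(f"pairs={n} mean={mean:.1f} sd={sd:.1f}")
```

If very few reads carry the properly-paired flag at all, that by itself suggests the problem lies in the pairing rather than in the insert-size limits.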
21 months ago, swbarnes2 (8.9k, United States) wrote:

In my limited experience with Bowtie, the default settings allow only a very narrow range of acceptable insert sizes. Try changing these on your command line to be more generous.

written 21 months ago by swbarnes2 (8.9k)

I tried changing this to "-X 1000", but it did not help.

written 21 months ago by biostart (350)
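
For reference, a paired-end Bowtie2 call with a widened insert-size window might be assembled as in the sketch below. The options `-x`, `-1`, `-2`, `-I`, `-X`, `-p` and `-S` are standard Bowtie2 options (Bowtie2's default maximum fragment length is 500; Bowtie 1's is 250). The index prefix, file names, and helper name `bowtie2_paired_command` are placeholders, not taken from the thread.

```python
import shlex


def bowtie2_paired_command(index, r1, r2, out_sam,
                           min_insert=0, max_insert=1000, threads=4):
    """Build (but do not run) a paired-end bowtie2 command line."""
    return [
        "bowtie2",
        "-x", index,            # prefix of the bowtie2 index
        "-1", r1,               # FASTQ with mate 1
        "-2", r2,               # FASTQ with mate 2
        "-I", str(min_insert),  # minimum fragment length
        "-X", str(max_insert),  # maximum fragment length
        "-p", str(threads),     # number of alignment threads
        "-S", out_sam,          # SAM output file
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = bowtie2_paired_command("genome_index", "sample_R1.fastq.gz",
                                 "sample_R2.fastq.gz", "sample.sam")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list keeps file names with spaces safe when the list is later passed to `subprocess.run`.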

The other possibility is that the reads in your FASTQ files are out of sync. Do the two files have exactly the same number of lines?

written 21 months ago by swbarnes2 (8.9k)
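
Equal line counts do not guarantee that the mates are in the same order, so a stricter check compares read names record by record. The sketch below assumes plain four-line FASTQ records (no wrapped sequence lines) and read names that match between the two files once an optional trailing `/1` or `/2` is removed. The function names are made up for illustration.

```python
import gzip
from itertools import zip_longest


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_names(handle):
    """Yield the read name of each four-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 != 0:  # only the first line of each record is a header
            continue
        parts = line.strip().split()
        name = parts[0] if parts else ""
        if name.startswith("@"):
            name = name[1:]
        if name.endswith("/1") or name.endswith("/2"):
            name = name[:-2]  # drop old-style mate suffix
        yield name


def first_mismatch(r1_path, r2_path):
    """Return None if the files are in sync, else (record_index, name1, name2).

    A name of None means that file ended before the other one.
    """
    with open_fastq(r1_path) as f1, open_fastq(r2_path) as f2:
        pairs = zip_longest(read_names(f1), read_names(f2))
        for idx, (n1, n2) in enumerate(pairs):
            if n1 != n2:
                return idx, n1, n2
    return None

# Usage (hypothetical file names):
#   print(first_mismatch("sample_R1.fastq.gz", "sample_R2.fastq.gz"))
```

If this reports a mismatch, the files need to be re-paired before any paired-end alignment is attempted.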

The number of reads is the same, but their quality seems to differ. I mapped each of the two paired FASTQ files separately with Bowtie: for one file, 50% of reads had at least one reported alignment, whereas for the second file only 31% did. I guess this explains how I end up with an even smaller percentage of aligned pairs in paired-end mode. So this is unrelated to the insert size... But the question is how to fix it.

modified 21 months ago • written 21 months ago by biostart (350)
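
The arithmetic behind this guess can be sketched as follows. A pair can only align if both mates align, so the pair rate cannot exceed the lower of the two single-end rates. If, purely as a simplifying assumption, the two mates aligned independently of each other, the expected pair rate would be the product of the two rates. With 50% and 31% that gives roughly 15.5%, which is of the same order as the ~11% seen in paired-end mode. Real mates are not independent and paired-end mode adds orientation and distance constraints, so this is only a plausibility check.

```python
def pair_rate_estimates(r1_rate, r2_rate):
    """Return (upper_bound, independence_estimate) for the aligned-pair rate.

    upper_bound: both mates must align, so the pair rate cannot exceed
        the lower single-end rate.
    independence_estimate: product of the rates, valid only under the
        (unrealistic) assumption that mates align independently.
    """
    for rate in (r1_rate, r2_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rates must be fractions between 0 and 1")
    return min(r1_rate, r2_rate), r1_rate * r2_rate


if __name__ == "__main__":
    upper, independent = pair_rate_estimates(0.50, 0.31)
    print(f"upper bound: {upper:.1%}, independence estimate: {independent:.1%}")
```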

Have a look at the quality of read 1 and read 2 with FastQC or similar (e.g. fastp). Do the quality values of the second read drop off markedly along the read? If so, try trimming, or BWA as suggested above. Bad R2 is pretty common, especially on >150 bp reads from some Illumina sequencers.

written 21 months ago by colindaven (2.4k)
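
To illustrate what trimming does, here is a deliberately simple sketch that removes low-quality bases from the 3' end of a single read. It assumes Phred+33 quality encoding and a threshold of Q20, both chosen here for illustration. Dedicated tools such as fastp, Trimmomatic or cutadapt use more refined strategies and should be preferred for real data.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Trim bases with quality below `min_q` from the 3' end of a read.

    `qual` is the FASTQ quality string; `offset` is the ASCII offset of
    the encoding (33 for Phred+33). Returns (trimmed_seq, trimmed_qual).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq)
    # Walk back from the 3' end until a base meets the threshold.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
    print(trim_3prime("ACGTACGT", "IIIII###"))
```

When trimming paired-end data, both files must be processed together so that reads dropped from one file are also dropped from the other. Otherwise the files fall out of sync.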

The FastQC reports for both read 1 and read 2 of the problematic samples look similarly problematic; I am posting the images below. Any idea how to correct this?

FastQC quality score

modified 21 months ago • written 21 months ago by biostart (350)

per base sequence content

modified 21 months ago • written 21 months ago by biostart (350)

GC content

written 21 months ago by biostart (350)

kmer content: k-mers

written 21 months ago by biostart (350)