Question: Bowtie Alignment And Low Percentage Of Mapped Reads
4
6.8 years ago by
samsara580
The Earth
samsara580 wrote:

I mapped 101 bp paired-end Illumina reads against the cow genome using Bowtie, and I was surprised to see an extremely low mapping rate of 2% (Edit 1: 3.76%). I am not an experienced Bowtie user, and I did not set any extra alignment parameters.

Do I need to use extra parameters in order to get a higher percentage of mapped reads? Could there be any other problems?

I used the following command:

bowtie --chunkmbs 400 -S -p 12 bowtieGenomeIndex_cow -1 R1.fastq -2 R2.fastq

Edit 1: Bowtie output

# reads processed: 784559228
# reads with at least one reported alignment: 29469538 (3.76%)
# reads that failed to align: 755089690 (96.24%)
Reported 29469538 paired-end alignments to 1 output stream(s)

Edit 2: FASTQC quality graphs

Forward Read Quality Image

Reverse Read Quality Image

Edit 3: Alignment with maximum mismatch=3 and insert size=400

# reads processed: 157554639
# reads with at least one reported alignment: 104534049 (66.35%)
# reads that failed to align: 53020590 (33.65%)
Reported 104534049 paired-end alignments to 1 output stream(s)

So, it seems the issue is with the insert size.

Edit 4: Alignment with insert size=650

# reads processed: 157554639
# reads with at least one reported alignment: 132701326 (84.23%)
# reads that failed to align: 24853313 (15.77%)
Reported 132701326 paired-end alignments to 1 output stream(s)
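
When comparing several runs like the ones above, it can help to recompute the mapped percentage from the raw counts rather than reading it off by eye. The following is a minimal sketch, assuming the summary has the same layout as the bowtie output quoted in this question; the function name `parse_bowtie_summary` is made up for illustration.

```python
import re


def parse_bowtie_summary(text):
    """Return (processed, aligned, percent_aligned) from a bowtie summary.

    Assumes the line layout shown in this thread; leading quote markers
    such as ">>>>>" in front of the "#" are tolerated.
    """
    processed = aligned = None
    for line in text.splitlines():
        m = re.search(r"# reads processed:\s*(\d+)", line)
        if m:
            processed = int(m.group(1))
        m = re.search(r"# reads with at least one reported alignment:\s*(\d+)", line)
        if m:
            aligned = int(m.group(1))
    if processed is None or aligned is None:
        raise ValueError("bowtie summary lines not found")
    if processed == 0:
        raise ValueError("no reads were processed")
    return processed, aligned, 100.0 * aligned / processed


# Summary copied from Edit 4 of the question.
summary = """# reads processed: 157554639
# reads with at least one reported alignment: 132701326 (84.23%)
# reads that failed to align: 24853313 (15.77%)
Reported 132701326 paired-end alignments to 1 output stream(s)"""

processed, aligned, percent = parse_bowtie_summary(summary)
print(f"{aligned} of {processed} reads aligned ({percent:.2f}%)")
```

Note that Edit 1 reports 784559228 processed reads while Edits 3 and 4 report 157554639, so the percentages are the comparable quantity here, not the raw counts.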

P.S. The data came from cow genomic DNA.

genome alignment bowtie • 15k views
ADD COMMENTlink modified 4.3 years ago by Leandro de Mattos90 • written 6.8 years ago by samsara580

Could you post a FastQC report for the FASTQ files that you mapped with bowtie?

ADD REPLYlink modified 6.8 years ago • written 6.8 years ago by Arun2.3k

Looking at FastQC is definitely worthwhile. You also want to be sure the '-X' parameter is set correctly for your paired-end fragment sizes. The default in bowtie is stringent (a maximum insert size of 250 bp). See this question for more discussion: http://www.biostars.org/post/show/9090/bowtie-pair-end-broken/

ADD REPLYlink written 6.8 years ago by Brad Chapman9.4k
2

Good point. Another test the original author could try is to map one of the files in single-end mode and evaluate that.

ADD REPLYlink written 6.8 years ago by Istvan Albert ♦♦ 80k

What would be the correct -X and -I values for a 101 bp read length?

ADD REPLYlink modified 6.8 years ago • written 6.8 years ago by samsara580
1

The -X setting is determined by the fragment size of your library. If you don't know the expected distribution, you can set it to something larger like '-X 1000' and then look at the distribution of read pairs that are mapped to estimate it. Or use BWA instead of bowtie and it will infer the size for you.
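
One way to look at that distribution is to read the template length (TLEN, the ninth column) from the SAM output of a run made with a generous -X. This is only a sketch, assuming the aligner fills in TLEN for paired alignments; the function names are invented for illustration, and sizes above the -X used for that run cannot appear in the result.

```python
def fragment_sizes(sam_lines):
    """Yield one fragment size per aligned pair from SAM text lines."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete alignment record
        flag = int(fields[1])
        tlen = int(fields[8])
        # 0x4: this read is unmapped, 0x8: its mate is unmapped.
        # Keeping only TLEN > 0 selects the leftmost mate, so each pair
        # is counted once.
        if flag & 0x4 or flag & 0x8 or tlen <= 0:
            continue
        yield tlen


def summarize(sizes):
    """Return count, median, upper percentiles and maximum of the sizes."""
    ordered = sorted(sizes)
    if not ordered:
        raise ValueError("no aligned pairs found")

    def percentile(p):
        return ordered[min(len(ordered) - 1, int(p * len(ordered)))]

    return {
        "n": len(ordered),
        "median": percentile(0.50),
        "p95": percentile(0.95),
        "p99": percentile(0.99),
        "max": ordered[-1],
    }


# Tiny made-up example: two aligned pairs and one unaligned read.
demo = [
    "@HD\tVN:1.0",
    "r1\t99\tchr1\t100\t255\t50M\t=\t400\t350\tACGT\tIIII",
    "r1\t147\tchr1\t400\t255\t50M\t=\t100\t-350\tACGT\tIIII",
    "r2\t99\tchr1\t900\t255\t50M\t=\t1350\t500\tACGT\tIIII",
    "r2\t147\tchr1\t1350\t255\t50M\t=\t900\t-500\tACGT\tIIII",
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(summarize(fragment_sizes(demo)))
```

For a real run, pass an open file handle of the SAM output instead of `demo`. A -X slightly above the 99th percentile is a reasonable starting point, though that choice is a judgment call and not a rule from the bowtie documentation.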

ADD REPLYlink written 6.8 years ago by Brad Chapman9.4k

Is there a downside to setting a larger insert size? How can I calculate the insert size if the read length is 101 bp and the fragment size is 400-600 bp?

ADD REPLYlink modified 6.8 years ago • written 6.8 years ago by samsara580
1

The main downside is speed: it'll be slower with larger insert sizes since there is a larger search space. For your sizes you'd want to set -X 600 or add some padding to that with -X 700.
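
The arithmetic behind those numbers, as a small sketch. It assumes that -X is measured across the whole fragment with both mates included, which is how the bowtie manual describes it, so the read length does not need to be added on top. The helper names and the 100 bp padding are illustrative choices.

```python
def inner_distance(fragment_len, read_len):
    """Unsequenced gap between the two mates of a fragment."""
    return fragment_len - 2 * read_len


def suggest_maxins(max_fragment_len, padding=100):
    """Largest expected fragment plus some padding, for use as -X."""
    return max_fragment_len + padding


READ_LEN = 101  # read length from the question

for fragment_len in (400, 600):
    gap = inner_distance(fragment_len, READ_LEN)
    print(f"fragment {fragment_len} bp -> inner distance {gap} bp")

print("suggested -X:", suggest_maxins(600))
```

With 101 bp reads, fragments of 400 to 600 bp leave an unsequenced gap of 198 to 398 bp between the mates, but the value passed to -X is still the full fragment length, hence 600 to 700.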

ADD REPLYlink written 6.8 years ago by Brad Chapman9.4k

Thanks a lot. I used -X 650 and already got a better result: about 85% of the reads mapped.

ADD REPLYlink modified 6.8 years ago • written 6.8 years ago by samsara580
5
6.8 years ago by
Istvan Albert ♦♦ 80k
University Park, USA
Istvan Albert ♦♦ 80k wrote:

Most likely your sequencing run has failed in its entirety, either during library preparation or during the sequencing process.

Look at the duplication rates (fastx) and fastqc reports.
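
For a rough duplication figure without extra tools, exact-duplicate sequences can be counted directly. This is a simplified sketch and not what fastx or FastQC compute internally; it assumes plain 4-line FASTQ records and holds every distinct sequence in memory, so it is best run on a subsample of the reads.

```python
from collections import Counter


def fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def duplication_rate(sequences):
    """Fraction of reads that are exact copies of an earlier read."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequences given")
    return 1.0 - len(counts) / total


# Made-up records: four reads, one of which repeats another.
demo = [
    "@r1", "ACGTACGT", "+", "IIIIIIII",
    "@r2", "ACGTACGT", "+", "IIIIIIII",
    "@r3", "TTTTACGT", "+", "IIIIIIII",
    "@r4", "GGGGACGT", "+", "IIIIIIII",
]
print(f"duplication rate: {duplication_rate(fastq_sequences(demo)):.2f}")
```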

Or perhaps the samples have been mixed up and you are aligning against the wrong genome, though even in that case one usually ends up with more than 2% mapped reads.

ADD COMMENTlink written 6.8 years ago by Istvan Albert ♦♦ 80k

I didn't see the FastQC mention in your post.

ADD REPLYlink written 6.8 years ago by Arun2.3k

The genome I used is not wrong. I ran BLAST on one of the sequences from the FASTQ files and got 99% identity to the Bos taurus genome. Moreover, the Illumina machine reported a mean quality score of 36.86.

ADD REPLYlink modified 6.8 years ago • written 6.8 years ago by samsara580
4
6.8 years ago by
swbarnes25.2k
United States
swbarnes25.2k wrote:

1) Eyeball some of your fastq. Do you see reads with good quality scores, or not?

2) If you see reads with good quality scores, BLAST some of them, see if BLAST can tell you what they are.

3) Find out what the adaptor sequences for your prep are; maybe your reads are all adaptor.
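
Steps 1 and 3 can be roughed out in a few lines. This is a sketch under two assumptions that should be checked against your own data: quality strings use the Phred+33 encoding, and the adaptor starts with AGATCGGAAGAGC, the common Illumina adaptor prefix. Replace the adaptor with the sequence for your actual prep. The function names and the quality cutoff of 30 are illustrative.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.strip(), seq.strip(), qual.strip()


def mean_phred(qual, offset=33):
    """Mean quality of one read, assuming Phred+33 unless told otherwise."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def quick_check(lines, adapter="AGATCGGAAGAGC", min_mean_q=30):
    """Fraction of reads with good mean quality and with adaptor sequence."""
    n = good = with_adapter = 0
    for _name, seq, qual in read_fastq(lines):
        if not seq or not qual:
            continue  # skip empty records
        n += 1
        if mean_phred(qual) >= min_mean_q:
            good += 1
        if adapter in seq:
            with_adapter += 1
    if n == 0:
        raise ValueError("no reads found")
    return {
        "reads": n,
        "good_quality_fraction": good / n,
        "adapter_fraction": with_adapter / n,
    }


# Made-up records: one good read containing adaptor, one poor read without.
demo = [
    "@r1", "ACGTAGATCGGAAGAGCAA", "+", "I" * 19,
    "@r2", "ACGTACGT", "+", "#" * 8,
]
print(quick_check(demo))
```

A high adaptor fraction points at very short inserts or adaptor dimers, while a low good-quality fraction points at the run itself.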

Edit:

It's annoying, but try either realigning with bowtie using a number of different possible insert sizes, or try bwa, which doesn't require you to state up front what you expect your insert size to be. If your reads are of fine quality and they are the right species, maybe Bowtie is throwing them out because you misinformed it as to what the true insert size is.
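
Trying several insert sizes is easy to script. The sketch below only builds the command lines, reusing the options from the command in the question plus -X. The input names `sub_R1.fastq` and `sub_R2.fastq`, the output names, and the list of sizes are placeholders, and the actual run is left commented out because it needs bowtie and the index to be present.

```python
import subprocess  # only needed if the run line below is uncommented


def bowtie_paired_cmd(index, fq1, fq2, maxins, out_sam, threads=12):
    """Build a bowtie paired-end command with a given maximum insert size."""
    return [
        "bowtie", "--chunkmbs", "400", "-S", "-p", str(threads),
        "-X", str(maxins),
        index, "-1", fq1, "-2", fq2, out_sam,
    ]


commands = []
for maxins in (250, 400, 650, 1000):
    cmd = bowtie_paired_cmd(
        "bowtieGenomeIndex_cow",  # index name from the question
        "sub_R1.fastq",           # placeholder: a small subset of R1
        "sub_R2.fastq",           # placeholder: the matching subset of R2
        maxins,
        f"maxins_{maxins}.sam",
    )
    commands.append(cmd)
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment where bowtie is installed
```

Running this on a small subset of read pairs keeps each trial short; the mapped percentage bowtie reports for each -X then shows where the gain levels off.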

ADD COMMENTlink modified 6.8 years ago • written 6.8 years ago by swbarnes25.2k
2
6.8 years ago by
Rm7.8k
Danville, PA
Rm7.8k wrote:

We had a similar issue with human tumor-normal samples. Apart from the steps suggested above, we also checked by downsampling the FASTQ files.

Downsample to about a million reads and align them with a few aligners using default parameters (Bowtie, BWA, novoalign, Mosaik, gmap, BLAT) to see whether the issue is with the data or with the aligner parameters.
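
A minimal sketch of the downsampling step for paired files, assuming uncompressed 4-line FASTQ with both files listing the mates in the same order. It uses reservoir sampling with a fixed seed so that the two output files stay in sync; the function name and arguments are illustrative. The sampled pairs are held in memory, so a million pairs of 101 bp reads may need a gigabyte or more.

```python
import random


def sample_pairs(fq1_path, fq2_path, out1_path, out2_path, n_pairs, seed=0):
    """Write a random subset of n_pairs read pairs; return how many were kept."""
    rng = random.Random(seed)
    reservoir = []
    with open(fq1_path) as f1, open(fq2_path) as f2:
        # Group each file into 4-line records and walk both in lockstep.
        records1 = zip(f1, f1, f1, f1)
        records2 = zip(f2, f2, f2, f2)
        for i, (rec1, rec2) in enumerate(zip(records1, records2)):
            if i < n_pairs:
                reservoir.append((rec1, rec2))
            else:
                # Replace an existing entry with probability n_pairs / (i + 1).
                j = rng.randint(0, i)
                if j < n_pairs:
                    reservoir[j] = (rec1, rec2)
    with open(out1_path, "w") as o1, open(out2_path, "w") as o2:
        for rec1, rec2 in reservoir:
            o1.writelines(rec1)
            o2.writelines(rec2)
    return len(reservoir)


# Example call with the file names from the question (output names are placeholders):
# sample_pairs("R1.fastq", "R2.fastq", "sub_R1.fastq", "sub_R2.fastq", 1_000_000)
```

Sampling at random, not taking the first million records, avoids drawing only from the start of the file, where reads may not be representative of the whole run.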

In our case, all the aligners gave a poor mapping percentage, so when we explored further we could trace the problem to the library preparation step.

Also check with the vendor whether other users have had issues with that batch of kits.

ADD COMMENTlink modified 6.8 years ago • written 6.8 years ago by Rm7.8k
0
4.3 years ago by
Brazil
Leandro de Mattos90 wrote:

Hi,

I have mapped 100 bp paired-end Illumina data with TopHat, but only 5% of the reads mapped. Is there any parameter in TopHat that gives a higher percentage of mapped reads? Could there be another problem too?

Mattos.

ADD COMMENTlink written 4.3 years ago by Leandro de Mattos90