I have mapped 101 bp paired-end Illumina data against the cow genome. I used Bowtie for mapping, but I was very surprised to see an incredibly low mapping rate of 2% (Edit 1: 3.76%). I am not an experienced Bowtie user; I did not use any extra alignment parameters.
Do I need to use extra parameters in order to get a higher percentage of mapped reads? Could there be any other problem?
I used the following command:
bowtie --chunkmbs 400 -S -p 12 bowtieGenomeIndex_cow -1 R1.fastq -2 R2.fastq
Edit 1: Bowtie output
# reads processed: 784559228
# reads with at least one reported alignment: 29469538 (3.76%)
# reads that failed to align: 755089690 (96.24%)
Reported 29469538 paired-end alignments to 1 output stream(s)
Edit 2: FASTQC quality graphs
Edit 3: Alignment with maximum mismatch=3 and insert size=400
# reads processed: 157554639
# reads with at least one reported alignment: 104534049 (66.35%)
# reads that failed to align: 53020590 (33.65%)
Reported 104534049 paired-end alignments to 1 output stream(s)
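For the record, the Edit 3 run would have looked something like this. This is a sketch: the question does not say whether the mismatch limit was given as `-v 3` or `-n 3`, and the output filename is hypothetical; the index and FASTQ names are carried over from the original command.

```shell
# Allow up to 3 mismatches across the whole read (-v, qualities ignored)
# and a maximum paired-end insert size of 400 bp (-X);
# other options as in the original run.
bowtie --chunkmbs 400 -S -p 12 -v 3 -X 400 \
  bowtieGenomeIndex_cow -1 R1.fastq -2 R2.fastq aln_v3_X400.sam
```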
So, it seems the issue is with the insert size.
Edit 4: Alignment with insert size=650
# reads processed: 157554639
# reads with at least one reported alignment: 132701326 (84.23%)
# reads that failed to align: 24853313 (15.77%)
Reported 132701326 paired-end alignments to 1 output stream(s)
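Reconstructed as a sketch, the Edit 4 command only changes the maximum insert size; the output filename here is hypothetical and the mismatch flag is assumed to be the same one used in Edit 3.

```shell
# Same alignment, but with the maximum insert size raised to 650 bp
bowtie --chunkmbs 400 -S -p 12 -v 3 -X 650 \
  bowtieGenomeIndex_cow -1 R1.fastq -2 R2.fastq aln_X650.sam
```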
P.S. - the data was from cow genomic DNA.
Could you post a FastQC report of the FASTQ files that you aligned with Bowtie?
Looking at FastQC is definitely worthwhile. You also want to be sure the '-X' parameter is set correctly for your paired-end insert sizes. The default in Bowtie is stringent (250 bp). See this question for more discussion: http://www.biostars.org/post/show/9090/bowtie-pair-end-broken/
Good point - another test the original poster could try is mapping one of the files in single-end mode and evaluating that.
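That test could be run along these lines (a sketch reusing the index and read file from the question; the output filename is made up):

```shell
# Align read 1 alone in single-end mode. If the single-end mapping
# rate is high while the paired-end rate is low, the pairing
# constraints (insert size / orientation), not read quality,
# are the likely culprit.
bowtie -S -p 12 bowtieGenomeIndex_cow R1.fastq single_R1.sam
```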
What would be the correct -X and -I values for a 101 bp read length?
The -X setting is determined by the fragment size of your library. If you don't know the expected distribution, you can set it to something large like '-X 1000', then look at the distribution of the read pairs that do map to estimate it. Alternatively, use BWA instead of Bowtie; it will infer the insert size for you.
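One way to do that estimation: after a trial run with a generous '-X', pull the template lengths (TLEN, SAM column 9) out of the alignments and summarize them. A minimal sketch follows; the five TLEN values are made up for illustration, and in practice you would extract them from your own output with something like `samtools view aln.bam | awk '$9 > 0 {print $9}' > tlen.txt`.

```shell
# Hypothetical positive TLEN values from a trial alignment
printf '%s\n' 380 395 400 410 640 > tlen.txt

# Summarize the distribution to pick a sensible -X
sort -n tlen.txt | awk '{v[NR] = $1; s += $1}
  END {printf "pairs: %d  mean: %.1f  median: %d\n", NR, s/NR, v[int((NR+1)/2)]}'
# → pairs: 5  mean: 445.0  median: 400
```

Setting -X a bit above the observed upper tail (rather than at the mean) keeps properly paired fragments from being rejected.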
Is there a downside to setting a larger insert size? How do I calculate the insert size if the read length is 101 bp and the fragment size is 400-600 bp?
The main downside is speed: alignment will be slower with larger insert sizes since there is a larger search space. For your fragment sizes you'd want to set -X 600, or add some padding to that with -X 700.
Thanks a lot. I used -X 650 and already got a much better result: about 85% of the reads mapped.