Question: Reads Used in a De Novo Assembly That Do Not Map Back to the Contigs
lwc628 (United States) wrote, 7.4 years ago:

I did a de novo transcriptome assembly of RNA-seq reads with the Oases assembler. It has an option to output the reads that were not used in constructing the contigs. Using this, I split the raw reads (the original FASTQ file) into used and unused sets, and used bowtie to map only the used reads back to the contigs. However, only about 70% of them mapped, and the rest were reported as unmapped (bowtie has an option to output these).
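The split described above (raw FASTQ into reads the assembler used and did not use) can be sketched in Python as follows. This is a minimal sketch that assumes you already have the unused read names as a set of strings matching the first whitespace-delimited token of each FASTQ header; how Oases/Velvet actually names those reads (and whether paired-end suffixes such as /1 need stripping) is an assumption to check against your own output files. read_fastq, read_id, split_by_usage and to_fastq are hypothetical helper names, not part of any tool.

```python
from typing import Iterable, Iterator, List, Set, Tuple

# One FASTQ record as its four lines: header, sequence, '+' line, qualities.
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes sequences are not line-wrapped)."""
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("FASTQ line count is not a multiple of 4")
    for i in range(0, len(rows), 4):
        header, seq, plus, qual = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at {header!r}")
        yield header, seq, plus, qual


def read_id(header: str) -> str:
    """Read name = first whitespace-delimited token after the leading '@'."""
    return header[1:].split()[0]


def split_by_usage(records: Iterable[Record],
                   unused_ids: Set[str]) -> Tuple[List[Record], List[Record]]:
    """Partition records into (used, unused) given the unused read names."""
    used: List[Record] = []
    unused: List[Record] = []
    for rec in records:
        (unused if read_id(rec[0]) in unused_ids else used).append(rec)
    return used, unused


def to_fastq(records: Iterable[Record]) -> str:
    """Serialize records back to FASTQ text."""
    return "".join("\n".join(rec) + "\n" for rec in records)


if __name__ == "__main__":
    demo = ["@r1 lane1\n", "ACGT\n", "+\n", "IIII\n",
            "@r2\n", "GGCC\n", "+\n", "IIII\n"]
    used, unused = split_by_usage(read_fastq(demo), {"r2"})
    print(to_fastq(used), end="")
```

Only the used set would then be passed to the aligner; keeping the header untouched means unmapped reads reported by bowtie can be traced back to the original file.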

What could explain reads that were used in building the contigs not mapping back to those contigs? Any advice or suggestions are greatly appreciated.

Tags: assembly, bowtie

Consider relaxing your bowtie alignment stringency and building the indexes with --offrate 1.

eXpress suggests running bowtie2 with these options: --offrate 1 -a -X 800 --rdg 6,5 --rfg 6,5 --score-min L,-.6,-.4 --no-discordant --no-mixed

(comment by Rm, 7.4 years ago)
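For reference, bowtie2's --score-min L,a,b sets the minimum valid alignment score to f(x) = a + b * x for read length x (the S and G types use sqrt(x) and ln(x) instead of x). The sketch below evaluates that function and, under the simplifying assumption of an ungapped end-to-end alignment where every mismatch hits a high-quality base and costs bowtie2's default maximum mismatch penalty of 6, estimates how many mismatches a read can carry and still align. Note that bowtie2's own end-to-end defaults are --score-min L,-0.6,-0.6 and --rdg 5,3 --rfg 5,3, so the values quoted above are somewhat stricter than the defaults; to loosen stringency with the L form you would make the slope more negative. min_score and mismatch_budget are hypothetical helpers.

```python
import math
from typing import Callable, Dict

# bowtie2 --score-min types: f(x) = const + coef * g(x), where x = read length
TRANSFORMS: Dict[str, Callable[[float], float]] = {
    "L": lambda x: x,   # linear
    "S": math.sqrt,     # square root
    "G": math.log,      # natural log
}


def min_score(func: str, read_len: int) -> float:
    """Evaluate a --score-min string such as 'L,-.6,-.4' for one read length."""
    kind, const, coef = func.split(",")
    if kind not in TRANSFORMS:
        raise ValueError(f"function type {kind!r} not handled in this sketch")
    return float(const) + float(coef) * TRANSFORMS[kind](read_len)


def mismatch_budget(func: str, read_len: int, penalty: int = 6) -> int:
    """Mismatches an ungapped end-to-end alignment can carry and stay valid.

    Assumes a perfect end-to-end score of 0 and that every mismatch costs
    `penalty` (6 is bowtie2's default maximum mismatch penalty).
    """
    return math.floor(-min_score(func, read_len) / penalty)


if __name__ == "__main__":
    for f in ("L,-.6,-.4", "L,-0.6,-0.6"):
        print(f, min_score(f, 100), mismatch_budget(f, 100))
```

For 100 bp reads this gives a threshold of about -40.6 (6 mismatches) for the eXpress setting versus -60.6 (10 mismatches) for the bowtie2 default.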
Pavel Senin (Los Alamos, NM) wrote, 7.4 years ago:

IMO, this might happen because of the error correction the assembler performs, i.e. choosing the base at each position by majority vote among the aligned reads. Reads that carry a different base at that position (or an indel) may then fail to align back. Another possibility is that reads are trimmed during assembly based on quality, but I am not an expert in Oases. You can investigate further by lowering the alignment stringency when you map your reads back to the assembled contigs.
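The majority-vote effect described above can be illustrated with a toy gapless pileup: call the consensus base at each position by majority vote, then count how often each read disagrees with that consensus. This is only a sketch, assuming reads are already placed at known offsets with no indels; real assemblers resolve errors differently, and consensus and mismatches_vs_consensus are hypothetical names.

```python
from collections import Counter
from typing import List, Tuple

# (0-based offset on the contig, read sequence); gapless placement assumed
Placed = Tuple[int, str]


def consensus(pileup: List[Placed]) -> str:
    """Majority-vote base per column across all placed reads."""
    length = max(off + len(seq) for off, seq in pileup)
    columns = [Counter() for _ in range(length)]
    for off, seq in pileup:
        for i, base in enumerate(seq):
            columns[off + i][base] += 1
    # most_common(1) picks the majority base; ties go to the base seen first.
    # Positions that no read covers are reported as 'N'.
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)


def mismatches_vs_consensus(pileup: List[Placed], cons: str) -> List[int]:
    """Number of positions where each read differs from the consensus."""
    return [sum(base != cons[off + i] for i, base in enumerate(seq))
            for off, seq in pileup]


if __name__ == "__main__":
    reads = [(0, "ACGTAC"), (2, "GTACGT"), (2, "GTTCGT"), (4, "ACGTAA")]
    cons = consensus(reads)
    print(cons, mismatches_vs_consensus(reads, cons))
```

The third read contributed to the contig but disagrees with it at one position; a read with enough such disagreements (plus its own sequencing errors) can fall below the aligner's score threshold, which is why relaxing stringency helps.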

Jeremy Leipzig (Philadelphia, PA) wrote, 12 days ago:

Any read that spans two contigs won't map to any single contig, even though the k-mers that make up that read are still "used".
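This can be checked on a toy example: every k-mer of a read can occur in some contig (so the assembler "used" the read), while the read as a whole is not contained in any single contig. In this sketch exact substring matching stands in for an end-to-end alignment, and k = 4 and the two contig sequences are made up for illustration; the contigs overlap by k-1 bases, as adjacent contigs from a de Bruijn graph typically do.

```python
from typing import Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def all_kmers_used(read: str, contigs: Iterable[str], k: int) -> bool:
    """True if every k-mer of the read appears in at least one contig."""
    pool: Set[str] = set()
    for contig in contigs:
        pool |= kmers(contig, k)
    return kmers(read, k) <= pool


def fits_one_contig(read: str, contigs: Iterable[str]) -> bool:
    """Exact containment in a single contig (stand-in for end-to-end mapping)."""
    return any(read in contig for contig in contigs)


if __name__ == "__main__":
    contigs = ["AAACCCGGG", "GGGTTTACA"]  # hypothetical contigs, overlap "GGG"
    read = "CCGGGTTT"                     # spans the junction between them
    print(all_kmers_used(read, contigs, 4), fits_one_contig(read, contigs))
```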

You might ask why there are multiple contigs at all when there are reads that could connect them. The answer is that those reads in fact map to multiple contigs because of repetitive regions: any ambiguity introduced by a repeat forces the assembler to keep the flanking contigs separate. So the reads that don't map are likely those that border repeats.
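The effect of a repeat can be sketched with a tiny de Bruijn graph (Velvet, on which Oases builds, is a de Bruijn graph assembler). Below, a hypothetical sequence containing the 4-mer ACGT twice is cut into 4-mers, and the graph is compacted into maximal non-branching paths (unitigs). The repeat creates one node with two incoming edges and one with two outgoing edges, so the sequence breaks into several unitigs, and a read spanning the repeat matches none of them. The sequence, k, and function names are illustrative assumptions; real assemblers also use coverage and read pairs before leaving a repeat unresolved.

```python
from collections import defaultdict
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def de_bruijn(kmer_set: Set[str]) -> Dict[str, List[str]]:
    """Nodes are (k-1)-mers; each k-mer adds an edge prefix -> suffix."""
    graph: Dict[str, List[str]] = defaultdict(list)
    for km in sorted(kmer_set):  # sorted for deterministic output
        graph[km[:-1]].append(km[1:])
    return graph


def unitigs(graph: Dict[str, List[str]]) -> List[str]:
    """Maximal non-branching paths; isolated cycles are ignored in this sketch."""
    indeg: Dict[str, int] = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            indeg[v] += 1
    nodes = set(graph) | set(indeg)
    outdeg = {n: len(graph.get(n, [])) for n in nodes}

    def simple(n: str) -> bool:  # exactly one way in and one way out
        return indeg[n] == 1 and outdeg[n] == 1

    paths: List[str] = []
    for start in sorted(nodes):
        if simple(start) or outdeg[start] == 0:
            continue  # unitigs start only at branching or source nodes
        for nxt in graph[start]:
            path = start + nxt[-1]
            while simple(nxt):  # extend until a branch or dead end
                nxt = graph[nxt][0]
                path += nxt[-1]
            paths.append(path)
    return paths


if __name__ == "__main__":
    genome = "TTACGTCAAACGTGG"  # hypothetical; ACGT occurs twice
    print(unitigs(de_bruijn(kmers(genome, 4))))
```

Here the read TACGTC, which overlaps the first copy of the repeat, is present in the sequence but not in any single unitig, matching the pattern described above.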
