Question: Is this contig for real? Getting support from bowtie2 mappings
bitjunkie (United States) wrote, 5.2 years ago:

Hey guys,

So, I have some contigs, assembled with ABySS from Illumina paired-end reads, that did not map to our reference genome sequence, which was supposed to be the only thing in our sample. About half of the reads did not map, and we sequenced to high depth. I want to find out which of these contigs are actually real.

My idea is to map the reads back to the contigs with bowtie2 and use the mapping data to determine which contigs are best supported. I already looked at how many reads mapped to each contig, but I realized that doesn't tell me enough. I would like to measure support for a contig by how many read pairs mapped concordantly and with the expected insert size. How can I do this procedurally? What should the formula look like for a quantitative measure of support?

I'm open to other ideas, too.
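One possible way to turn the idea in the question into numbers, sketched in Python under stated assumptions: the bowtie2 alignments are available as SAM text (for example from `samtools view -h`), and a pair counts as supporting a contig when the aligner set the proper-pair flag (0x2), the mate lies on the same contig (RNEXT is `=`), and |TLEN| falls inside an insert-size window you choose. The helper name `contig_support` and the two scores it reports (fraction of mapped pairs that are concordant, and concordant pairs per kb of contig) are hypothetical choices, not an established formula. Note that bowtie2 already applies its `-I`/`-X` fragment-length bounds when deciding concordance, so the extra window only matters if you want a tighter range.

```python
from collections import defaultdict


def contig_support(sam_lines, min_insert, max_insert):
    """Hypothetical helper: summarise read-pair support per contig from SAM text.

    Returns {contig: (pairs_mapped, pairs_concordant, concordant_fraction,
    concordant_pairs_per_kb)}, where "pairs_mapped" counts pairs whose first
    read aligned to that contig.
    """
    lengths = {}
    counts = defaultdict(lambda: [0, 0])  # contig -> [pairs mapped, pairs concordant]
    for line in sam_lines:
        if line.startswith("@"):
            # Contig lengths come from @SQ header lines (SN = name, LN = length).
            if line.startswith("@SQ"):
                tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
                lengths[tags["SN"]] = int(tags["LN"])
            continue
        cols = line.rstrip("\n").split("\t")
        flag, rname, rnext, tlen = int(cols[1]), cols[2], cols[6], int(cols[8])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records,
        # and look only at the first read of each pair (0x40) so a pair counts once.
        if flag & (0x4 | 0x100 | 0x800) or not flag & 0x40:
            continue
        counts[rname][0] += 1
        # Assumed support criterion: aligner-flagged proper pair (0x2), mate on the
        # same contig, and |TLEN| inside the chosen insert-size window.
        if flag & 0x2 and rnext == "=" and min_insert <= abs(tlen) <= max_insert:
            counts[rname][1] += 1
    report = {}
    for name, (mapped, concordant) in counts.items():
        length = lengths.get(name, 0)
        fraction = concordant / mapped if mapped else 0.0
        per_kb = 1000 * concordant / length if length else float("nan")
        report[name] = (mapped, concordant, fraction, per_kb)
    return report
```

For example, `contig_support(open("aln.sam"), 200, 600)` (file name and window are placeholders). Normalising by contig length keeps long contigs from looking better supported just because they are long; a contig with many mapped pairs but a low concordant fraction may deserve a closer look for misassembly.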


Tags: sequencing, alignment, assembly

lh3 commented, 5.2 years ago:

Usually you can trust assemblers; they won't assemble contigs out of nothing. As Istvan said, searching against nt is a necessary step: many sequences in nt are not included in the reference assembly. nt also helps to identify microbial contamination. If you are working on a model organism, also run RepeatMasker. At least for human, these extra contigs tend to be diverged copies of repeats.
Istvan Albert (University Park, USA) wrote, 5.2 years ago:

Blast some reads/contigs against nt.

That said, we were once in a very similar situation, and even blasting against nt did not return any hits whatsoever. We are still wondering where the heck those reads came from.
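A minimal sketch of the triage step after blasting, assuming the contigs were searched with something like `blastn -query contigs.fa -db nt -remote -outfmt 6 -evalue 1e-10 -out hits.tsv` (file names and the e-value cutoff are placeholders). It reads the default 12-column tabular output (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), keeps the highest-bitscore hit per contig, and lists contigs with no hit at all. The function names `best_hits` and `no_hit_contigs` are hypothetical.

```python
def best_hits(tabular_lines):
    """Best hit per query (by bitscore) from BLAST -outfmt 6 text, default columns."""
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        qseqid, sseqid = cols[0], cols[1]
        pident, evalue, bitscore = float(cols[2]), float(cols[10]), float(cols[11])
        # Keep only the highest-scoring subject for each query contig.
        if qseqid not in best or bitscore > best[qseqid][3]:
            best[qseqid] = (sseqid, pident, evalue, bitscore)
    return best


def no_hit_contigs(all_ids, best):
    """Contigs (in input order) that produced no BLAST hit at all."""
    return [cid for cid in all_ids if cid not in best]
```

The no-hit list is exactly the puzzling set described above; those contigs are the ones worth checking with read-pair support and GC/coverage before trusting them.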

Philipp Bayer wrote, 5.2 years ago:

I concur with blasting against nt.

Also, have a look at GC content using Blobology: if you see several distinct clusters, you might have contamination.
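Blobology itself plots GC content against read coverage and colours contigs by taxonomic annotation. As a rough stand-in for just the GC and coverage columns, here is a sketch that assumes a contig FASTA plus `samtools idxstats` output for the bowtie2 BAM (columns: reference name, sequence length, mapped reads, unmapped reads) and a single fixed read length. Coverage is approximated as mapped reads × read length / contig length, which is an assumption that ignores clipping and variable read lengths. The function names are hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA text into {first word of header: uppercase sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def gc_fraction(seq):
    """G+C over unambiguous bases (A, C, G, T); Ns are ignored."""
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc_coverage(fasta_lines, idxstats_lines, read_len):
    """Rows of (contig, GC fraction, approximate mean coverage) for plotting."""
    seqs = read_fasta(fasta_lines)
    rows = []
    for line in idxstats_lines:
        name, length, mapped, _unmapped = line.rstrip("\n").split("\t")
        length, mapped = int(length), int(mapped)
        # Skip the "*" row (unplaced reads) and contigs missing from the FASTA.
        if name == "*" or name not in seqs or length == 0:
            continue
        rows.append((name, gc_fraction(seqs[name]), mapped * read_len / length))
    return rows
```

Plotting GC on one axis and log coverage on the other makes separate clusters easier to spot; contigs from a contaminating organism often sit at a different GC and coverage from the target genome.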
