Question: Quantitating variants from MiSeq paired-end reads
Hi all, I have been struggling to find a tool to quantitate the absolute number of each variant within a single sample. I performed paired-end sequencing on an Illumina MiSeq, starting from viral RNA. I joined the read pairs using either PANDAseq or the USEARCH package and then quality-filtered them. This is where I am stuck.

  1. I can align the reads to a reference genome, but a single reference doesn't capture all the diversity (not all of the reads align to it). Should I add more reference genomes, or is there a way to compare sequences without aligning?

  2. What I would like to do is quantitate the different viral genotypes. Unfortunately, when I try to collapse the reads into groups (genotypes), the majority of my reads end up as singlets (which I assume is due to PCR error). I would like to bin the singlets into the majority groups they are most closely related to. I think UNOISE3 does this, but it does not give me quantities after I run my reads through the pipeline. Is there a software package that can cluster my singlets into their majority groups so that I can quantitate the viral variants?
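
To make point 2 concrete, the binning I am after could be sketched roughly as follows. This is only an illustration in plain Python, not what UNOISE3 or any other package actually does. The function names, the `min_abundance` cutoff and the `max_dist` threshold are assumptions made up for the example, and it uses simple Hamming distance, so it only compares reads of equal length.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def bin_reads(reads, min_abundance=2, max_dist=2):
    """Dereplicate reads, then fold rare sequences into the closest abundant one.

    min_abundance and max_dist are illustrative thresholds, not recommendations.
    Returns (read counts per abundant genotype, number of reads left unassigned).
    """
    derep = Counter(reads)  # exact-match collapse: sequence -> read count
    centroids = {s: n for s, n in derep.items() if n >= min_abundance}
    totals = dict(centroids)
    unassigned = 0
    for seq, n in derep.items():
        if seq in centroids:
            continue
        # Same-length abundant genotypes within max_dist mismatches of this read
        candidates = [
            (hamming(seq, c), -centroids[c], c)
            for c in centroids
            if len(c) == len(seq)
        ]
        candidates = [t for t in candidates if t[0] <= max_dist]
        if candidates:
            # Closest genotype wins; ties go to the more abundant one
            _, _, best = min(candidates)
            totals[best] += n
        else:
            unassigned += n
    return totals, unassigned


# Toy data: two abundant genotypes plus three singlets
reads = ["ACGTACGT"] * 5 + ["ACGTTCGT"] * 3 + ["ACGTACGA", "ACGTTCGA", "TTTTTTTT"]
totals, unassigned = bin_reads(reads)
print(totals, unassigned)
```

On the toy reads, the two singlets that sit one mismatch away from an abundant genotype are folded into it, and the unrelated singlet is left unassigned. Real reads of varying length would need an alignment-based distance instead of Hamming distance.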

