Question

Identify Virus Subpopulations with 454 Sequencing

9.9 years ago

pld 5.1k

Suppose I have n isolates, some from strain A and some from strain B. After performing deep sequencing and alignment against a reference genome (strain A), I have a list of SNPs for each isolate and the frequency of that SNP in the given isolate.

What would be the best way to determine whether an isolate population comes from a single strain or is a mixture of both strains?

Say isolate 1 is supposed to be from strain A. How can I determine whether it is composed of just strain A or contains both strains?

SNP deep-sequencing virus 454 • 2.3k views

ADD COMMENT • link updated 2.5 years ago by Ram 43k • written 9.9 years ago by pld 5.1k

Ram · Answer 1 · 2014-06-16

9.9 years ago

bulovic.ana ▴ 90

Imagine the following situation:

You have 454 reads, which are relatively long. You can first map them to a reference genome (using, for instance, bwa-mem) and check for the following situation: if there is a region of the genome covered by two sets of reads, one with a mutation at position x and the other with a mutation at position y, then you can be fairly sure you are dealing with two strains.

You can also try de novo assembly (with, for instance, Vicuna) and see whether you get different contigs for the same positions relative to a reference genome.
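As a rough illustration of the check described above, here is a minimal Python sketch that counts which combinations of bases occur together on reads spanning both positions. The read representation (a dict mapping reference position to observed base) and the function name `haplotype_counts` are assumptions made for this example; in practice you would extract this information from the bwa-mem alignment.

```python
from collections import Counter

def haplotype_counts(reads, x, y):
    """Count (base at x, base at y) pairs over reads that cover both positions."""
    counts = Counter()
    for read in reads:
        if x in read and y in read:
            counts[(read[x], read[y])] += 1
    return counts

# Toy data: each read is a hypothetical {reference position: observed base} map.
# Assume the reference has C at position 100 and G at position 150.
reads = [
    {100: "T", 150: "G"},  # mutation at x only
    {100: "T", 150: "G"},
    {100: "C", 150: "A"},  # mutation at y only
    {100: "C", 150: "A"},
    {100: "C"},            # does not cover position 150, so it is skipped
]
counts = haplotype_counts(reads, 100, 150)
for pair, n in counts.most_common():
    print(pair, n)
```

Two well-supported, mutually exclusive combinations (here T/G and C/A) would suggest two distinct genotypes in the sample. Whether they correspond to two strains still has to be judged against the known references.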

I would look for regions which are covered with more than one set of corresponding reads.

Also, I have previously come across software that tackles this problem, but I am unable to find it now. Will update.

P.S. You can visualize the bwa-mem alignment in IGV.

ADD COMMENT • link updated 4.3 years ago by Ram 43k • written 9.9 years ago by bulovic.ana ▴ 90

This is close, but not exactly what I want. There are different genotypes within a single isolate: we have mutations at less than 100% frequency. What is not clear is whether a given genotype is a mutant of strain A or a genotype of strain B.

The goal is to see if there is contamination. The possibilities are that the isolate is all strain A (or some mutant of it), all strain B (or some mutant of it), or a mixture of strain A and strain B.

Reference sequences for Strain A and Strain B are known.

ADD REPLY • link 9.9 years ago by pld 5.1k

So what you are saying is that you have something that, judging by its mutations, is halfway between A and B, and you are not sure whether it is a mutated A or B?

I presume the contamination, if it has occurred, is whole-strain contamination.

What do you mean when you say that the mutations are less than 100% freq? That not all the reads covering this area have them?

Couldn't you align the two strain references, see at which positions they differ, then align the reads to one of the strains and check whether the other strain's base at each of those positions occurs at a frequency too high to be there by accident?
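A minimal Python sketch of that idea follows, under several assumptions: the two references are already aligned to equal length, per-position base counts (`pileup`) have been extracted from reads mapped to strain A, and sequencing errors are modelled with a single placeholder rate (`error_rate=0.01`, not a measured 454 value). All function names here are hypothetical.

```python
from math import exp, lgamma, log, log1p

def discriminating_positions(ref_a, ref_b):
    """Positions where two pre-aligned, equal-length references differ (gaps skipped)."""
    return [i for i, (a, b) in enumerate(zip(ref_a, ref_b))
            if a != b and "-" not in (a, b)]

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_pmf)
    return min(total, 1.0)

def strain_b_signal(ref_a, ref_b, pileup, error_rate=0.01):
    """For each discriminating position, report the strain-B base frequency and the
    probability of seeing at least that many such reads from errors alone."""
    rows = []
    for pos in discriminating_positions(ref_a, ref_b):
        counts = pileup.get(pos, {})
        depth = sum(counts.values())
        if depth == 0:
            continue  # no coverage at this position
        b_reads = counts.get(ref_b[pos], 0)
        rows.append((pos, b_reads / depth, binom_sf(b_reads, depth, error_rate)))
    return rows

# Toy example: the references differ at 0-based positions 3 and 6.
ref_a = "ACGTACGT"
ref_b = "ACGAACCT"
pileup = {3: {"T": 90, "A": 10}, 6: {"G": 88, "C": 12}}  # hypothetical base counts
rows = strain_b_signal(ref_a, ref_b, pileup)
for pos, freq, pval in rows:
    print(pos, round(freq, 3), pval)
```

If the strain B base shows up at a similar, non-negligible frequency across many discriminating positions, a mixture is more plausible than independent mutations of strain A. With many positions tested, the p-values would need a multiple-testing correction.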

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 9.9 years ago by bulovic.ana ▴ 90

That's what I'm doing at the moment, but I was hoping there would be something already out there.

Not halfway, just that Mutant A != Strain B. Being in the set of strain A mutants, a subset of strain A, is mutually exclusive with being in the set of strain B.

Yes, that is what I mean by <100%. A statistically significant number of the reads aligned to a given position differ from the reference sequence at that position.

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 9.9 years ago by pld 5.1k