Question: Identify Virus Subpopulations with 454 Sequencing
Asked 5.5 years ago by pld (United States):

Suppose I have n isolates, some from strain A and some from strain B. After deep sequencing and alignment against a reference genome (strain A), I have, for each isolate, a list of SNPs and the frequency of each SNP in that isolate.

What would be the best way to determine whether an isolate population comes from a single strain or is a mixture of both strains?

Say isolate 1 is supposed to be from strain A: how can I establish whether it is composed of just strain A or contains both strains?

Tags: virus, snp, deep-sequencing, 454
Written 5.5 years ago by pld; modified 3.0 years ago by Biostar
Answer by bulovic.ana (Croatia), 5.5 years ago:

Imagine the following situation:

You have 454 reads, which are relatively long. You can first map them (using, for instance, bwa-mem) to some reference genome and check for the following situation: if there is a region of the genome covered by two sets of reads, one with a mutation at position x and the other with a mutation at position y, then you can be fairly sure you are dealing with 2 strains.

You can also try de novo assembly (with, for instance, Vicuna) and see whether you get different contigs for the same positions (relative to some reference genome).

I would look for regions that are covered by more than one distinct set of reads.
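To make the "two sets of reads" check concrete, here is a minimal Python sketch. It assumes the alignment has already been parsed into one dict per read mapping reference position to observed base; that input form, the function name `haplotype_counts`, and the toy positions are all hypothetical and only for illustration. It counts which allele combinations occur together on reads that span every position of interest.

```python
from collections import Counter

def haplotype_counts(reads, positions):
    """Count allele combinations seen on reads that span all given positions.

    reads: iterable of dicts mapping reference position -> observed base
           (a hypothetical pre-parsed form of the alignment).
    positions: positions of interest, e.g. [x, y].
    """
    counts = Counter()
    for read in reads:
        # Only reads covering every position say anything about linkage.
        if all(p in read for p in positions):
            counts[tuple(read[p] for p in positions)] += 1
    return counts

# Toy data: assume the reference has A at position 100 and C at position 250.
reads = [
    {100: "G", 250: "C"},  # mutation at 100 only
    {100: "G", 250: "C"},
    {100: "A", 250: "T"},  # mutation at 250 only
    {100: "A", 250: "T"},
    {100: "A", 250: "T"},
    {100: "G"},            # does not span position 250, so it is skipped
]
counts = haplotype_counts(reads, [100, 250])
for haplotype, n in counts.most_common():
    print(haplotype, n)
```

Two well-supported, mutually exclusive combinations (here G/C and A/T) would be the pattern described above. This only works when single reads are long enough to span both positions.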

Also, I have previously found software that tackles this problem, but I am unable to find it again now. Will update.

P.S. You can visualize the bwa-mem alignment using IGV.

Written 5.5 years ago by bulovic.ana (modified 5.5 years ago)

This is close, but not exactly what I want. There are different genotypes within a single isolate: we have mutations at less than 100% frequency. What is not clear is whether a given genotype is a mutant of strain A or a genotype of strain B.

The goal is to see whether there is contamination. The possibilities are that the isolate is all strain A (or some mutant of it), all strain B (or some mutant of it), or a mixture of strain A and strain B.

Reference sequences for Strain A and Strain B are known.

Reply written 5.5 years ago by pld

So what you are saying is that you have something that, by its mutations, is halfway between A and B, and you are not sure whether it is a mutated A or B?

I presume the contamination, if it has occurred, is whole-strain contamination.

What do you mean when you say that the mutations are at less than 100% frequency? That not all the reads covering the position carry them?

Couldn't you align the two strains, see at which positions they differ, then align the reads to one of the strains and check whether the other strain's base at each of those positions occurs at a frequency high enough to indicate it is not there by accident?
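A rough Python sketch of that idea, under stated assumptions: the two reference sequences are given as equal-length rows of a pairwise alignment, the isolate's SNP calls are available as `{position: {alt_base: frequency}}`, and `min_freq` is an arbitrary noise floor. The function names and data layout are made up for illustration, not taken from any existing tool.

```python
def discriminating_sites(ref_a, ref_b):
    """Return {column: (base_in_A, base_in_B)} for aligned columns that differ.

    ref_a and ref_b are assumed to be equal-length rows of a pairwise
    alignment. Columns with a gap ('-') are skipped. Columns are 0-based and
    match strain A coordinates only if the strain A row contains no gaps.
    """
    if len(ref_a) != len(ref_b):
        raise ValueError("sequences must be aligned to the same length")
    return {
        i: (a, b)
        for i, (a, b) in enumerate(zip(ref_a, ref_b))
        if a != b and a != "-" and b != "-"
    }

def strain_b_signal(sites, snp_freqs, min_freq=0.01):
    """Summarise strain B alleles in one isolate mapped against strain A.

    snp_freqs: {position: {alt_base: frequency}} from the SNP caller.
    min_freq: assumed noise floor; lower frequencies are treated as absent.
    Returns (sites showing the B allele, total sites, mean B-allele frequency).
    """
    freqs = []
    for pos, (_, b_base) in sites.items():
        f = snp_freqs.get(pos, {}).get(b_base, 0.0)
        freqs.append(f if f >= min_freq else 0.0)
    n_hit = sum(1 for f in freqs if f > 0)
    mean_freq = sum(freqs) / len(freqs) if freqs else 0.0
    return n_hit, len(freqs), mean_freq

# Toy example: the references differ at columns 2 and 7.
ref_a = "ACGTACGTAC"
ref_b = "ACCTACGAAC"
sites = discriminating_sites(ref_a, ref_b)

# The SNP at position 4 is not a strain-discriminating site and is ignored.
isolate = {2: {"C": 0.20}, 7: {"A": 0.18}, 4: {"G": 0.03}}
n_hit, n_sites, mean_freq = strain_b_signal(sites, isolate)
print(n_hit, n_sites, mean_freq)
```

The intuition is that a real A/B mixture should show the strain B base at most discriminating sites, at roughly similar frequencies, whereas a mutant of strain A would be expected to match B at only a few scattered sites.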

Reply written 5.5 years ago by bulovic.ana (modified 5.5 years ago)

That's what I'm doing at the moment, but I was hoping there would be something already out there.

Not halfway, just that mutant A != strain B. Membership in the set of strain A mutants (a subset of strain A) is mutually exclusive with membership in the set of strain B.

Yes, that is what I mean by <100%: a statistically significant number of the reads aligned to a position differ from the reference sequence at that position.
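One possible way to put a number on "statistically significant" is a one-sided binomial test of the variant read count against an assumed per-base error rate. The sketch below is only that: the error rate of 0.01 and the read counts are made-up values, and a single uniform error rate is a simplification, since 454 errors are dominated by homopolymer indels rather than substitutions.

```python
from math import exp, lgamma, log

def log_binom_pmf(i, n, p):
    """Log of P(X = i) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p))

def binomial_tail(k, n, p):
    """P(X >= k): chance of k or more variant reads out of n from error alone."""
    # Summing in log space avoids overflow at deep-sequencing coverage.
    total = sum(exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))
    return min(1.0, total)

# Assumed values: 1000 reads cover the position, error rate 0.01 per base.
coverage = 1000
error_rate = 0.01
p_30 = binomial_tail(30, coverage, error_rate)  # 30 variant reads (3%)
p_12 = binomial_tail(12, coverage, error_rate)  # 12 variant reads (1.2%)
print(p_30, p_12)
```

With many positions tested per isolate, the resulting p-values would need a multiple-testing correction before being called significant.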

Reply written 5.5 years ago by pld