Question: SNP tools to infer copy number variations (CNVs)
6 months ago by
ceruleanivy30
ceruleanivy30 wrote:

I have tried nearly every method for inferring CNVs from coverage and read-depth data in an NGS setting, but I keep ending up with unreliable results (the log2 ratios vary too much to be considered trustworthy). I would like to know which tool is best for methods that use variant allele frequency from SNPs.

sequencing next-gen R • 222 views
modified 6 months ago by Vincent Laufer • written 6 months ago by ceruleanivy

Hey Ceru,

I will provide an answer, but I need more information first.

1) Is this NGS data or DNA microarray (chip) data? If NGS, what kind (e.g., whole genome sequencing, exome only, or something else), and from which company? If chip data, which chip?

2) Do you have other genetic data on these people?

3) You mention only read depth and minor allele frequency. Depending on 1), there are a lot of other (in some cases, more reliable) signals you can use to call CNVs. I will elaborate once I hear about 1) and 2).

written 6 months ago by Vincent Laufer

Thanks for the response. I have targeted AmpliSeq NGS data from FFPE DNA samples, covering ~400 amplicons. I don't have any other genetic data on these patients. The panel was designed to cover regions that contain many SNPs informative for inferring zygosity.

written 6 months ago by ceruleanivy
6 months ago by
United States
Vincent Laufer880 wrote:

Alright, I'll try to break my answer into segments and hit everything.

1) Read Depth - You mention you've tried a lot of different techniques based on read depth. Most good programs account for things like GC content, nucleotide variability, flanking regions, presence of repeats, etc. I mention this in case, for some reason, these things were not controlled for. After controlling for them, many software packages and companies report a normalized metric, which goes by various names (for instance, "average normalized coverage"), instead of raw read depth. This might be a more appropriate quantity to test than raw read depth. Finally, many parts of the genome are simply quite variable and will always show high variability in read depth. If your amplicons lie in such areas, other methods of calling CNVs may be difficult to interpret as well.
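To make the difference between raw and normalized depth concrete, here is a minimal Python sketch. It is not how CNVkit or any other specific tool works; the function name `log2_ratios`, the median scaling, and the pseudocount are all assumptions chosen for illustration, and it does no GC or repeat correction.

```python
import math
from statistics import median


def log2_ratios(sample_depths, reference_depths, pseudocount=1.0):
    """Per-amplicon log2 ratio of median-normalized depths.

    Hypothetical helper: each depth is divided by the median depth of its
    own sample, so differences in total sequencing yield cancel out before
    the sample is compared with the reference.
    """
    if len(sample_depths) != len(reference_depths):
        raise ValueError("depth lists must have the same length")
    sample_median = median(sample_depths)
    reference_median = median(reference_depths)
    ratios = []
    for s, r in zip(sample_depths, reference_depths):
        # The pseudocount (an assumption) avoids division by zero
        # for amplicons that received no reads.
        s_norm = (s + pseudocount) / (sample_median + pseudocount)
        r_norm = (r + pseudocount) / (reference_median + pseudocount)
        ratios.append(math.log2(s_norm / r_norm))
    return ratios


if __name__ == "__main__":
    # Made-up depths: the sample was sequenced twice as deeply as the
    # reference, but the third amplicon did not scale with the rest.
    sample = [200, 400, 100, 300, 200]
    reference = [100, 200, 100, 150, 100]
    print(log2_ratios(sample, reference))
```

In the made-up example, the twofold difference in overall depth disappears after normalization, and only the third amplicon stands out with a negative log2 ratio. On real data the reference would typically be a pool of normal samples run on the same panel.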

2) You mention allele frequency, but you do not mention paired-end mapping (PEM) based methods, split-read (SRM) based methods, or assembly-based methods. I take the point that, since you are using targeted amplicons, some of these may not be appropriate choices, but they would still help for certain calls. For instance, for a deletion event inside one of your amplicons, PEM or SRM might still be your best choices. Here is a review that discusses several of these, in case it's helpful: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-S11-S1 It's a bit old and newer ones are definitely available, but it should be an OK place to start.

In addition, this manuscript is more recent and gives a good survey of the tools available for targeted sequencing studies: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004873

3) Now that I have sufficiently delayed answering your original question, I'll cut to the chase. Several of the tools mentioned in the last manuscript, such as CNVkit, will separately call variant allele frequency and attempt to indicate things like regions of lost heterozygosity. However, I would caution that this is a very technically complicated process. Called variants are, ultimately, a kind of probability estimate that results from interpreting information generated by the sequencer. The default scenario is that a variant with strong evidence for both A and G will be called A,G. But if you relax that assumption to allow for other scenarios (e.g., CN=3, where the allowable genotypes are A,A,G and A,G,G), then the most probable answers generated will differ. Since I do not know how you originally generated the VCF that you have, I cannot really speak to how much of the analysis it would be advisable to redo, if any, before attempting to call CNVs.
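To illustrate why the most probable genotype changes once the diploid assumption is relaxed, here is a rough Python sketch. It assumes a plain binomial read-count model with a fixed error floor, and the function names are made up for this example; real callers model much more (base and mapping quality, tumor purity, amplification bias in FFPE material, and so on).

```python
import math


def expected_fractions(copy_number):
    """Expected alt-allele fraction for each possible number of alt copies."""
    if copy_number < 1:
        raise ValueError("copy_number must be at least 1")
    return {k: k / copy_number for k in range(copy_number + 1)}


def log_likelihood(alt_reads, total_reads, fraction, error=0.01):
    """Binomial log-likelihood, up to a constant shared by all candidates.

    The error floor (an assumed value) keeps fractions of exactly 0 or 1
    from making a single sequencing error impossible.
    """
    p = min(max(fraction, error), 1.0 - error)
    return alt_reads * math.log(p) + (total_reads - alt_reads) * math.log(1.0 - p)


def best_alt_copies(alt_reads, total_reads, copy_number, error=0.01):
    """Most likely number of alt-allele copies at the given copy number."""
    fractions = expected_fractions(copy_number)
    return max(
        fractions,
        key=lambda k: log_likelihood(alt_reads, total_reads, fractions[k], error),
    )


if __name__ == "__main__":
    # Made-up SNP: 33 G reads out of 100, reference allele A.
    print(best_alt_copies(33, 100, copy_number=2))  # candidates: A,A / A,G / G,G
    print(best_alt_copies(33, 100, copy_number=3))  # candidates include A,A,G and A,G,G
```

With 33 alt reads out of 100, the diploid model has to settle for A,G (expected fraction 0.5), whereas at CN=3 the same counts fit A,A,G (expected fraction 1/3) much more closely. The call depends on the copy number assumed when the variants were called.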

modified 6 months ago • written 6 months ago by Vincent Laufer