Question: Comparing WGS, WXS, and SNP Array data
2.8 years ago by
novice920
United States
novice920 wrote:

Hi

I have samples that were processed in three ways: whole genome sequencing, whole exome sequencing, and Infinium SNP array. I'm looking for suggestions on how to compare these data to see how much variance is due simply to using different technologies. Specifically, I'm interested in copy number analysis. My initial thought is to obtain the log ratios for each platform and then look at the correlation in log ratio between methods. I can get the log ratio for the SNP array data, but I don't know how to do it for WGS or WES. Has anyone done something similar in the past? I also can't seem to find any recent work that has done this kind of comparison, so I would appreciate any pointers.

snp wgs • 2.1k views
ADD COMMENTlink modified 2.8 years ago by charco50 • written 2.8 years ago by novice920

To look for differences, I'd compare SNPs, indels, etc. for base differences, position differences, and even call quality. But for CNVs, I am not sure the SNP array will cooperate, unless your SNP array results are different from what I have seen. In general, don't you usually get a genotype call per locus for each sample with a SNP array? That said, I have seen people run PCR/qPCR with fluorescence-labeled SNP tags to get an idea of copy number. Maybe you have this kind of data.

ADD REPLYlink written 2.8 years ago by berge201580
2.8 years ago by
charco50
charco50 wrote:

The resolution of SNP arrays, WGS, and WXS is quite different. Generally, WGS and WXS will be able to call more focal copy number changes than the array. It is important to take this into account in your comparison.
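One way to account for the resolution difference is to average each platform's log ratios into the same fixed-width genomic bins before correlating them. The following is only a minimal sketch of that idea: the `(chrom, pos, log_ratio)` tuple layout, the function names, and the 1 Mb default bin size are all assumptions made for illustration, not the output format of any particular tool.

```python
from collections import defaultdict
from math import sqrt


def bin_means(probes, bin_size):
    """Average the log ratios of (chrom, pos, log_ratio) records per fixed-width bin."""
    sums = defaultdict(lambda: [0.0, 0])
    for chrom, pos, log_ratio in probes:
        key = (chrom, pos // bin_size)
        sums[key][0] += log_ratio
        sums[key][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def platform_correlation(platform_a, platform_b, bin_size=1_000_000):
    """Correlate two platforms using only the bins that both of them cover.

    Returns (correlation, number_of_shared_bins).
    """
    bins_a = bin_means(platform_a, bin_size)
    bins_b = bin_means(platform_b, bin_size)
    shared = sorted(bins_a.keys() & bins_b.keys())
    if len(shared) < 2:
        raise ValueError("need at least two shared bins")
    r = pearson([bins_a[k] for k in shared], [bins_b[k] for k in shared])
    return r, len(shared)
```

Restricting the comparison to shared bins matters most for WXS, which only covers captured regions. A large bin size favours agreement on broad events and hides focal ones, so it is probably worth repeating the comparison at a few bin sizes.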

There are various software packages for calling copy numbers from sequencing, far too many to list here. I provide some examples of packages I have used.

These work on tumour samples: OncoSNP for SNP arrays (https://sites.google.com/site/oncosnp/) and OncoSNP-SEQ for sequencing data (https://sites.google.com/site/oncosnpseq/).

For WXS and WGS, log ratios could be obtained using CopywriteR (https://www.bioconductor.org/packages/devel/bioc/html/CopywriteR.html). Integer copy numbers could come from facets (https://github.com/mskcc/facets).
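To make concrete what a read-depth log ratio is, here is a generic sketch that turns per-bin read counts from a sample and a matched reference into median-centred log2 ratios. This is not how CopywriteR or facets work internally; the function name, the pseudocount, and the minimum-coverage cutoff are assumptions chosen for illustration, and GC or mappability correction is left out entirely.

```python
from math import log2
from statistics import median


def depth_log_ratios(sample_counts, reference_counts, pseudocount=0.5, min_reference=10):
    """Per-bin log2(sample / reference) read-depth ratios, median-centred.

    Counts are normalised by library size. Bins with fewer than
    `min_reference` reference reads are masked (returned as None).
    """
    if len(sample_counts) != len(reference_counts):
        raise ValueError("both count vectors must cover the same bins")
    sample_total = sum(sample_counts)
    reference_total = sum(reference_counts)

    ratios = []
    for s, r in zip(sample_counts, reference_counts):
        if r < min_reference:
            ratios.append(None)  # too little coverage to trust this bin
            continue
        s_norm = (s + pseudocount) / sample_total
        r_norm = (r + pseudocount) / reference_total
        ratios.append(log2(s_norm / r_norm))

    kept = [x for x in ratios if x is not None]
    if not kept:
        return ratios
    # Centre on the median so that the most common copy state sits near 0.
    centre = median(kept)
    return [None if x is None else x - centre for x in ratios]
```

The resulting values are on the same scale as an array's log R ratio only in a loose sense, so correlation is a more sensible comparison than absolute differences.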

ADD COMMENTlink written 2.8 years ago by charco50

I've been working with CopywriteR and it does exactly what I was looking for; thanks. However, it is extremely slow on WGS data. Do you know of a more efficient (probably by being more parallelizable) tool?

ADD REPLYlink written 2.8 years ago by novice920

I couldn't quite tell from your comment: are you using the parallel functionality of CopywriteR?

ADD REPLYlink written 2.7 years ago by charco50

Yes. The problem is that CopywriteR is only parallel in the sense that it can work with multiple samples at the same time.

ADD REPLYlink written 2.7 years ago by novice920
2.8 years ago by
rkostadi60
rkostadi60 wrote:

The key is to get the break points right. See whether the three segmented profiles (WGS, WXS, array) give the same or different break points for CN events. Segmentation is an art. Also, all three platforms will give you allelic imbalance information; use it. Evaluate the number of events called by one, two, or all three platforms, the concordance in break point positions, etc. Segmentation methods like to smooth profiles, whereas the genome is not smooth at break points: a break is a discrete "cut". Signal intensity (log R ratio) and read depth will vary: WXS will be wild due to GC bias and capture, WGS will have low read depth, and the array will probably not have a good dynamic range.
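As a rough illustration of the break point comparison, the sketch below reports, for each ordered pair of platforms, the fraction of one platform's break points that have a partner within a tolerance window in the other. The input layout (a dict of platform name to a list of `(chrom, pos)` break points), the function names, and the choice of tolerance are assumptions made for this example; a sensible tolerance depends on the coarsest platform's probe or bin spacing.

```python
from bisect import bisect_left
from collections import defaultdict


def index_breakpoints(breakpoints):
    """Group (chrom, pos) break points into sorted position lists per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, pos in breakpoints:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()
    return by_chrom


def has_match(index, chrom, pos, tolerance):
    """True if the index holds a break point within `tolerance` bp of pos."""
    positions = index.get(chrom, [])
    i = bisect_left(positions, pos - tolerance)
    return i < len(positions) and positions[i] <= pos + tolerance


def breakpoint_concordance(platforms, tolerance):
    """Fraction of platform A's break points matched in platform B.

    Returns a dict keyed by (A, B). The measure is asymmetric, so (A, B)
    and (B, A) are reported separately.
    """
    indexes = {name: index_breakpoints(bps) for name, bps in platforms.items()}
    result = {}
    for a, bps in platforms.items():
        if not bps:
            continue  # no break points called, so the fraction is undefined
        for b in platforms:
            if a == b:
                continue
            matched = sum(has_match(indexes[b], chrom, pos, tolerance) for chrom, pos in bps)
            result[(a, b)] = matched / len(bps)
    return result
```

Because the measure is asymmetric, a high value for (array, wgs) combined with a low value for (wgs, array) would be consistent with the sequencing data picking up focal events that the array cannot resolve.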

Good luck.

ADD COMMENTlink written 2.8 years ago by rkostadi60