I want to plot copy number variation using read depth (copy number) with an R package. I can easily find plots based on frequency or log ratio, but it is hard to find one that uses copy number.
You can't have both: read depth and logR Ratio come from different technologies. What is your data? Microarrays or sequencing?
- Microarrays use logR Ratio and B Allele Frequency to characterize copy number variation
- Next generation sequencing (preferably whole genome) uses read depth, discordant paired ends, chimeric mappings, and other metrics to characterize CNVs
Read depth is not copy number unless you normalize the values correctly, for example against a reference sample and for GC-content and mappability biases.
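A minimal sketch of what that normalization means, written in Python for illustration. It assumes you already have read counts per genomic bin for your sample and for a copy-neutral reference covering the same bins. It divides sample depth by reference depth, median-centers the ratios on the assumption that most bins are diploid, and scales by the ploidy. The function name `depth_to_copy_number` is hypothetical, and GC-content and mappability corrections are assumed to have been applied beforehand.

```python
import math
from statistics import median


def depth_to_copy_number(sample_depth, reference_depth, ploidy=2):
    """Estimate copy number per bin from read depth (hypothetical helper).

    Assumes both lists hold read counts for the same bins and that GC-content
    and mappability corrections were applied upstream (not done here).
    """
    if len(sample_depth) != len(reference_depth):
        raise ValueError("sample and reference must cover the same bins")
    # Depth ratio per bin; undefined (NaN) where the reference has no reads.
    ratios = [s / r if r > 0 else math.nan
              for s, r in zip(sample_depth, reference_depth)]
    # Median-center, assuming most bins are copy-neutral. This also cancels
    # any difference in total library size between sample and reference.
    center = median(x for x in ratios if not math.isnan(x))
    # Scale so that a copy-neutral bin maps to `ploidy` copies.
    return [ploidy * x / center for x in ratios]


# Toy counts (made up): bin 4 has half the usual depth, bin 5 has 1.5x.
print(depth_to_copy_number([200, 200, 200, 100, 300, 200], [100] * 6))
# → [2.0, 2.0, 2.0, 1.0, 3.0, 2.0]
```

Because of the median-centering, a sample sequenced twice as deeply gives the same estimates, so this sketch needs no separate library-size step. Real CNV callers also segment adjacent bins before assigning integer copy numbers.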
I'm assuming you have microarrays, since you have logR Ratio data. You can infer copy number from them as follows:
- Use machine learning: you need a validated call set generated on the same microarray platform. Then you train a model that labels each call as HOM, HET, or REF (or as 3 or 4 copies for DUPs; train DELs and DUPs separately).
- If you don't have a validated set and your sample size is large enough, you can treat DELs on male chrX as HOM and DELs on female chrX as HET (males carry one X, so a deletion leaves zero copies; females carry two, so it leaves one). This method has some issues, but it works.
- Clustering, such as k-means: you'll form clusters of HOM, HET, and REF calls.
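The first two options can be sketched together in Python. This is a minimal illustration, not the author's pipeline: it labels chrX deletions by sex as described above, then fits a nearest-centroid classifier (a deliberately simple stand-in for whatever model you choose) on each call's median logR Ratio. The record layout (`sex`, `chrom`, `type`, `median_lrr`) and all function names are hypothetical. It handles DELs only, since DUPs would get their own model, and it assumes the REF centroid sits at a logR Ratio of 0, the expected value for a copy-neutral region.

```python
from statistics import mean

# Hypothetical record layout: each call is a dict with keys
# 'sex' ('M'/'F'), 'chrom', 'type' ('DEL'/'DUP') and 'median_lrr'.


def label_chrx_deletions(calls):
    """Turn chrX deletion calls into (median_lrr, label) training pairs.

    Male chrX has one copy, so a deletion leaves 0 copies -> 'HOM';
    female chrX has two, so a deletion leaves 1 copy -> 'HET'.
    """
    return [
        (c["median_lrr"], "HOM" if c["sex"] == "M" else "HET")
        for c in calls
        if c["chrom"] == "chrX" and c["type"] == "DEL"
    ]


def train_centroids(labeled, ref_lrr=0.0):
    """Nearest-centroid model: the mean median-LRR of each class."""
    by_class = {}
    for value, label in labeled:
        by_class.setdefault(label, []).append(value)
    centroids = {label: mean(values) for label, values in by_class.items()}
    # Assumption: copy-neutral (REF) calls center on a logR Ratio of 0.
    centroids.setdefault("REF", ref_lrr)
    return centroids


def classify(median_lrr, centroids):
    """Assign the class whose centroid is closest to this call's median LRR."""
    return min(centroids, key=lambda label: abs(median_lrr - centroids[label]))
```

After training, apply `classify` to the autosomal DEL calls. With a validated call set you would pass its labels to `train_centroids` directly instead of deriving them with `label_chrx_deletions`.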
For each CNV called in an individual, you'll need to plot its median logR Ratio. I also use the median logR Ratio of the whole chromosome as a second dimension. Consider requiring a minimum CNV length of 3 to 5 probes.
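The clustering route with those two dimensions can be sketched as a plain Lloyd's k-means (k = 3) in Python. As before, this is an illustration under assumptions, not the author's code. The field names (`cnv_median_lrr`, `chr_median_lrr`, `n_probes`) are hypothetical, and the sketch uses a 3-probe minimum from the suggested 3 to 5 range. It also seeds the centers deterministically from the lowest, middle, and highest points, and assumes the calls are deletions, so the cluster with the lowest median logR Ratio is HOM, the middle one HET, and the highest REF. It only produces labels; you would still plot the points, colored by cluster, in your plotting tool of choice.

```python
import math
from statistics import mean

MIN_PROBES = 3  # lower end of the suggested 3-5 probe minimum

# Hypothetical record layout: each call is a dict with keys
# 'cnv_median_lrr', 'chr_median_lrr' and 'n_probes'.


def features(calls, min_probes=MIN_PROBES):
    """Keep calls spanning enough probes; return 2-D feature points."""
    return [
        (c["cnv_median_lrr"], c["chr_median_lrr"])
        for c in calls
        if c["n_probes"] >= min_probes
    ]


def init_by_spread(points):
    """Deterministic seeds: the lowest, middle and highest points."""
    s = sorted(points)
    return [s[0], s[len(s) // 2], s[-1]]


def kmeans(points, init_centers, max_iter=100):
    """Plain Lloyd's k-means with Euclidean distance."""
    centers = [tuple(c) for c in init_centers]
    assignment = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        assignment = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: move each center to the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, a in zip(points, assignment) if a == k]
            new_centers.append(
                tuple(mean(dim) for dim in zip(*members)) if members else old
            )
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, assignment


def name_clusters(centers):
    """For deletions: lowest CNV median LRR -> HOM, middle -> HET, highest -> REF."""
    order = sorted(range(len(centers)), key=lambda k: centers[k][0])
    return dict(zip(order, ["HOM", "HET", "REF"]))
```

Deterministic seeding keeps the result reproducible; a library k-means with several random restarts is usually more robust on real data.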
If you have any questions, I'll be glad to help. CNVs are my thing.