
CNV Analysis After Segmentation


12.8 years ago

Irsan ★ 7.8k

Hi everyone,

We have tumor-normal paired whole-genome sequencing (WGS) data and have successfully estimated copy numbers and segmented them, which leaves us with a list of segment coordinates. My question is: what are the best practices after segmentation? We are planning to annotate the segments by retrieving the transcripts that overlap the CNVs with the Bioconductor GenomicFeatures package, and then run a gene set enrichment analysis. Should I do additional quality filtering, for example removing segments that overlap known CNVs, low-mappability regions, etc.?

Would anyone like to share code or best practices?
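The filtering and annotation steps in the question can be sketched as follows. This is a minimal Python illustration, not the Bioconductor workflow the poster mentions (in R the overlap step is typically done with `findOverlaps` from GenomicRanges). It assumes half-open coordinates, a blacklist built from whatever known-CNV or low-mappability tracks you trust, and a hypothetical cutoff that drops a segment when more than 50% of its length is blacklisted; `Interval`, `filter_segments` and `annotate` are illustrative names, and the coordinates and gene names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open genomic interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int
    name: str = ""


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def merge(intervals: list[Interval]) -> list[Interval]:
    """Merge overlapping intervals so overlapping blacklist entries are not double-counted."""
    merged: list[Interval] = []
    for iv in sorted(intervals, key=lambda x: (x.chrom, x.start)):
        if merged and merged[-1].chrom == iv.chrom and iv.start <= merged[-1].end:
            last = merged[-1]
            merged[-1] = Interval(last.chrom, last.start, max(last.end, iv.end))
        else:
            merged.append(Interval(iv.chrom, iv.start, iv.end))
    return merged


def filter_segments(segments, blacklist, max_frac=0.5):
    """Keep segments whose blacklisted fraction is at most max_frac (hypothetical cutoff)."""
    bad = merge(blacklist)
    kept = []
    for seg in segments:
        covered = sum(overlap_bp(seg, b) for b in bad)
        if covered / (seg.end - seg.start) <= max_frac:
            kept.append(seg)
    return kept


def annotate(segments, genes):
    """Map each segment name to the sorted names of genes it overlaps by >= 1 bp."""
    return {s.name: sorted(g.name for g in genes if overlap_bp(s, g) > 0) for s in segments}


# Toy inputs (made-up coordinates and gene names).
segments = [Interval("chr1", 0, 1000, "seg1"), Interval("chr1", 5000, 6000, "seg2")]
blacklist = [Interval("chr1", 5000, 5600), Interval("chr1", 5500, 5900)]
genes = [Interval("chr1", 900, 1200, "GENE_A"), Interval("chr1", 2000, 3000, "GENE_B")]

kept = filter_segments(segments, blacklist)  # seg2 is 90% blacklisted and is dropped
gene_hits = annotate(kept, genes)
print(gene_hits)  # {'seg1': ['GENE_A']}
```

Real inputs would come from the segmentation output, a blacklist BED file and a gene annotation. The nested loops scale with segments × genes, which is fine for a few thousand segments; a sorted sweep or interval tree would scale better.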


How many samples? Whole genome? What depth?

(Sean Davis, 12.8 years ago)


Only one tumor-normal pair (pilot study): whole genome, 40x, 100 bp paired-end reads (insert size 100 bp).

(Irsan, 12.8 years ago)


Hi Irsan,

Could you please provide an example (a working tutorial) of how to get the final list of amplified/deleted genes across n tumor/normal pairs? I used VarScan2 for the copy-number calls and CBS for segmentation, and I am stuck at this point.

Thanks!

(Chirag Nepal, 9.9 years ago)


Please ask this as a new question, or search Google/Biostars first.

(Irsan, 9.9 years ago)
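For the follow-up question about getting amplified/deleted genes across several tumor/normal pairs, one simple starting point is to threshold each CBS segment's log2 ratio (the `seg.mean` column in DNAcopy output) and count, per gene, how many samples carry a gain or loss. The ±0.3 cutoffs, the input layout and the function names below are hypothetical assumptions, and the genes and coordinates are made up; suitable cutoffs depend on tumor purity and ploidy.

```python
from collections import defaultdict

# Hypothetical log2-ratio cutoffs; tune to tumor purity/ploidy.
GAIN, LOSS = 0.3, -0.3


def call_state(seg_mean: float) -> str:
    """Classify a CBS segment mean (log2 tumor/normal ratio) as amp, del or neutral."""
    if seg_mean >= GAIN:
        return "amp"
    if seg_mean <= LOSS:
        return "del"
    return "neutral"


def gene_calls(samples, genes):
    """Return {(gene, state): set of samples} for genes overlapping non-neutral segments.

    samples: {sample_id: [(chrom, start, end, seg_mean), ...]}, half-open coordinates
    genes:   [(name, chrom, start, end), ...]
    """
    hits = defaultdict(set)
    for sample, segs in samples.items():
        for chrom, start, end, seg_mean in segs:
            state = call_state(seg_mean)
            if state == "neutral":
                continue
            for name, g_chrom, g_start, g_end in genes:
                # Any overlap of at least 1 bp counts.
                if g_chrom == chrom and g_start < end and start < g_end:
                    hits[(name, state)].add(sample)
    return hits


# Toy data: two tumor/normal pairs, made-up genes and coordinates.
samples = {
    "T1": [("chr8", 100, 500, 0.8), ("chr17", 0, 300, -0.6)],
    "T2": [("chr8", 50, 400, 0.5), ("chr17", 0, 300, 0.1)],
}
genes = [("GENE_X", "chr8", 200, 250), ("GENE_Y", "chr17", 100, 150)]

calls = gene_calls(samples, genes)
# Rank (gene, state) pairs by how many samples carry them.
for (gene, state), ids in sorted(calls.items(), key=lambda kv: -len(kv[1])):
    print(gene, state, len(ids), sorted(ids))
```

A gene spanning two segments with different states receives both calls. This recurrence count is only a rough screen; it does not model the background rate of copy-number change the way GISTIC does.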

Answer · 2012-10-01

I'd suggest taking a look at a paper like "Characterizing complex structural variation in germline and somatic genomes". It sounds like you have done a basic read-count-based analysis. For completeness, you may also want to look at paired-end mapping (PEM) and split-read methods. Since you have a single tumor/normal pair, you'll want to identify genes that appear to be activated or inactivated by somatic SNVs, small indels and structural rearrangements, not just by copy-number changes. I'd focus on those techniques, since they are likely to be more actionable in the setting of a single sample.

As your sample size grows, you can apply methods such as GISTIC to better define regions of shared copy-number states. Note that GSEA on copy-number data from a single sample is unlikely to be fruitful: the vast majority of genes in copy-number-variable regions are likely passengers, and perhaps only half of them are even expressed.
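To illustrate the caveat above: a simple over-representation test (a hypergeometric test, not GSEA's rank-based statistic) on the genes in a single sample's CNV segments treats each gene as an independent draw. Genes inside one segment change together, so a gene set with a few members sitting in one large, possibly passenger, gain can look significant from a single event. The numbers below are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K in the gene set, n genes drawn)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical numbers: 20,000 genes, one gained segment holding 200 of them,
# and a 50-gene set of which 5 happen to sit inside that segment.
N, n, K, k = 20_000, 200, 50, 5
expected = n * K / N  # 0.5 genes expected by chance
p = hypergeom_sf(k, N, K, n)
print(f"expected {expected:.1f}, observed {k}, p = {p:.2e}")
```

The small p-value here reflects one copy-number event, not five independent ones, which is why enrichment on a single sample's CNV gene list is hard to interpret.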