Hi everyone, I have some questions about copy number variation (CNV) analysis. I am currently processing tumor samples with matched adjacent normal tissue, as well as tumor samples without adjacent normal tissue. I don't have completely normal (disease-free) samples. For my analysis, I need to call both somatic and germline CNVs. I have been using VarScan2 and have generated segment files by following these steps:
1. Generate mpileup file:
samtools mpileup -B -q 1 -f ref.fa normal.bam tumor.bam > normal_tumor.mpileup
2. Run VarScan to call copy numbers:
java -jar varscan-2.4.6/VarScan.v2.4.6.jar copynumber normal_tumor.mpileup normal_tumor.basename --min-coverage 20 --min-segment-size 100 --max-segment-size 1000 --p-value 0.001 --mpileup 1
3. Call copy number variations:
java -jar varscan-2.4.6/VarScan.v2.4.6.jar copyCaller normal_tumor.basename.copynumber --output-file normal_tumor.copynumber.called --output-homdel-file normal_tumor.copynumber.homdel
4. Segmentation and Classification:
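library(DNAcopy)
# Read the copyCaller output (treated here as having no header line)
cn <- read.table("normal_tumor.copynumber.called", header=FALSE)
# Build the CNA object: column 1 = chromosome, column 2 = position, column 6 is taken as the log-ratio values
CNA.object <- CNA(genomdat = cn[,6], chrom = cn[,1], maploc = cn[,2], data.type = 'logratio')
# Smooth outliers, then segment
CNA.smoothed <- smooth.CNA(CNA.object)
segs <- segment(CNA.smoothed, verbose=0, min.width=2)
segs2 <- segs$output
# Write chrom, loc.start, loc.end, num.mark, seg.mean (the ID column is dropped)
write.table(segs2[,2:6], file="out.file", row.names=FALSE, col.names=FALSE, quote=FALSE, sep="\t")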
I have successfully generated the segment file (out.file) for each somatic sample.
Questions:
- How can I combine all the segment files into a single file that contains information for all samples? Is there a standard method for merging these files, similar to how VCF files are combined in SNP calling?
- Once combined, what is the best approach for annotating the data? Is combining necessary at all, or should I keep the files separate?
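To make the first question concrete, here is a rough sketch of the kind of merge I have in mind; I am not sure it is the right approach. It assumes each per-sample file is the 5-column, tab-separated, header-less out.file from step 4, renamed to <sample>.seg (a naming scheme I made up for this example), and it simply prepends the sample name to every row. The function name merge_segments and all file names are placeholders.

```shell
# Sketch: merge per-sample DNAcopy segment tables into one multi-sample table.
# Assumption: inputs have 5 tab-separated columns without a header
# (chrom, loc.start, loc.end, num.mark, seg.mean), and the sample ID
# can be derived from the file name (hypothetical naming: <sample>.seg).
merge_segments() {
    ms_out="$1"
    shift
    # Header line: sample ID first, then the five segment columns.
    printf 'ID\tchrom\tloc.start\tloc.end\tnum.mark\tseg.mean\n' > "$ms_out"
    for ms_file in "$@"; do
        ms_sample="$(basename "$ms_file" .seg)"
        # Prepend the sample ID to every non-empty segment row.
        awk -v id="$ms_sample" 'BEGIN { FS = OFS = "\t" } NF { print id, $0 }' \
            "$ms_file" >> "$ms_out"
    done
}

# Example call (file names are placeholders):
# merge_segments all_samples.seg sampleA.seg sampleB.seg
```

Alternatively, I could presumably pass a sample name through the sampleid argument of CNA() in step 4 and keep the ID column of segs$output when writing, then just concatenate the files, but I don't know which route is preferred.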
Thank you for your assistance!