Copy Number Analysis on single file
0
0
Entering edit mode
4.7 years ago
user31888 ▴ 130

Hi,

Is there a tool to detect copy number variants on a single target sequenced sample file?

All the tools I know (GATK, ControlFREEC, VarScan) require a mate file.

Thanks !

CNV • 2.7k views
ADD COMMENT
0
Entering edit mode

What have you got? - a single FASTQ file relating to single-end next generation sequencing?

ADD REPLY
0
Entering edit mode

I have a BAM file from which I obtained 2 vcf files after calling germline and somatic variants using GATK.

ADD REPLY
0
Entering edit mode

It seems that bcftools cnv could sort me out. I need to add to my vcf, the B-allele frequency (I can compute it from the AD field), and the LRR (don't know what it is yet),

ADD REPLY
0
Entering edit mode

bcftools cnv does not seem to work since we need to have 2 samples to compute LRR (Log R Ratio).

ADD REPLY
0
Entering edit mode

ControlFREEC can calculate copy number from a single sample, but what is your sample? - whole genome sequencing, exome sequencing, or target sequencing?

ADD REPLY
0
Entering edit mode

It is target sequencing.

ADD REPLY
0
Entering edit mode

ControlFREEC can call copy number for this data. Here is a sample config file that you'll need for this type of data:

[general]
BedGraphOutput=TRUE
bedtools=/Programs/bedtools2/bin/bedtools
breakPointThreshold=0.8
breakPointType=2
chrFiles=FREEC/mm10/chromosomes/
chrLenFile=FREEC/mm10/mm10.fasta.fai
SNPfile=FREEC/mm10/mm10_dbSNP137.ucsc.freec.txt
coefficientOfVariation=0.06
contamination=0
contaminationAdjustment=TRUE
degree=3
forceGCcontentNormalization=1
#gemMappabilityFile=FREEC/mm10/GRCm38_68_mm10.gem
#minMappabilityPerWindow=0.85
intercept=1
minCNAlength=3
minExpectedGC=0.35
maxExpectedGC=0.55
minimalSubclonePresence=0.3
maxThreads=3
numberOfProcesses=2
noisyData=TRUE
outputDir=Out/BM25/FREEC/
ploidy=2
printNA=TRUE
readCountThreshold=10
samtools=/Programs/samtools-1.9/samtools
sex=XY
step=10000
telocentromeric=50000
#uniqueMatch=TRUE
window=50000

[sample]
inputFormat=BAM
mateFile=Out/BM25/BM25_Aligned_Sorted_PCRDuped_FiltMAPQ.bam
mateOrientation=FR
ADD REPLY
0
Entering edit mode

Thanks Kevin ! The program generates the sample.cnp and GC_profile.cnp files, but then stops with the following error:

Error: zero reads in windows with the GC-content around 0.35 with interval 0.01, will try again with 0.04
Error: zero reads in windows with the GC-content around 0.35 with interval 0.04
Unable to proceed
Try to rerun the program with higher number of reads

I tried to decrease the window size to 500 but I got the same error. Note, I am only interested in one chromosome (my BAM contains only reads mapped to one chromosome). I supplied a captureRegions path though. Do you have an idea what parameters should I adjust?

ADD REPLY
0
Entering edit mode

How large is your target region?

ADD REPLY
0
Entering edit mode

It is about 10e6 bp.

ADD REPLY
0
Entering edit mode

You may need to add a new chunk of code to your config file, like this:

[target]
captureRegions=TargetRegion.bed

Also, with that, set window=0 in the [general] chunk.

Be aware of the other option, readCountThreshold=10, which you may need to toggle. 10 is already low, though.

ADD REPLY
0
Entering edit mode

I also decreased the following parameters and still get the same error:

minExpectedGC=0
minimalSubclonePresence=0
readCountThreshold=0
window=1                    #(window=0 does not work)
ADD REPLY
0
Entering edit mode

Alternatively, I need to calculate the approximative copy number (and exon involved) of a specific gene. Would it be possible to do it "manually" from a BAM or VCF file?

ADD REPLY
0
Entering edit mode

Realistically, I am not confident that it is possible to accurately determine copy number for just one gene - how would you know the level of coverage that is reflective of normal, deleted, or amplified DNA, when read coverage, even in a 'normal' piece of DNA, fluctuates a lot based on GC content, sequence similarity elsewhere in the genome, etc? Even when we have genome-wide data, copy number callers disagree a lot.

Why can't you just pick a few exons, design some primers, obtain a normal reference hgDNA, and then do qPCR (determine copy number via the delta delta Ct method).

ADD REPLY
1
Entering edit mode

Good idea. I will give qPCR a try. Thanks !

ADD REPLY

Login before adding your answer.

Traffic: 2618 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6