Question: Copy Number Analysis on single file
0
gravatar for user31888
4 weeks ago by
user3188870
United States
user3188870 wrote:

Hi,

Is there a tool to detect copy number variants on a single target sequenced sample file?

All the tools I know (GATK, ControlFREEC, VarScan) require a mate file.

Thanks !

cnv • 233 views
ADD COMMENTlink modified 4 weeks ago by Kevin Blighe48k • written 4 weeks ago by user3188870

What have you got? - a single FASTQ file relating to single-end next generation sequencing?

ADD REPLYlink written 4 weeks ago by Kevin Blighe48k

I have a BAM file from which I obtained 2 vcf files after calling germline and somatic variants using GATK.

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by user3188870

It seems that bcftools cnv could sort me out. I need to add to my vcf, the B-allele frequency (I can compute it from the AD field), and the LRR (don't know what it is yet),

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by user3188870

bcftools cnv does not seem to work since we need to have 2 samples to compute LRR (Log R Ratio).

ADD REPLYlink written 4 weeks ago by user3188870

ControlFREEC can calculate copy number from a single sample, but what is your sample? - whole genome sequencing, exome sequencing, or target sequencing?

ADD REPLYlink written 4 weeks ago by Kevin Blighe48k

It is target sequencing.

ADD REPLYlink written 4 weeks ago by user3188870

ControlFREEC can call copy number for this data. Here is a sample config file that you'll need for this type of data:

[general]
BedGraphOutput=TRUE
bedtools=/Programs/bedtools2/bin/bedtools
breakPointThreshold=0.8
breakPointType=2
chrFiles=FREEC/mm10/chromosomes/
chrLenFile=FREEC/mm10/mm10.fasta.fai
SNPfile=FREEC/mm10/mm10_dbSNP137.ucsc.freec.txt
coefficientOfVariation=0.06
contamination=0
contaminationAdjustment=TRUE
degree=3
forceGCcontentNormalization=1
#gemMappabilityFile=FREEC/mm10/GRCm38_68_mm10.gem
#minMappabilityPerWindow=0.85
intercept=1
minCNAlength=3
minExpectedGC=0.35
maxExpectedGC=0.55
minimalSubclonePresence=0.3
maxThreads=3
numberOfProcesses=2
noisyData=TRUE
outputDir=Out/BM25/FREEC/
ploidy=2
printNA=TRUE
readCountThreshold=10
samtools=/Programs/samtools-1.9/samtools
sex=XY
step=10000
telocentromeric=50000
#uniqueMatch=TRUE
window=50000

[sample]
inputFormat=BAM
mateFile=Out/BM25/BM25_Aligned_Sorted_PCRDuped_FiltMAPQ.bam
mateOrientation=FR
ADD REPLYlink written 4 weeks ago by Kevin Blighe48k

Thanks Kevin ! The program generates the sample.cnp and GC_profile.cnp files, but then stops with the following error:

Error: zero reads in windows with the GC-content around 0.35 with interval 0.01, will try again with 0.04
Error: zero reads in windows with the GC-content around 0.35 with interval 0.04
Unable to proceed
Try to rerun the program with higher number of reads

I tried to decrease the window size to 500 but I got the same error. Note, I am only interested in one chromosome (my BAM contains only reads mapped to one chromosome). I supplied a captureRegions path though. Do you have an idea what parameters should I adjust?

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by user3188870

How large is your target region?

ADD REPLYlink written 4 weeks ago by Kevin Blighe48k

It is about 10e6 bp.

ADD REPLYlink written 4 weeks ago by user3188870

You may need to add a new chunk of code to your config file, like this:

[target]
captureRegions=TargetRegion.bed

Also, with that, set window=0 in the [general] chunk.

Be aware of the other option, readCountThreshold=10, which you may need to toggle. 10 is already low, though.

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by Kevin Blighe48k

I also decreased the following parameters and still get the same error:

minExpectedGC=0
minimalSubclonePresence=0
readCountThreshold=0
window=1                    #(window=0 does not work)
ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by user3188870

Alternatively, I need to calculate the approximative copy number (and exon involved) of a specific gene. Would it be possible to do it "manually" from a BAM or VCF file?

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by user3188870

Realistically, I am not confident that it is possible to accurately determine copy number for just one gene - how would you know the level of coverage that is reflective of normal, deleted, or amplified DNA, when read coverage, even in a 'normal' piece of DNA, fluctuates a lot based on GC content, sequence similarity elsewhere in the genome, etc? Even when we have genome-wide data, copy number callers disagree a lot.

Why can't you just pick a few exons, design some primers, obtain a normal reference hgDNA, and then do qPCR (determine copy number via the delta delta Ct method).

ADD REPLYlink written 4 weeks ago by Kevin Blighe48k
1

Good idea. I will give qPCR a try. Thanks !

ADD REPLYlink written 4 weeks ago by user3188870
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 689 users visited in the last hour