Question: CNVkit runtime for WGS?
wjar6718 (Australia) wrote 2.8 years ago:

Hello,

I am a new CNVkit user and need to run it on WGS data from a tumor-normal (T-N) pair (45x and 29x coverage). I am running it with 4 processes (16 GB each). The program has now been running for 2 days, and I think it is in the fix step (for a whole day now) because I can see my reference.cnn. Is this usual for you guys?

My command:

./python2.7 cnvkit.py batch Tumor.recal_sort2_dedup2.realigned2.NTrealign.bam --normal Normal_sort_dedup.realigned.recalibrated.NTrealign.bam --rlibpath /cnvkit/R-3.2.3/lib64/R/library:/cnvkit/R-3.2.3/lib64:/cnvkit/R-3.2.3/lib64/R/lib -t data/access-5k-mappable.hg19.bed --fasta hg19_chromosome.fa -g data/access-5k-mappable.hg19.bed --split --annotate data/refFlat.txt -p 4 --output-reference reference.cnn -y

My BAM header:

@HD    VN:1.4    GO:none    SO:coordinate
@SQ    SN:chr1    LN:249250621
@SQ    SN:chr2    LN:243199373
@SQ    SN:chr3    LN:198022430
@SQ    SN:chr4    LN:191154276
@SQ    SN:chr5    LN:180915260
@SQ    SN:chr6    LN:171115067
@SQ    SN:chr7    LN:159138663
@SQ    SN:chr8    LN:146364022
@SQ    SN:chr9    LN:141213431
@SQ    SN:chr10    LN:135534747
@SQ    SN:chr11    LN:135006516
@SQ    SN:chr12    LN:133851895
@SQ    SN:chr13    LN:115169878
@SQ    SN:chr14    LN:107349540
@SQ    SN:chr15    LN:102531392
@SQ    SN:chr16    LN:90354753
@SQ    SN:chr17    LN:81195210
@SQ    SN:chr18    LN:78077248
@SQ    SN:chr19    LN:59128983
@SQ    SN:chr20    LN:63025520
@SQ    SN:chr21    LN:48129895
@SQ    SN:chr22    LN:51304566
@SQ    SN:chrX    LN:155270560
@SQ    SN:chrY    LN:59373566
@RG    ID:Clean3_L7_fix_kmer_q15_TrimN_N0_L70    PU:None    LB:1    SM:T    CN:hcpcg    PL:ILLUMINA
@RG    ID:Clean3_L8_fix_kmer_q15_TrimN_N0_L70    PU:None    LB:1    SM:T    CN:hcpcg    PL:ILLUMINA
@PG    ID:GATK PrintReads    VN:3.4-46-gbc02625    CL:readGroup=null platform=null number=-1 sample_file=[] sample_name=[] simplify=false no_pg_tag=false
@PG    ID:MarkDuplicates
.....
@PG    ID:bwa.6    VN:0.7.12-r1039    CL:./bwa mem -t 4 hg19_chromosome.fa R1.fastq.gz R2.fastq.gz -M -R @RG\tID:Clean3_L8_fix_kmer_q15_TrimN_N0_L70\tPL:ILLUMINA\tPU:None\tLB:1\tSM:T\tCN:hcpcg

Cheers,

James

Eric T. (San Francisco, CA) wrote 2.8 years ago:

CNVkit version 0.7.5, released on Saturday, has a much faster implementation of the "fix" command that should be more appropriate for WGS.

If you are using CNVkit v0.7.4 or earlier, and you have generated the .cnn files for each sample and the reference, then it's OK to kill the "batch" job now -- most of the work is done. Then update CNVkit to the latest version, and run the "fix" and "segment" commands manually for your two samples.
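
For reference, the manual steps could look roughly like the sketch below. It only builds (and by default just prints) the `fix` and `segment` command lines. The sample names, the `reference.cnn` path, and the assumption that the coverage files are named `<sample>.targetcoverage.cnn` and `<sample>.antitargetcoverage.cnn` are illustrative and should be adjusted to your own files.

```python
import subprocess


def manual_commands(sample, reference="reference.cnn", cnvkit="cnvkit.py"):
    """Build the `fix` and `segment` command lines for one sample.

    Assumes coverage files named <sample>.targetcoverage.cnn and
    <sample>.antitargetcoverage.cnn (hypothetical names for illustration).
    """
    cnr = sample + ".cnr"
    fix = [cnvkit, "fix",
           sample + ".targetcoverage.cnn",
           sample + ".antitargetcoverage.cnn",
           reference, "-o", cnr]
    segment = [cnvkit, "segment", cnr, "-o", sample + ".cns"]
    return [fix, segment]


def run_all(samples, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for sample in samples:
        for cmd in manual_commands(sample):
            if dry_run:
                print(" ".join(cmd))
            else:
                subprocess.check_call(cmd)


if __name__ == "__main__":
    # Hypothetical sample names; replace with your own.
    run_all(["Tumor", "Normal"])
```

Running with `dry_run=True` first makes it easy to check the file names before anything is executed.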

wjar6718 (Australia) wrote 2.8 years ago:

Thanks Etal, I will try it out with your easy-to-use package and let you know.

Cheers,

James

wjar6718 (Australia) wrote 2.8 years ago:

Hi Etal,

With the WGS data, I have seen the following error when using cnvkit.py fix:

"  File "/cnvkit/cnvlib/commands.py", line 574, in _cmd_fix
    % (tgt_raw.sample_id, anti_raw.sample_id))
ValueError: Sample IDs do not match:'Clean3_mergedL7L8_150911_FR07887821_targetcoverage' (target) vs. 'Clean3_mergedL7L8_150911_FR07887821_antitargetcoverage' (antitarget) "

#cnvkit version

./python2.7 cnvkit.py version
0.7.5

#My Clean3_mergedL7L8_150911_FR07887821_antitargetcoverage file is empty, following the online manual:

$ head Clean3_mergedL7L8_150911_FR07887821_antitargetcoverage.cnn
chromosome    start    end    gene    log2

#My Clean3_mergedL7L8_150911_FR07887821_targetcoverage:

$ head Clean3_mergedL7L8_150911_FR07887821_targetcoverage.cnn
chromosome    start    end    gene    log2
chr1    10000    10266    DDX11L1    9.23584
chr1    10266    10533    DDX11L1    8.62594
chr1    10533    10799    DDX11L1    4.1049
chr1    10799    11066    DDX11L1    4.61658
chr1    11066    11332    DDX11L1    6.19198
chr1    11332    11599    DDX11L1    6.45691
chr1    11599    11866    DDX11L1    7.39473
chr1    11866    12132    DDX11L1    7.32873
chr1    12132    12399    DDX11L1    7.32655

#My command:

./python2.7 cnvkit.py fix Clean3_mergedL7L8_150911_FR07887821_targetcoverage.cnn Clean3_mergedL7L8_150911_FR07887821_antitargetcoverage.cnn Clean3_L8_FR07887830_reference.cnn -o Clean3_mergedL7L8_150911_FR07887821.cnr

#My error file

ValueError: Sample IDs do not match:'Clean3_mergedL7L8_150911_FR07887821_targetcoverage' (target) vs. 'Clean3_mergedL7L8_150911_FR07887821_antitargetcoverage' (antitarget)

I searched Google for a solution, but unfortunately this is beyond my knowledge. Could you help me sort this out? Thank you for your precious time.

PS- Clean3_mergedL7L8_150911_FR07887821==Tumor.bam and Clean3_L8_FR07887830==Normal.bam

Cheers, James

Hi James,

The problem is that the *_antitargetcoverage.cnn and *_targetcoverage.cnn files need to be named *.antitargetcoverage.cnn and *.targetcoverage.cnn instead, i.e. "Clean3_mergedL7L8_150911_FR07887821.targetcoverage.cnn" and "Clean3_mergedL7L8_150911_FR07887821.antitargetcoverage.cnn".

The sample ID is the filename leading up to the first "." character, so there needs to be a "." between "Clean3_mergedL7L8_150911_FR07887821" and "targetcoverage.cnn" or "antitargetcoverage.cnn" for CNVkit to recognize that the samples match. (It's hard-coded this way, sorry.)
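
To illustrate the naming rule, here is a minimal sketch. It is not CNVkit's own code: `sample_id` and `dotted_name` are hypothetical helper names, and the only rule assumed is the one stated above (the sample ID is everything before the first "." in the file name).

```python
import os


def sample_id(path):
    """Return the file name up to the first '.', per the rule described above."""
    return os.path.basename(path).split(".", 1)[0]


def dotted_name(path):
    """Turn <id>_targetcoverage.cnn into <id>.targetcoverage.cnn
    (and likewise for the antitarget file); other names are returned unchanged."""
    directory, name = os.path.split(path)
    for suffix in ("_antitargetcoverage.cnn", "_targetcoverage.cnn"):
        if name.endswith(suffix):
            # Swap the leading "_" of the suffix for a "."
            name = name[:-len(suffix)] + "." + suffix[1:]
            break
    return os.path.join(directory, name)


old_target = "Clean3_mergedL7L8_150911_FR07887821_targetcoverage.cnn"
old_anti = "Clean3_mergedL7L8_150911_FR07887821_antitargetcoverage.cnn"

# With the underscore names the two IDs differ, which triggers the error.
print(sample_id(old_target) == sample_id(old_anti))

# With the dotted names both files share one sample ID.
print(sample_id(dotted_name(old_target)) == sample_id(dotted_name(old_anti)))
```

To actually rename the files, something like `os.rename(old_target, dotted_name(old_target))` would do.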

 

Reply written 2.8 years ago by Eric T.

wjar6718 (Australia) wrote 2.8 years ago:

I fixed the names, ran 'cnvkit.py fix' and got the following traceback:

"Correcting for GC bias...
Correcting for density bias...
Weighting bins by relative coverage depths in reference
Weighting bins by coverage spread in reference
Processing antitarget: Clean3_mergedL7L8_150911_FR07887821
Traceback (most recent call last):
  File "cnvkit.py", line 11, in <module>
    args.func(args)
  File "/scratch/RDS-SMS-PCaGenomes-RW/weejar/bcbiometasv/miniconda/cnvkit/cnvlib/commands.py", line 576, in _cmd_fix
    args.do_gc, args.do_edge, args.do_rmask)
  File "/scratch/RDS-SMS-PCaGenomes-RW/weejar/bcbiometasv/miniconda/cnvkit/cnvlib/commands.py", line 592, in do_fix
    anti_iqr = metrics.interquartile_range(anti_cnarr.residuals())
  File "/scratch/RDS-SMS-PCaGenomes-RW/weejar/bcbiometasv/miniconda/cnvkit/cnvlib/cnary.py", line 262, in residuals
    return np.concatenate(resids)
ValueError: need at least one array to concatenate"

I read in your paper that target regions have few repeats, so I filtered out simple repeats (UCSC hg19) from access-5k-mappable.hg19.bed and used the filtered access regions as targets, so that non-empty antitarget regions would be generated. Then I started over using 'cnvkit.py batch' and the program finished successfully!

Can I use these results? Is this method still consistent with your methods in the paper?

Thank you very much for your time. I hope I can use these results because they seem okay.
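
For anyone wanting to reproduce the repeat-filtering step, the idea can be sketched as a plain interval subtraction. This is only an illustration under assumptions: the exact tool used is not stated here, the function name and the example coordinates are made up, and intervals are treated as half-open (chrom, start, end) tuples as in BED. In practice a tool such as `bedtools subtract` does the same job on BED files.

```python
def subtract_intervals(regions, repeats):
    """Remove `repeats` from `regions`.

    Both arguments are lists of (chrom, start, end) with half-open
    coordinates; the result keeps the parts of each region not covered
    by any repeat.
    """
    # Group repeats by chromosome and sort them by start position.
    by_chrom = {}
    for chrom, start, end in repeats:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()

    kept = []
    for chrom, start, end in regions:
        cursor = start  # left edge of the part not yet examined
        for r_start, r_end in by_chrom.get(chrom, []):
            if r_end <= cursor:
                continue  # repeat lies entirely left of the cursor
            if r_start >= end:
                break  # sorted by start, so no later repeat can overlap
            if r_start > cursor:
                kept.append((chrom, cursor, r_start))
            cursor = max(cursor, r_end)
        if cursor < end:
            kept.append((chrom, cursor, end))
    return kept


# Made-up example coordinates.
access = [("chr1", 100, 200), ("chr3", 0, 10)]
simple_repeats = [("chr1", 120, 130), ("chr1", 125, 150),
                  ("chr1", 190, 250), ("chr2", 0, 1000)]
for row in subtract_intervals(access, simple_repeats):
    print("\t".join(str(field) for field in row))
```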

Cheers,

James

Sorry, I found the issue and fixed it just now.

For your workaround - good idea! I think these results will still be valid. If you mapped the reads with BWA, that should handle ambiguous read alignments acceptably. In any case CNVkit will downweight or filter out low-quality copy number bins at several steps, and CBS should still do fine if the errors are random.

Reply written 2.8 years ago by Eric T.

wjar6718 (Australia) wrote 2.8 years ago:

Thanks, a great program indeed.

PS: The WGS run took 3-4 days using -p 12.

James
