CNVkit output problem: Is "log2" the same as "Seg_mean"? If not, how can I get "Seg_mean" from "log2"?
2.8 years ago
Laven9 • 0

I have just generated my CNV files with CNVkit. I am wondering whether the "log2" column in the CNVkit output (after call) is the same as "Seg_mean". If not, how can I get "Seg_mean" from "log2"? Please give me some advice, thanks!

CNV CNVkit Seg_mean • 1.5k views

Here are two lines of what I get.

chromosome  start   end log2    probes
chr1    826717  2410579 -0.00659771 487
chr1    2410780 2787772 -0.372291   70


I got an answer like this: Segment_Mean is the arithmetic mean of those probes' log2 copy ratio values.
But I am still confused about how to get "Segment_Mean". I need it as input to ABSOLUTE.


I also got a CNV file from VarScan, but its "Segment_mean" values are much too large.

2.7 years ago
Eric T. ★ 2.7k

In the .cns files, yes, log2 is the segment mean in log2 scale. Details here: https://cnvkit.readthedocs.io/en/stable/fileformats.html
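Since the log2 column already is the segment mean, getting a Segment_Mean column is mostly a matter of renaming. CNVkit has a built-in `cnvkit.py export seg` command for writing SEG files, which is probably the simplest route. If you need custom column names, a minimal Python sketch could look like the following. The output column names and the sample ID `tumor1` are assumptions for illustration; check the ABSOLUTE documentation for the exact header and chromosome naming it expects.

```python
import csv
import io


def cns_to_segments(cns_text, sample_id):
    """Turn CNVkit .cns text into rows that carry a Segment_Mean column.

    The output column names are an assumption about what a downstream
    tool wants; the input names match the .cns header shown in this thread.
    """
    reader = csv.DictReader(io.StringIO(cns_text), delimiter="\t")
    rows = []
    for rec in reader:
        rows.append({
            "Sample": sample_id,
            "Chromosome": rec["chromosome"],
            "Start": int(rec["start"]),
            "End": int(rec["end"]),
            "Num_Probes": int(rec["probes"]),
            # log2 in a .cns file is already the segment mean (log2 scale)
            "Segment_Mean": float(rec["log2"]),
        })
    return rows


# The two segments posted above, as tab-separated text
example = (
    "chromosome\tstart\tend\tlog2\tprobes\n"
    "chr1\t826717\t2410579\t-0.00659771\t487\n"
    "chr1\t2410780\t2787772\t-0.372291\t70\n"
)

segments = cns_to_segments(example, "tumor1")
for seg in segments:
    print(seg)
```

Extra columns in a real .cns file (gene, depth, weight and so on) are simply ignored by this sketch.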


I am now facing another problem with CNVkit; could you please give me some advice? I am running CNVkit to call CNVs from my whole-exome sequencing data. I use a command like cnvkit.py batch -m amplicon -t targets.bed *.bam, but I do not have a targets.bed file. I also checked Astra-Zeneca’s reference data repository but could not find one there either.

My questions are: 1) Is it correct to use -m amplicon? 2) Is there a file containing all human exons that I can use as input to the guess_baits.py script? I am not sure where to get a complete BED file for it.

I would appreciate any advice you can give!


For exome, -m hybrid is better than -m amplicon. You can verify that there are off-target reads by loading the BAM file in a viewer like IGV.

For guess_baits.py, try UCSC's RefSeq exons (refFlat.txt here), or another BED file of known genes from UCSC Genome Browser. Make sure the reference genome matches.
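To make the refFlat suggestion concrete, here is a rough Python sketch of pulling exon intervals out of refFlat-style lines and writing them as BED rows. It relies on the UCSC refFlat column layout (gene name, transcript name, chromosome, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds), with 0-based starts as in BED. The records `GENE1`, `NM_0001` and `NM_0002` are made up for illustration, and this is not how CNVkit's own converter is implemented.

```python
def refflat_to_exon_bed(lines):
    """Collect unique exon intervals from refFlat-formatted lines.

    Returns sorted (chrom, start, end, gene) tuples. Exons shared by
    several transcripts of the same gene are reported only once.
    """
    exons = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # skip blank or malformed lines
        gene, chrom = fields[0], fields[2]
        # exonStarts / exonEnds are comma-separated with a trailing comma
        starts = [int(s) for s in fields[9].split(",") if s]
        ends = [int(e) for e in fields[10].split(",") if e]
        for start, end in zip(starts, ends):
            exons.add((chrom, start, end, gene))
    return sorted(exons)


# Two hypothetical transcripts of one gene that share their first exon
example_lines = [
    "GENE1\tNM_0001\tchr1\t+\t100\t500\t150\t450\t2\t100,300,\t200,500,\n",
    "GENE1\tNM_0002\tchr1\t+\t100\t700\t150\t650\t2\t100,600,\t200,700,\n",
]

bed_rows = refflat_to_exon_bed(example_lines)
for chrom, start, end, gene in bed_rows:
    print(f"{chrom}\t{start}\t{end}\t{gene}")
```

Exons from different transcripts can still overlap each other after this step, so the result is only a candidate list, not a clean set of non-overlapping targets.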


Thanks a lot! I got it, but I also want to make sure I am doing the right thing. Here is what I did.

skg_convert.py refFlat.txt -t bed -o refFlat.bed
guess_baits.py bam1 bam2 -t refFlat.bed -o guess_baits.bed


But I get an error like this:

Loaded 80816 candidate regions from refFlat.bed
Evaluating targets in bam1
Time: 1281.040 seconds (205575 reads/sec, 61 bins/sec)
Summary: #bins=78477, #reads=263349347, mean=3355.7520, min=0.0, max=197074.45
Percent reads in regions: 279.509 (of 94218509 mapped)
Traceback (most recent call last):
  File "miniconda2/bin/guess_baits.py", line 246, in <module>
    baits = filter_targets(args.targets, args.sample_bams, args.processes)
  File "miniconda2/bin/guess_baits.py", line 54, in filter_targets
    "%d != %d" % (len(sample), len(baits))
AssertionError: 78477 != 80816


What does it mean?


Hmm, not sure, I'll take a look to see if there's a bug in guess_baits.py.

If you're building a pooled reference (multiple control samples), you can also just use the refFlat.bed file as-is, and CNVkit will drop most of the uncaptured exons automatically.