Collect COV file by BedCoverage
14 months ago
Ishak ▴ 10

Hi Biostars,

I am running ClinCNV ( https://github.com/imgag/ClinCNV ) to detect copy number variants in a WGS BAM file. I need to obtain two files, a BED file and a COV file.

I have several BED files: hg38.all.bed, hg38.bed and preparedBedHg38.bin100.bed. Which of these should be used to make the COV file with BedCoverage (BedCoverage -bam $bamPath -in $bedPath -min_mapq 3 -out $sampleName".cov")?

BedCoverage extracts the average coverage for the input regions from one or several BAM/CRAM file(s).

Ishak


Hi Ishak, you should segment your reference genome into pieces of more or less uniform length and give the resulting BED file as input to BedCoverage.

The BED file for this step should contain 3 tab-separated columns: chr, start, end.

The BED file required for the later steps should also contain the GC annotation as the 4th column and, optionally, genes as the 5th.

I'd recommend using windows of around 1,000 bp. 100 bp windows may take too much space on your server.
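
The segmentation step described above can be sketched in a few lines of Python. This is only an illustration, not part of ClinCNV: the function names `make_windows` and `to_bed_lines` are hypothetical, and the chromosome sizes are toy values rather than real hg38 lengths. It assumes the usual 0-based, half-open BED coordinates.

```python
from typing import Dict, Iterable, Iterator, Tuple

Region = Tuple[str, int, int]


def make_windows(chrom_sizes: Dict[str, int], window: int = 1000) -> Iterator[Region]:
    """Yield (chrom, start, end) windows of at most `window` bp per chromosome."""
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, window):
            # The last window of a chromosome is truncated at the chromosome end.
            yield chrom, start, min(start + window, size)


def to_bed_lines(windows: Iterable[Region]) -> str:
    """Format windows as 3-column, tab-separated BED text."""
    return "".join(f"{chrom}\t{start}\t{end}\n" for chrom, start, end in windows)


# Toy chromosome sizes for illustration only (NOT real hg38 lengths).
toy_sizes = {"chrA": 2500, "chrB": 1000}
bed_text = to_bed_lines(make_windows(toy_sizes, window=1000))
print(bed_text, end="")
```

For a real genome the sizes would come from the reference index, and the GC column would still have to be added in a separate annotation step.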

6 months ago

Hi German,

Thank you for developing this tool for CNV analysis. I tried to use it in our lab on 40 samples, but I keep getting an error in the final step when running [ Rscript clinCNV.R ......]. The error is below:

[1] "We run script located in folder /home/bioinformatics/ClinCNV . All the paths will be calculated realtive to this one. If everything crashes, please, check the correctness of this path first."
[1] "START cluster allocation."
[1] "Cluster allocated."
[1] "END cluster allocation."
[1] "We are started with reading the coverage files and bed files 2022-03-23 10:57:53"
[1] "ERROR: your file with normal coverages have different amount of rows with bed file or coordinates are not matching. It is most probably a technical mistake. Check the input. List of regions not presented:"
  chr.X start end gc genes
1 chr1 65509 65625 0.33 0
  chr.X start end gc genes
2 chr1 65831 65973 0.42 0
  chr.X start end gc genes
3 chr1 69481 69600 0.51 0
...... ...... ......

I noticed that your sample bed file and .cov file both have the same ranges. In my case they are not the same.

Should I edit my bed file and remove the extra ranges that are not included in my final coverage file? Or should I edit the coverage file?

PS: I'm using ClinCNV for WES germline CNV detection.

Thank you.


Hi, the bed file and the cov file should contain the same regions. This check is done on purpose, so this error is expected behaviour.

Why not generate the .cov file from your .bed file, including the extra regions it contains? That is how I would do it.
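
The consistency requirement above can be checked before running clinCNV.R. The following is a minimal sketch, not ClinCNV's own check: the helper names `parse_regions` and `not_in` are hypothetical, and the rows are the ones quoted later in this thread. It assumes a region counts as "the same" only if chr, start and end all match exactly.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]


def parse_regions(lines: Iterable[str], skip_header: bool = False) -> List[Region]:
    """Read (chr, start, end) from the first three whitespace-separated columns."""
    regions = []
    for i, line in enumerate(lines):
        if skip_header and i == 0:
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def not_in(reference: List[Region], other: List[Region]) -> List[Region]:
    """Regions of `reference` with no exact coordinate match in `other`."""
    other_set = set(other)
    return [r for r in reference if r not in other_set]


# Rows copied from the bed and merged coverage files quoted in this thread.
bed_lines = [
    "chr1 65409 65725 0.3323", "chr1 65731 66073 0.3596",
    "chr1 69381 69700 0.4608", "chr1 721281 721619 0.4290",
    "chr1 721430 721906 0.3992", "chr1 721751 722042 0.3883",
    "chr1 752816 753135 0.5110", "chr1 761995 762375 0.5368",
]
cov_lines = [
    "X.chr start end Sample1 Sample2 Sample3 Sample4 Sample5",
    "chr1 65409 65725 1.44 3.11 2.09 0 2.6",
    "chr1 65731 66073 52.77 63.06 25.27 1.98 25.34",
    "chr1 69381 69700 49.01 4.86 0 0.34 33.01",
    "chr1 721281 722042 94.89 86.6 100 116.71 135.34",
    "chr1 752816 753135 13.72 12.15 17.17 20.8 32.27",
    "chr1 761995 762665 122.07 114.44 111.88 112.37 157.81",
]

bed = parse_regions(bed_lines)
cov = parse_regions(cov_lines, skip_header=True)
missing_from_cov = not_in(bed, cov)
extra_in_cov = not_in(cov, bed)
print("In bed but not in cov:", missing_from_cov)
print("In cov but not in bed:", extra_in_cov)
```

On real files the same functions can be fed `open(path)` instead of the inline lists; any non-empty result means the two files do not describe the same regions.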


Thank you for your prompt response.

How can I generate a .cov file that contains the same regions as my .bed file, including the extra ones? If I understand correctly, I should add all the missing regions to my merged .cov file and set their coverage to '0'?

Thank you.


To explain more about my issue:

After running [Rscript clinCNV.R ......] I get the error below:

"ERROR: your file with normal coverages have different amount of rows with bed file or coordinates are not matching. It is most probably a technical mistake. Check the input. List of regions not presented:"

 chr.X    start    end        gc     genes
chr1    721430   721906    0.4     0


But when I checked my bed file and the merged coverage file:

head gcAnnotated.bed:

chr1    65409   65725   0.3323
chr1    65731   66073   0.3596
chr1    69381   69700   0.4608
chr1    721281  721619  0.4290
[chr1   721430  721906  0.3992]
chr1    721751  722042  0.3883
chr1    752816  753135  0.5110
chr1    761995  762375  0.5368

X.chr   start   end Sample1 Sample2 Sample3 Sample4 Sample5
chr1    65409   65725   1.44    3.11    2.09    0   2.6
chr1    65731   66073   52.77   63.06   25.27   1.98    25.34
chr1    69381   69700   49.01   4.86    0       0.34    33.01
[chr1   721281  722042] 94.89   86.6    100     116.71  135.34
chr1    752816  753135  13.72   12.15   17.17   20.8    32.27
chr1    761995  762665  122.07  114.44  111.88  112.37  157.81


As you can see above, the missing range (chr1 721430 721906) is actually contained in the 4th row of the coverage file (chr1 721281 722042).

So I'm not sure what I'm doing wrong.

Thank you


It is a different region. What I normally do is take the .bed file and calculate the .cov files for that same .bed file. Then the regions match between .cov and .bed. I would do this if I were you :)
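
One observation from the rows quoted in the question: the coverage row chr1 721281 722042 is exactly the union of three overlapping bed rows (721281-721619, 721430-721906, 721751-722042). The sketch below reproduces that union by merging overlapping intervals. It is only a possible explanation, under the assumption that overlapping regions were merged somewhere between the bed file and the coverage file; the thread does not confirm which step did this, and `merge_overlapping` is a hypothetical helper, not a ClinCNV or BedCoverage function.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]


def merge_overlapping(regions: List[Region]) -> List[Region]:
    """Merge regions on the same chromosome whose intervals strictly overlap."""
    merged: List[Region] = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            # Overlaps the previous region: extend it instead of adding a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Bed rows quoted in the question (coordinates only).
bed = [
    ("chr1", 65409, 65725), ("chr1", 65731, 66073), ("chr1", 69381, 69700),
    ("chr1", 721281, 721619), ("chr1", 721430, 721906), ("chr1", 721751, 722042),
    ("chr1", 752816, 753135), ("chr1", 761995, 762375),
]

merged = merge_overlapping(bed)
for region in merged:
    print(*region, sep="\t")
```

If the merged list matches the coverage file, one option is to use a bed file without overlapping regions for both the coverage calculation and the GC annotation, so that both files describe identical intervals.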


Thank you again for taking the time to help with this.

That is exactly what I have done. I calculated the .cov file using the bed file and used that same bed file for the final step [clinCNV.R].

I also tried different bed files, including the off-target regions and extended regions, but I always get the same error.

I followed the exact steps for germline WES CNV calling.

At first I calculated coverage for 40 samples. Then I repeated it with only 5 samples to make the error easier to trace, but no luck so far.


Hi, if you calculate coverage using a BED file with regions A, B, C, it should give you coverage for regions A, B, C. How could the coverage be missing for some of those regions?