Question: How to annotate CNV events with gene information?
2.4 years ago by
United States
bioinforesearchquestions200 wrote:

Hello friends,

I have CNV calls from four different CNV callers. I would like to annotate each CNV call with gene information.

What are commonly used tools to annotate CNVs?

How much overlap should I require between CNV calls and gene coordinates if I am using bedtools intersect to annotate the CNV calls?

I have 300 samples, so I am looking for command-line options.

CNV annotation can be easily automated (with OMIM, DGV, 1000g, haploinsufficiency, TAD, ... and also with your own in-house information)!

You can look at this post describing the annotSV tool: Annotation for SV and CNV

2.4 years ago by
Amitm1.6k wrote:


Passing 0.5 to -f in intersectBed seems reasonable, i.e. at least 50% of each feature in the -a file must be overlapped. Once you are happy with the threshold, make a shell script like this -

# $1 is the CNV result file; the annotated output is written to <input>_ANNO
bedtools2-2.20.1/bin/intersectBed \
    -a "$1" \
    -b Homo_sapiens.GRCh37_BED_SORTD.txt \
    -wao \
    -f 0.5 \
    > "$1"_ANNO

And save it in a file. Then you could run something like this on the shell -

for myCNVreslts in cnv_result_*; do
    # annotate_cnv.sh is a placeholder; use the name of the file you saved the script in
    sh annotate_cnv.sh "$myCNVreslts"
done

This assumes that your result file names start with the prefix cnv_result_. Alter the pattern depending on your exact filenames and directory location.

The output files would get a suffix of _ANNO. You can change that too.

The -wao parameter in intersectBed ensures that both features are printed in the result, along with the number of overlapping base pairs.
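With -wao, a CNV that overlaps several genes appears on several lines. If you want one line per CNV with a gene list, something like the following awk sketch might help. The column layout is an assumption, not something taken from your files: columns 1-4 are the CNV (chrom, start, end, label), columns 5-8 are the gene (chrom, start, end, name), and column 9 is the overlap in bp. The function name collapse_genes is made up for this example.

```shell
#!/bin/sh
# Collapse intersectBed -wao output to one line per CNV with a comma-separated
# gene list. ASSUMED (hypothetical) tab-separated layout, adjust to your files:
#   cols 1-4 = CNV chrom, start, end, label
#   cols 5-8 = gene chrom, start, end, name
#   col  9   = overlap in bp (0 when the CNV hit no gene)
collapse_genes() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        key = $1 OFS $2 OFS $3 OFS $4
        # remember the order in which CNVs first appear
        if (!(key in genes)) { order[++n] = key; genes[key] = "" }
        # only lines with a real overlap contribute a gene name
        if ($9 > 0) genes[key] = (genes[key] == "" ? $8 : genes[key] "," $8)
    }
    END {
        for (i = 1; i <= n; i++)
            print order[i], (genes[order[i]] == "" ? "." : genes[order[i]])
    }' "$@"
}

# Example use (file name follows the _ANNO suffix mentioned above):
#   collapse_genes cnv_result_1_ANNO > cnv_result_1_GENES
```

CNVs without any gene get "." in the last column, so no CNV is silently dropped.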

Thanks, Amit. I was away for a conference.

bedtools intersect -wa -wb -a Homo_sapiens.GRCh37_BED_SORTD.bed -b Sample1_cnv_file.bed -f 0.5 -r > GRCg37_Sample1_overlap.txt
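One thing to keep in mind with -r: the -f fraction then has to hold for both features, so a large CNV that completely contains a small gene is not reported, because the gene covers only a tiny part of the CNV. The sketch below only reproduces that arithmetic in awk for BED-style half-open intervals; it does not call bedtools, and the function names overlap_fracs and passes_reciprocal are invented for this illustration.

```shell
#!/bin/sh
# Overlap of two BED-style half-open intervals [a1,a2) and [b1,b2).
# Prints: overlap in bp, fraction of A covered, fraction of B covered.
overlap_fracs() {
    awk -v a1="$1" -v a2="$2" -v b1="$3" -v b2="$4" 'BEGIN {
        s = (a1 > b1 ? a1 : b1)          # later start
        e = (a2 < b2 ? a2 : b2)          # earlier end
        ov = (e > s ? e - s : 0)
        printf "%d\t%.3f\t%.3f\n", ov, ov / (a2 - a1), ov / (b2 - b1)
    }'
}

# passes_reciprocal a1 a2 b1 b2 min_frac
# Succeeds (exit 0) only if BOTH fractions reach min_frac, mimicking -f with -r.
passes_reciprocal() {
    overlap_fracs "$1" "$2" "$3" "$4" |
        awk -v f="$5" '{ exit !($2 >= f && $3 >= f) }'
}

# A 1 kb gene (A) inside a 100 kb CNV (B): 100% of the gene, 1% of the CNV.
overlap_fracs 1000 2000 0 100000
if passes_reciprocal 1000 2000 0 100000 0.5; then
    echo "reported with -f 0.5 -r"
else
    echo "dropped with -f 0.5 -r"
fi
```

If the goal is "which genes are hit by this CNV", dropping -r (and possibly putting the CNV file on the -a side) may be closer to what you want.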

Also, I am now annotating my CNV events with the DGV database using the ANNOVAR tool.

First I tried this command: "$ -regionanno -build hg19 -out ex1 -dbtype dgvMerged example/ex1.avinput humandb/". All 500 of my CNV events got annotated.

Do I need to increase the minimum overlap fraction?

Does it mean all my CNV events are common in the population?

How do I check whether my CNVs are pathogenic or not?

I was able to follow this approach and get annotations for the CNVs. Essentially, I took the genes that overlap CNVs and assigned each gene a status (Amp/Del/Neutral) according to the CNV status. However, this is a mere overlap approach. What is your opinion on using this Amp/Del status directly in visualization tools like maftools? I know that tools like GISTIC can be run, but our data is non-human, and GISTIC and many other standard tools may not work.
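For reference, the gene-level assignment described above could be sketched roughly like this. Everything about the layout is an assumption: the input is taken to be a tab-separated table with one row per gene/CNV overlap (sample, gene, CNV status), the function name gene_status is made up, and the "Ambiguous" label for genes hit by both an Amp and a Del in the same sample is my own choice rather than any standard. The column order of the output may need rearranging for whatever the downstream tool expects.

```shell
#!/bin/sh
# Reduce per-overlap rows to one status per sample/gene pair.
# ASSUMED (hypothetical) tab-separated input: sample, gene, CNV status
# where status is Amp, Del or Neutral.
gene_status() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        key = $1 OFS $2
        seen[key] = 1
        if ($3 == "Amp") gain[key] = 1
        else if ($3 == "Del") loss[key] = 1
    }
    END {
        for (k in seen) {
            if ((k in gain) && (k in loss)) s = "Ambiguous"   # conflicting calls
            else if (k in gain) s = "Amp"
            else if (k in loss) s = "Del"
            else s = "Neutral"
            print k, s
        }
    }' "$@" | LC_ALL=C sort    # sort for a stable, reproducible row order
}

# Example use (hypothetical file names):
#   gene_status overlaps_all_samples.tsv > gene_cn_status.tsv
```

Flagging conflicts instead of picking one call keeps the limitation of a pure overlap approach visible in the table.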

