Question: How to annotate CNV events with gene information?
2.9 years ago, bioinforesearchquestions270 (United States) wrote:

Hello friends,

I have CNV calls from four different CNV callers. I would like to annotate each CNV call with gene information.

What are the commonly used tools for annotating CNVs?

If I use bedtools intersect to annotate the CNV calls, how much overlap should I require between the CNV calls and the gene coordinates?

I have 300 samples, so I am looking for command-line options.

snp rna dna-seq annovar cnv events • 2.5k views
ADD COMMENTlink modified 2.9 years ago by Amitm1.6k • written 2.9 years ago by bioinforesearchquestions270

CNV annotation can be easily automated (with OMIM, DGV, 1000g, haploinsufficiency, TAD, ... and also with your own in-house information)!

You can look at this post describing the AnnotSV tool: Annotation for SV and CNV

ADD REPLYlink modified 15 months ago • written 15 months ago by LGMgeo90
2.9 years ago, Amitm1.6k (UK) wrote:

hi,

Passing 0.5 to -f in intersectBed seems reasonable; it requires the overlap to cover at least 50% of each feature in the -a file. Once you are happy with the threshold, make a shell script like this:

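#!/bin/sh
# Usage: sh CNV_anno.sh <cnv_calls.bed>
# -wao: report each CNV with every overlapping gene and the overlap in bp
# -f 0.5: a gene overlap counts only if it covers at least 50% of the CNV
bedtools2-2.20.1/bin/intersectBed \
    -a "$1" \
    -b Homo_sapiens.GRCh37_BED_SORTD.txt \
    -wao \
    -f 0.5 \
    > "${1}_ANNO"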

Save it in a file, say CNV_anno.sh. Then you could run something like this in the shell:

for cnv_file in cnv_result_*; do
    # Skip outputs of a previous run, which also match the pattern
    case "$cnv_file" in *_ANNO) continue ;; esac
    sh CNV_anno.sh "$cnv_file"
done

This assumes that your result file names start with the prefix cnv_result_. Alter the pattern to match your exact file names and directory location.

The output files get the suffix _ANNO, which you can change as well.

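The -wao option in intersectBed prints each -a feature together with every overlapping -b feature and the number of overlapping base pairs. The -a features with no qualifying overlap are still reported, with an overlap of 0.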
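As a possible follow-up, the per-overlap rows that -wao produces can be collapsed into one line per CNV with its gene list. This is only a sketch: the function name summarize_wao is made up, and it assumes a 3-column CNV BED and a 4-column gene BED (chrom, start, end, gene name), so that the gene name lands in column 7 and the overlap in column 8. Adjust the column numbers if your files differ.

```shell
# Collapse bedtools -wao output into one line per CNV with a
# comma-separated gene list.
# Assumed layout (not stated in the thread):
#   cols 1-3 = CNV chrom/start/end
#   cols 4-7 = gene chrom/start/end/name
#   col  8   = overlap in bp (0 when the CNV hit no gene)
summarize_wao() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        key = $1 OFS $2 OFS $3
        # Remember each CNV once, in order of first appearance
        if (!(key in genes)) { order[++n] = key; genes[key] = "" }
        # Append the gene name only for real overlaps
        if ($8 > 0) genes[key] = (genes[key] == "" ? $7 : genes[key] "," $7)
    }
    END {
        for (i = 1; i <= n; i++)
            print order[i], (genes[order[i]] == "" ? "." : genes[order[i]])
    }' "$1"
}
```

It could be called as summarize_wao cnv_result_sample1_ANNO > cnv_result_sample1_genes.tsv (both file names are placeholders). CNVs without any gene get a "." in the last column, and a gene with several records in the gene file will appear more than once.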

ADD COMMENTlink written 2.9 years ago by Amitm1.6k

Thanks, Amit. I was away for a conference.

bedtools intersect -wa -wb -a Homo_sapiens.GRCh37_BED_SORTD.bed -b Sample1_cnv_file.bed -f 0.5 -r > GRCh37_Sample1_overlap.txt
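
A note on the -f 0.5 -r combination in the command above: with -r, bedtools requires the overlap to cover at least the given fraction of both features, so a small gene lying entirely inside a large CNV is not reported. The sketch below just does that arithmetic for one pair of intervals. overlap_fractions is a made-up helper, not part of bedtools, and it assumes BED-style half-open coordinates on the same chromosome.

```shell
# Print, tab-separated: overlap in bp, fraction of interval A covered,
# fraction of interval B covered.
# Arguments: A_start A_end B_start B_end (BED-style, same chromosome).
overlap_fractions() {
    awk -v a1="$1" -v a2="$2" -v b1="$3" -v b2="$4" 'BEGIN {
        s = (a1 > b1) ? a1 : b1          # later of the two starts
        e = (a2 < b2) ? a2 : b2          # earlier of the two ends
        ov = (e > s) ? e - s : 0         # 0 when the intervals are disjoint
        printf "%d\t%.4f\t%.4f\n", ov, ov / (a2 - a1), ov / (b2 - b1)
    }'
}
```

For an invented 2 kb gene at 1000-3000 inside a 1 Mb CNV at 0-1000000, overlap_fractions 1000 3000 0 1000000 prints 2000, 1.0000 and 0.0020: the gene is fully covered, but only 0.2% of the CNV is, so -f 0.5 -r would drop this pair.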

Also, I am now annotating my CNV events against the DGV database using ANNOVAR.

First, I tried this command: "$ annotate_variation.pl -regionanno -build hg19 -out ex1 -dbtype dgvMerged example/ex1.avinput humandb/". All 500 of my CNV events were annotated.

Do I need to increase the minimum overlap fraction?

Does it mean all my CNV events are common in the population?

How do I check whether my CNVs are pathogenic?

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by bioinforesearchquestions270

I was able to follow this approach and get annotations for the CNVs. Essentially, I found the genes that overlap the CNVs and then assigned a status (Amp/Del/Neutral) to each gene according to the status of the overlapping CNV. However, this is a simple overlap approach. What is your opinion on using this Amp/Del status directly in visualization tools like maftools? I know that tools like GISTIC exist, but our data is non-human, and GISTIC and many other standard tools may not work.
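
The gene-level assignment described above could be sketched roughly as follows. The layout is an assumption, not something given in the thread: the CNV BED is taken to carry the status (Amp/Del/Neutral) in a fourth column and the gene BED to have four columns, so that -wao output has the status in column 4, the gene name in column 8 and the overlap in column 9. The function name gene_status and the "Conflict" label for genes hit by CNVs of different status are also made up for illustration.

```shell
# Derive one status per gene from bedtools -wao output.
# Assumed layout (hypothetical):
#   cols 1-4 = CNV chrom/start/end/status
#   cols 5-8 = gene chrom/start/end/name
#   col  9   = overlap in bp
gene_status() {
    awk 'BEGIN { FS = OFS = "\t" }
    $9 > 0 {
        gene = $8
        if (!(gene in status)) {
            order[++n] = gene            # keep first-seen order
            status[gene] = $4
        } else if (status[gene] != $4) {
            status[gene] = "Conflict"    # CNVs with different status hit this gene
        }
    }
    END { for (i = 1; i <= n; i++) print order[i], status[order[i]] }' "$1"
}
```

Genes with no overlapping CNV do not appear in the output and would have to be filled in as Neutral separately if the downstream tool expects every gene.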

ADD REPLYlink modified 15 months ago • written 15 months ago by sutturka150