Question

Preparing CHASMplus database for annotation

0

4.0 years ago

Zahra ▴ 110

Hi all,

I want to annotate my VCF files with CHASMplus scores using ANNOVAR. To do that, I need the CHASMplus database in the standard ANNOVAR generic DB format (Chr, Start, End, Ref, Alt, plus any other columns), but I don't know how to download the database. Is that possible at all?

Thanks for any help.

ANNOVAR CHASM CHASMplus annotation • 1.9k views

ADD COMMENT • link updated 4.0 years ago by Collin ▴ 1000 • written 4.0 years ago by Zahra ▴ 110

score 1 · Answer 1 · 2021-06-28

1

4.0 years ago

Collin ▴ 1000

It is quite easy to annotate variants with CHASMplus scores by using OpenCRAVAT (https://opencravat.org/), either through the web server or the command-line tool. I assume, though, that you have an existing pipeline built on ANNOVAR and would like to annotate with it for consistency.

You can always access the data underlying an annotator in OpenCRAVAT, including CHASMplus. Basically, you need to install OpenCRAVAT, download the CHASMplus annotator, and then dump the tables of the CHASMplus SQLite database to CSV files. You can then reformat the data into whatever layout is needed. The commands would look like the following:

# install OpenCRAVAT with CHASMplus annotator
pip install open-cravat
oc module install-base
oc module install -y chasmplus

# change directory to installed chasmplus data
cd `oc config md`/annotators/chasmplus/data/

# dump all sqlite tables (one per chromosome, plus transcript) to csv files
# the table names are read from the database itself so that none is missed
for t in $(sqlite3 chasmplus.sqlite "SELECT name FROM sqlite_master WHERE type='table';"); do
    sqlite3 -header -csv chasmplus.sqlite "SELECT * FROM \"$t\";" > "$t.csv"
done

ADD COMMENT • link 4.0 years ago by Collin ▴ 1000

0

Dear Collin,

Following your instructions, I created the .csv files, but they contain only the alternative allele and not the reference allele:

pos,alt,score,tid
88385824,G,0.004,34778
88385830,G,0.004,34778
88391421,G,0.01,34778
88391427,G,0.006,34778
88391451,G,0.009,34778

Could you help me add the reference allele to these files?

ADD REPLY • link 4.0 years ago by Zahra ▴ 110

0

One of the nearby directories contains the reference genome sequence (hg38, stored in 2bit format). You should be able to extract the reference allele at each position from that file.

cd `oc config md`/commons/hg38wgs/data
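
One possible way to do the lookup from the shell is with the UCSC `twoBitToFa` utility, which can print a sub-range of a 2bit file. The sketch below rests on several assumptions that are not confirmed in this thread: that the `pos` column holds 1-based hg38 SNV positions, that the per-chromosome table names (`chr1`, `chr2`, ...) match the sequence names in the 2bit file, and that the 2bit file is called `hg38.2bit` (a hypothetical name, override it with `TWOBIT`). The helper names `ref_base` and `csv_to_annovar` are made up for illustration.

```shell
# Sketch: add the reference allele to one per-chromosome CSV dump and print
# ANNOVAR generic-DB style columns: Chr, Start, End, Ref, Alt, score.
# ASSUMPTIONS: 1-based SNV positions; 2bit file name is hypothetical.

TWOBIT="${TWOBIT:-hg38.2bit}"

# Print the reference base at 1-based position $2 on chromosome $1.
ref_base() {
    local chrom=$1 pos=$2
    # twoBitToFa ranges are 0-based and half-open, so base "pos" is [pos-1, pos).
    twoBitToFa "${TWOBIT}:${chrom}:$((pos - 1))-${pos}" stdout < /dev/null \
        | grep -v '^>' | tr -d '\n' | tr '[:lower:]' '[:upper:]'
}

# Convert a dump with header "pos,alt,score,tid" into tab-separated rows.
# sqlite3 CSV output may end lines with CRLF, so carriage returns are removed.
csv_to_annovar() {
    local chrom=$1 csv=$2
    tail -n +2 "$csv" | tr -d '\r' | while IFS=, read -r pos alt score tid; do
        printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
            "$chrom" "$pos" "$pos" "$(ref_base "$chrom" "$pos")" "$alt" "$score"
    done
}
```

Usage would be something like `csv_to_annovar chr1 chr1.csv > chr1.annovar.txt`. Calling `twoBitToFa` once per row is slow for large tables, so for a full conversion it is probably better to extract each chromosome sequence once and index into it. Also note that the same position and alternative allele may appear with more than one `tid`, so the output may need de-duplication before ANNOVAR can use it.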

ADD REPLY • link 4.0 years ago by Collin ▴ 1000

0

Thank you, Collin, but I need hg19 and couldn't find it. I would greatly appreciate it if you could help me again.

ADD REPLY • link 4.0 years ago by Zahra ▴ 110

0

Unfortunately, OpenCRAVAT natively uses only hg38, so the CHASMplus annotations are in hg38 coordinates. When variants are entered in hg19 coordinates, OpenCRAVAT lifts them over to hg38 "on the fly", so none of the annotator data is stored in hg19.

The easiest route might therefore be to run ANNOVAR, then run OpenCRAVAT on the same variants (designating the genome version as hg19), and then write a small script to merge the two output text files.
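
The merge step could be sketched roughly as follows. This assumes you have already reduced the OpenCRAVAT report to a tab-separated file with five columns (chromosome, hg19 position, ref, alt, CHASMplus score), and that the ANNOVAR output is tab-separated with Chr, Start, End, Ref, Alt as its first five columns. Both layouts are assumptions for illustration, and `merge_scores` is a hypothetical helper name.

```shell
# Sketch: append a score column to ANNOVAR rows by matching on chr/pos/ref/alt.
# ASSUMED inputs (both tab-separated, SNVs only):
#   $1  ANNOVAR table: Chr, Start, End, Ref, Alt, ...
#   $2  score table:   chr, pos, ref, alt, score
merge_scores() {
    awk -F'\t' -v OFS='\t' '
        # First file on the command line (the score table): remember each score.
        NR == FNR { score[$1 FS $2 FS $3 FS $4] = $5; next }
        # Second file (the ANNOVAR table): append the score, or "." if absent.
        {
            key = $1 FS $2 FS $4 FS $5
            print $0, (key in score ? score[key] : ".")
        }
    ' "$2" "$1"
}
```

For example, `merge_scores annovar_output.txt chasmplus_scores.tsv > merged.txt`. A header line in the ANNOVAR file would simply get a `.` appended, and indels would need extra care because the two tools may represent them differently.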

ADD REPLY • link 4.0 years ago by Collin ▴ 1000