Appending Gene IDs On RNA-Editing Sites
11.3 years ago
GPR ▴ 390

Hello, I have VarScan *.vcf outputs with SNPs, and I am trying to append a gene ID to the chromosome coordinate of each RNA-editing site called. I have tried intersecting my VCF file with a genes.gtf file using bedtools, but this leaves out a lot of information. Can anybody suggest a good way to do this? Also, should variant calls with a 100% frequency be treated with caution? Thanks, G.

gene id • 3.1k views
11.2 years ago
dankoboldt ▴ 140

Thanks for the question, and to Alex and Jeremy for providing some answers. The good news is that VarScan doesn't make Mendelian or diploid assumptions, so you can set any --min-var-freq threshold you like in searching for RNA editing. The bad news is that RNA editing is tough to investigate with next-gen sequencing even with a good caller. See this article, brought to my attention on a different thread by Sean Davis:

http://nar.oxfordjournals.org/content/early/2013/01/08/nar.gks1443

11.3 years ago

You might try the BEDOPS suite conversion script vcf2bed to convert your VCF file to BED format, along with the Noble lab's gtf2bed script to convert GTF to BED. (Our script tries to preserve as much information as possible; see the comments in the script for how the various columns are mapped.)

Finally, use the BEDOPS application bedmap to map the genes in the converted GTF file to the sites in the converted VCF file. For instance:

$ vcf2bed.py < sites.vcf | sort-bed - > sites.bed
$ gtf2bed genes.gtf | sort-bed - > genes.bed
$ bedmap --echo --echo-map-id --delim '\t' sites.bed genes.bed > annotatedSites.bed

The file annotatedSites.bed will be tab-delimited and contain each VCF site datum in BED format, with one additional column containing the GTF-sourced gene name (or names) that overlap with that VCF site by one or more bases.

(If multiple gene names are associated with a site, they are placed in one column and delimited with semicolons. Custom overlap criteria can also be specified.)
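To make the overlap logic concrete, here is a minimal Python sketch of the kind of mapping described above: each site gets the IDs of all genes that overlap it by at least one base, joined with semicolons. It is not how bedmap is implemented (this version does a plain linear scan per chromosome and does not need sorted input), and the function name `annotate_sites`, the coordinates, and the gene names are all hypothetical.

```python
from collections import defaultdict


def annotate_sites(sites, genes):
    """Attach overlapping gene IDs to each site.

    sites: iterable of (chrom, start, end), half-open BED-style intervals
    genes: iterable of (chrom, start, end, gene_id), same convention
    Returns a list of (chrom, start, end, ids), with ids joined by ';'
    (an empty string when no gene overlaps the site).
    """
    genes_by_chrom = defaultdict(list)
    for chrom, start, end, gene_id in genes:
        genes_by_chrom[chrom].append((start, end, gene_id))

    annotated = []
    for chrom, start, end in sites:
        hits = []
        for g_start, g_end, gene_id in genes_by_chrom[chrom]:
            # Half-open intervals overlap when each starts before the other ends.
            if g_start < end and start < g_end and gene_id not in hits:
                hits.append(gene_id)
        annotated.append((chrom, start, end, ";".join(hits)))
    return annotated


# Toy data: hypothetical coordinates and gene names, for illustration only.
sites = [("chr1", 100, 101), ("chr1", 500, 501), ("chr2", 50, 51)]
genes = [
    ("chr1", 90, 200, "geneA"),
    ("chr1", 100, 150, "geneB"),
    ("chr2", 60, 90, "geneC"),
]

for row in annotate_sites(sites, genes):
    print("\t".join(map(str, row)))
```

For real data the BEDOPS pipeline is the better choice, since it works on sorted input and scales to whole-genome files; the sketch is only meant to show what "overlap by one or more bases" means for a single-base site.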


Thanks so much! Trying this now.

11.3 years ago

Not to evade your annotation question, but I don't think VarScan would be a good tool for RDDs (RNA-DNA differences) if it assumes Mendelian alleles in individuals. The problem will be with sites that are edited only, say, 10%-30% of the time. Maybe the tumor module would call those.
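One way to see how many called sites fall into that partially edited range is to pull the variant frequency out of each VCF record. The sketch below assumes the VCF has a per-sample FREQ field in the FORMAT column, written as a percentage string such as 25%; check your own files before relying on that. The function names `variant_freq` and `sites_in_band` and the example records are made up for illustration.

```python
def variant_freq(vcf_line, sample_index=0):
    """Return the variant frequency of one VCF record as a fraction (0.0-1.0).

    Assumes a FREQ key in the FORMAT column whose value looks like '25%'.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")                    # FORMAT column
    values = fields[9 + sample_index].split(":")   # chosen sample column
    sample = dict(zip(keys, values))
    return float(sample["FREQ"].rstrip("%")) / 100.0


def sites_in_band(vcf_lines, low=0.10, high=0.30):
    """Collect (chrom, pos, freq) for records with low <= freq <= high."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        freq = variant_freq(line)
        if low <= freq <= high:
            chrom, pos = line.split("\t")[:2]
            kept.append((chrom, int(pos), freq))
    return kept


# Hypothetical records, trimmed to the fields the sketch needs.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSample1",
    "chr1\t101\t.\tA\tG\t.\tPASS\tADP=40\tGT:DP:FREQ\t0/1:40:25%",
    "chr1\t501\t.\tA\tG\t.\tPASS\tADP=30\tGT:DP:FREQ\t1/1:30:100%",
]

print(sites_in_band(example))
```

Calling `sites_in_band(example, low=1.0, high=1.0)` instead picks out the 100% calls, which is a quick way to list the sites the original question asks about.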

