mRNA positions from vcf
2.2 years ago

Hi everyone,

I am processing a VCF file from the ExAC browser (release 1.0), which uses hg19 as the human reference genome. I want to identify only those mutations that occur at mRNA positions. Is there any field/column in the VCF that gives me that information?

For now, I tried using the "BIOTYPE" field from the VCF, selecting only those positions for which "BIOTYPE=protein_coding". However, when I cross-checked these positions against exon start/stop positions from the UCSC browser, some of them are marked as non-coding RNA. How is this possible? Or does "BIOTYPE=protein_coding" not give me the right information about the RNA type?

Example: position 138593 on chr1 is marked with BIOTYPE=protein_coding in the VCF. This position is part of the LOC729737 gene. When I look up this position in the UCSC file, it is marked as non-coding RNA (as per the UCSC kgXref table).

SNP ExAC Exome VCF • 420 views

See here:

ID: C9J4L2_HUMAN
DESCRIPTION: SubName: Full=Uncharacterized protein;
CAUTION: The sequence shown here is derived from an Ensembl automatic analysis pipeline and should be considered as preliminary data.

Hi,

Thanks for your response. Could you please explain this in more detail?


What is unclear? The page summarizes LOC729737, and there you can see that, based on the Ensembl automatic analysis pipeline, there is some as yet unverified evidence for a protein-coding gene.


Also check the Ensembl genome browser to see whether there are any other annotated transcripts at your location of interest. UCSC and/or other sources sometimes exclude potential isoforms that don't have as much supporting data.
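
To see which transcripts the VCF itself reports at a position, it may help to list the biotype per transcript rather than per variant, since VEP assigns BIOTYPE to each annotated transcript and one position can overlap several. Below is a minimal sketch, assuming the annotations sit in a VEP-style CSQ INFO entry whose sub-field order is declared in the header after "Format:" and includes "Feature" and "BIOTYPE". The helper names and the toy header/record are made up for illustration; check the header of your own file for the real field list.

```python
import re

def csq_fields(header_lines):
    """Return the CSQ sub-field names declared in the VCF header, or None."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ"):
            # The sub-field order follows "Format: " inside the Description string.
            match = re.search(r'Format: ([^">]+)', line)
            if match:
                return match.group(1).strip().split("|")
    return None

def transcript_biotypes(info, fields):
    """Map each annotated transcript (Feature) to its BIOTYPE for one INFO string."""
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            result = {}
            # One comma-separated annotation per allele/transcript combination.
            for annotation in entry[4:].split(","):
                values = dict(zip(fields, annotation.split("|")))
                result[values.get("Feature", "")] = values.get("BIOTYPE", "")
            return result
    return {}

# Toy header and INFO column (hypothetical IDs, shortened field list).
header = [
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations. '
    'Format: Allele|Gene|Feature|BIOTYPE">'
]
info = "AC=1;CSQ=T|ENSG_A|ENST_A1|protein_coding,T|ENSG_A|ENST_A2|processed_transcript"

fields = csq_fields(header)
biotypes = transcript_biotypes(info, fields)
print(biotypes)
```

If a position carries both a protein_coding transcript and a non-coding one, filtering on the mere presence of "BIOTYPE=protein_coding" keeps it, which could explain the disagreement with a source that only lists the non-coding transcript.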
