Hi everyone,
I am processing a VCF file from the ExAC browser (release 1.0), with hg19 as the human reference genome. I want to identify only those mutations that fall within mRNA positions. Is there any field/column in the VCF that can give me that information?
For now, I tried to use the "BIOTYPE" field from the VCF and selected only those positions for which "BIOTYPE=protein_coding". However, when I cross-check these positions against exon start/stop positions from the UCSC browser, some of them are marked as non-coding RNA. How is this possible? Or does "BIOTYPE=protein_coding" not give me the right information about the RNA type?
Example: position 138593 on chr1 is marked with BIOTYPE=protein_coding in the VCF. This position is part of the LOC729737 gene. When I look up this position in the UCSC file, it is marked as non-coding RNA (as per the UCSC kgXref table).
See here:
Hi,
Thanks for your response. Could you please explain this in more detail?
What is unclear? The linked page summarizes LOC729737, and there you can see that, based on the Ensembl automatic analysis pipeline, there is some as yet unverified evidence for a protein-coding gene.
Also check the Ensembl genome browser to see if there are any other annotated transcripts in your location of interest. UCSC and/or other sources sometimes exclude potential isoforms that don't have as much supporting data.
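If it helps, here is a minimal Python sketch for listing the biotype of every annotated transcript at a variant, rather than keeping or dropping the whole position on a single value. It assumes the BIOTYPE you are filtering on comes from the VEP "CSQ" INFO annotation, where each comma-separated entry describes one transcript and the sub-field order is given in the "Format:" part of the CSQ header line. The function names, the shortened header, and the gene/transcript IDs (GENE1, TX_A, TX_B) are made up for illustration and are not taken from the ExAC file.

```python
import re

def csq_fields(header_line):
    """Return the CSQ sub-field names listed after 'Format:' in the VEP header line."""
    match = re.search(r'Format: ([^">]+)', header_line)
    return match.group(1).strip().split("|") if match else []

def biotypes_per_transcript(info, fields):
    """Return (Feature, BIOTYPE) pairs, one per CSQ entry in a VCF INFO string."""
    idx_feature = fields.index("Feature")
    idx_biotype = fields.index("BIOTYPE")
    pairs = []
    for item in info.split(";"):
        if item.startswith("CSQ="):
            # Each comma-separated entry annotates one transcript.
            for entry in item[len("CSQ="):].split(","):
                parts = entry.split("|")
                pairs.append((parts[idx_feature], parts[idx_biotype]))
    return pairs

# Illustrative, shortened header and record (hypothetical values, not real ExAC data).
header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type '
          'as predicted by VEP. Format: Allele|Gene|Feature|Feature_type|Consequence|BIOTYPE">')
info = ("AC=1;CSQ=T|GENE1|TX_A|Transcript|missense_variant|protein_coding,"
        "T|GENE1|TX_B|Transcript|non_coding_transcript_exon_variant|lincRNA")

fields = csq_fields(header)
for feature, biotype in biotypes_per_transcript(info, fields):
    print(feature, biotype)
```

A position can then be kept only if the specific transcript you care about is protein_coding, which may explain why a position-level filter disagrees with a table that reports one transcript per gene.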