VEP output is only protein_coding
12 months ago
storm1907 ▴ 30

Hello, I am supposed to extract both protein-coding and synonymous variants from VCFs that were given to me. The only variant consequence I can find in them is "protein_coding"; no strings such as "synonymous" are present. Is this an error with VEP?

Thank you!

12 months ago

Hi, protein_coding is not a consequence. It is the biotype of the transcript, so it only tells you what kind of transcript isoform the variant falls in, not what effect the variant has on it.

I think that you may want to think about the definition of 'synonymous':

• synonymous base substitution (synonymous variant / synonymous mutation): a base change in a protein coding region that does not alter the resulting amino acid sequence
• non-synonymous base substitution (non-synonymous variant / non-synonymous mutation): a base change in a protein coding region that does alter the resulting amino acid sequence
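
To make the two definitions concrete, here is a minimal Python sketch. The function name `classify` is made up for illustration, and the codon table is deliberately partial (four codons of the standard genetic code), so this is not a general-purpose annotator.

```python
# Partial codon table from the standard genetic code, just enough for the demo.
CODON_TABLE = {
    "GAA": "Glu",
    "GAG": "Glu",
    "GAT": "Asp",
    "GAC": "Asp",
}


def classify(ref_codon, alt_codon):
    """Compare the amino acids encoded by the reference and altered codon."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"


# GAA -> GAG changes the third base but still encodes Glu.
print(classify("GAA", "GAG"))
# GAA -> GAT changes the third base and encodes Asp instead of Glu.
print(classify("GAA", "GAT"))
```

Both substitutions hit the same codon position; only the second one changes the protein, which is why the consequence has to be worked out per transcript rather than read off the transcript biotype.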

If you wish, please show the command that you used to annotate the variants, and also show a sample of the output that was produced.

Kevin

12 months ago
storm1907 ▴ 30
chr2    60553286    .   G   GGGC    46.6    PASS    CSQ=||||||||||||MODIFIER|BCL11A|ENSG00000119866|ENST00000335712|protein_coding|1/3||372-373||||,||||||||||||MODIFIER|BCL11A|ENSG00000119866|ENST00000356842|protein_coding|1/5||285-286||||,||||||||||||MODIFIER|BCL11A|ENSG00000119866|ENST00000358510|protein_coding|1/4||123-124||||,||||||||||||MODIFIER|BCL11A|ENSG00000119866|ENST00000359629|protein_coding|1/5||348-349||||,||||||||||||MODIFIER|BCL11A|ENSG00000119866|ENST00000409351|protein_coding|1/3||214-215||||,||||||||||||MODIFIER|BCL11A|ENSG00000119866|ENST00000642384|protein_coding|1/4||368-369||||,||||||||||||MODIFIER|BCL11A|ENSG00000119866|ENST00000642439|protein_coding|1/4||272-273||||,||||||||||||MODIFIER|BCL11A|ENSG00000119866|ENST00000643004|protein_coding|1/3||175-176||||,||||||||||||MODIFIER|BCL11A|ENSG00000119866|ENST00000643716|protein_coding|1/2||346-347||||,||||||||||||MODIFIER|BCL11A|ENSG00000119866|ENST00000646249|protein_coding|2/5||697-698|||| GT:GQ:DP:AD:VAF:PL  0/1:29:103:23,75:0.728155:46,0,29
chr2    60768981    .   A   T   67.5    PASS    CSQ=|FAIL|0.00|0.00|0.00|0.00|3|13|-26|15|||MODIFIER|PAPOLG|ENSG00000115421|ENST00000238714|protein_coding||5/21|||||   GT:GQ:DP:AD:VAF:PL  0/1:67:13:6,7:0.538462:65,0,99
chr2    60780647    .   A   G   59.5    PASS    CSQ=|FAIL|0.00|0.00|0.00|0.00|1|37|0|-25|||MODIFIER|PAPOLG|ENSG00000115421|ENST00000238714|protein_coding||9/21|||||    GT:GQ:DP:AD:VAF:PL  1/1:58:32:0,32:1:59,63,0
chr2    60780851    .   T   C   60.5    PASS    CSQ=|FAIL|0.00|0.00|0.00|0.00|8|47|-8|-3|||MODIFIER|PAPOLG|ENSG00000115421|ENST00000238714|protein_coding||10/21|||||   GT:GQ:DP:AD:VAF:PL


And the command line:

command_line
--plugin Mastermind,/opt/vep/.vep/source_1.gz --plugin SpliceAI,snv=/opt/vep/.vep/source_2.gz,indel=/opt/vep/.vep/source_3.gz,cutoff=0.4 --verbose --no_stats --force --allow_non_variant --gencode_basic --offline --dont_skip --distance 100 --vcf --compress_output gzip --fork 72 --fields 'Allele,SpliceAI_cutoff,SpliceAI_pred_DS_AG,SpliceAI_pred_DS_AL,SpliceAI_pred_DS_DG,SpliceAI_pred_DS_DL,SpliceAI_pred_DP_AG,SpliceAI_pred_DP_AL,SpliceAI_pred_DP_DG,SpliceAI_pred_DP_DL,Mastermind_MMID3,Mastermind_counts,IMPACT,SYMBOL,Gene,Feature,BIOTYPE,EXON,INTRON,cDNA_position,Protein_position,Amino_acids,Codons,STRAND'


When I analyze this kind of file in Illumina Variant Interpreter, the cloud service also reports synonymous variants, but I cannot find anything relating to synonymous variants in this VCF. I also don't understand why the CSQ field is duplicated for some variants. I need to extract some columns, but the number of columns is not the same for every variant.
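
A note on the two symptoms, with a small Python sketch. The `--fields` list in the command line above has no `Consequence` entry, and that is the field in which VEP writes terms such as `synonymous_variant`, so adding `Consequence` to `--fields` and re-running should make them appear. The apparent duplication is most likely one CSQ entry per overlapping transcript: entries are separated by commas, and the fields inside each entry by `|`. The sketch assumes the field order is exactly the `--fields` list shown above; `parse_csq` is a hypothetical helper name, not part of VEP.

```python
# Field order copied from the --fields option in the command line above.
FIELDS = (
    "Allele,SpliceAI_cutoff,SpliceAI_pred_DS_AG,SpliceAI_pred_DS_AL,"
    "SpliceAI_pred_DS_DG,SpliceAI_pred_DS_DL,SpliceAI_pred_DP_AG,"
    "SpliceAI_pred_DP_AL,SpliceAI_pred_DP_DG,SpliceAI_pred_DP_DL,"
    "Mastermind_MMID3,Mastermind_counts,IMPACT,SYMBOL,Gene,Feature,"
    "BIOTYPE,EXON,INTRON,cDNA_position,Protein_position,Amino_acids,"
    "Codons,STRAND"
).split(",")


def parse_csq(info):
    """Return one dict per comma-separated CSQ entry in a VCF INFO string."""
    for item in info.split(";"):
        if item.startswith("CSQ="):
            entries = item[len("CSQ="):].split(",")
            # Each entry is one transcript-level annotation; its values are
            # pipe-separated and follow the order given in FIELDS.
            return [dict(zip(FIELDS, entry.split("|"))) for entry in entries]
    return []  # no CSQ annotation in this INFO string


# INFO column of the chr2:60768981 record shown above.
info = ("CSQ=|FAIL|0.00|0.00|0.00|0.00|3|13|-26|15|||MODIFIER|PAPOLG|"
        "ENSG00000115421|ENST00000238714|protein_coding||5/21|||||")

for row in parse_csq(info):
    print(row["SYMBOL"], row["Feature"], row["BIOTYPE"], row["INTRON"])
```

Parsed this way, every entry has the same 24 fields, and a variant simply contributes as many rows as it has transcript annotations. If `Consequence` is added to `--fields`, the `FIELDS` list here would need the same change in the same position.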