Question: Generating Protein Databases from SNP and Indel Information
7.6 years ago, Doug wrote:

Using NGS technology, I recently detected thousands of SNPs and indels in a yeast strain for which we have proteomic data. I wrote software to generate a protein database from these results. However, I ran into several troubling cases, including disruption of start codons, stop codons, and intron/exon boundaries. For each case, I made my own judgement call and moved on, but I would like to compare my results with those of others. Is anyone aware of software that generates protein FASTA files from genomic data? I currently have a .vcf file but could probably convert it into other usable formats if necessary.

I have looked into several variant effect prediction tools, including PolyPhen-2, ANNOVAR, snpEff, and the Ensembl Variant Effect Predictor. However, these tools are more focused on predicting phenotypic effect than on simply generating a FASTA file. They might do what I am looking for, but if so, I haven't figured out how to do it. I would appreciate any input or feedback on this subject.

proteomics vcf fasta
7.4 years ago, Zev.Kronenberg (United States) wrote:

Have you thought about annotating the variants using VAT, which is part of the VAAST suite? VCF->GVF->annotation is relatively easy.


You could then use these annotations to create protein sequences.
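
As a rough illustration of that last step, here is a minimal sketch of writing reference and variant proteins into one FASTA database. Everything in it is an assumption made for illustration: the function names (`fasta_record`, `build_database`), the `variants=` header layout, and the annotation strings are hypothetical and are not the output format of VAT or any other tool. Variant proteins identical to a sequence already written (for example, from a synonymous SNP) are skipped so the search database is not padded with duplicates.

```python
def fasta_record(seq_id, protein, annotations=(), width=60):
    """Format one protein as a FASTA record (hypothetical header layout)."""
    header = f">{seq_id}"
    if annotations:
        # Assumed convention: list the variants behind this sequence in the header.
        header += " variants=" + ",".join(annotations)
    # Wrap the sequence at a fixed line width.
    lines = [protein[i:i + width] for i in range(0, len(protein), width)]
    return "\n".join([header, *lines]) + "\n"


def build_database(records):
    """Concatenate FASTA records, skipping sequences that were already written.

    records: iterable of (seq_id, protein, annotations) tuples, with the
    reference protein listed before its variants.
    """
    seen = set()
    out = []
    for seq_id, protein, annotations in records:
        if protein in seen:
            continue  # e.g. a synonymous variant that leaves the protein unchanged
        seen.add(protein)
        out.append(fasta_record(seq_id, protein, annotations))
    return "".join(out)


if __name__ == "__main__":
    # Toy records; the IDs and annotation strings are made up.
    records = [
        ("GENE1", "MAKW", []),
        ("GENE1_var1", "MAKW", ["c.6T>C"]),      # synonymous, so it is skipped
        ("GENE1_var2", "MTKW", ["p.Ala2Thr"]),
    ]
    print(build_database(records), end="")
```

Keeping the variant list in the header makes it possible to trace a peptide hit back to the variant that produced it, although the exact header convention would depend on the search engine being used.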

7.6 years ago, Larry_Parnell (Boston, MA USA) wrote:

Because your genome encodes mostly non-spliced or single-exon protein-coding genes, I think that the analysis approach would be rather straightforward. Thus, what comes to mind is the analysis pipeline followed by those looking into pathogenic outbreaks such as EHEC/EAEC O104:H4 in Germany last summer. While the focus of that and similar studies was genome sequencing without proteomic data, they likely employed a rapid screen to identify protein-based differences between a standard, benign strain and the one (or several) isolated during the outbreak.

This topic is not my forte. Just an idea that comes to mind.
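
For the single-exon case described above, a minimal sketch of the idea might look like the following. It is an illustration under several assumptions, not an established pipeline: variant positions are 1-based and given relative to the start of the coding sequence on the coding strand (a real VCF uses genomic coordinates, so a coordinate and strand conversion would be needed first), variants do not overlap, the standard genetic code applies, and the flag names (`start_lost`, `stop_lost`, `premature_stop`, `frameshift`) are made up here. The code only flags the troublesome cases; what to do with a flagged gene remains a judgement call.

```python
from itertools import product

# Standard genetic code, built in TCAG order (assumed appropriate for the strain).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def apply_variants(cds, variants):
    """Apply (pos, ref, alt) variants to a coding sequence.

    pos is 1-based and relative to the CDS start (an assumption of this sketch).
    Variants are applied right to left so earlier edits do not shift later ones.
    """
    seq = cds.upper()
    for pos, ref, alt in sorted(variants, reverse=True):
        start = pos - 1
        if seq[start:start + len(ref)] != ref.upper():
            raise ValueError(f"REF allele does not match the CDS at position {pos}")
        seq = seq[:start] + alt.upper() + seq[start + len(ref):]
    return seq


def translate(seq):
    """Translate up to the first stop codon; return (protein, stop_found)."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X for codons with ambiguous bases
        if aa == "*":
            return "".join(protein), True
        protein.append(aa)
    return "".join(protein), False


def classify(ref_cds, alt_cds):
    """Translate the altered CDS and flag disrupted start/stop codons and frameshifts."""
    protein, stop_found = translate(alt_cds)
    flags = []
    if not alt_cds.startswith("ATG"):
        flags.append("start_lost")
    if not stop_found:
        flags.append("stop_lost")
    elif len(protein) < len(alt_cds) // 3 - 1:
        flags.append("premature_stop")  # stop reached before the final codon
    if (len(alt_cds) - len(ref_cds)) % 3 != 0:
        flags.append("frameshift")
    return protein, flags


if __name__ == "__main__":
    ref = "ATGGCTAAATGGTAA"  # toy CDS: M A K W stop
    for variants in ([(11, "G", "A")], [(3, "G", "A")], [(7, "AA", "A")]):
        alt = apply_variants(ref, variants)
        print(variants, classify(ref, alt))
```

Genes with introns would need the exon structure from an annotation file before the same translation step could be applied, and a variant at a splice site cannot be resolved by this kind of sequence editing alone.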
