Question: Generating Protein Databases From SNP And Indel Information
7.6 years ago by
Doug20
Doug20 wrote:

Using NGS technology, I recently detected thousands of SNPs and indels in a yeast strain for which we have proteomic data. I wrote software to generate a protein database from these results. However, I ran into several troubling cases, including disruption of start codons, stop codons, and intron/exon boundaries. For each case, I made my own judgement call and moved on, but I would like to compare my results to others. Is anyone aware of software that generates protein FASTA files from genomic data? I currently have a .vcf file but could probably convert it into other usable formats if necessary.

I have looked into several variant effect predictor tools, including PolyPhen-2, ANNOVAR (coding_change.pl), snpEff, and the Ensembl Variant Effect Predictor. However, these tools are more focused on predicting phenotypic effect than on simply generating a FASTA file. They might do what I am looking for, but if so I haven't figured out how to do it. I would appreciate any input or feedback on this subject.
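For comparison purposes, the core of such a generator can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the software described above: it assumes a single-exon, forward-strand CDS, variant positions already converted from genomic VCF coordinates to 1-based CDS coordinates, non-overlapping variants, and the standard nuclear genetic code. The function names (`apply_variants`, `translate`, `mutant_protein`, `fasta_record`) and the flag strings are hypothetical. Intron/exon boundary disruption is not handled; start and stop codon disruption is only flagged, leaving the judgement call to the caller.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def apply_variants(cds, variants):
    """Apply (pos, ref, alt) variants; pos is 1-based within the CDS (assumption)."""
    # Work from the 3' end so that indels do not shift upstream coordinates.
    for pos, ref, alt in sorted(variants, reverse=True):
        start = pos - 1
        if cds[start:start + len(ref)] != ref:
            raise ValueError(f"REF allele {ref!r} does not match CDS at position {pos}")
        cds = cds[:start] + alt + cds[start + len(ref):]
    return cds


def translate(cds):
    """Translate up to the first stop codon; return (protein, stop_found)."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X for codons containing N etc.
        if aa == "*":
            return "".join(protein), True
        protein.append(aa)
    return "".join(protein), False


def mutant_protein(ref_cds, variants):
    """Return (protein, flag); flag is one of the hypothetical labels below."""
    alt_cds = apply_variants(ref_cds, variants)
    if not alt_cds.startswith("ATG"):
        return "", "start_lost"
    protein, stop_found = translate(alt_cds)
    if not stop_found:
        return protein, "stop_lost"
    ref_protein, _ = translate(ref_cds)
    return protein, ("unchanged" if protein == ref_protein else "changed")


def fasta_record(name, protein, width=60):
    """Format one FASTA record with wrapped sequence lines."""
    lines = [protein[i:i + width] for i in range(0, len(protein), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    toy_cds = "ATGGCTTGGTAA"  # made-up sequence, for illustration only
    protein, flag = mutant_protein(toy_cds, [(5, "C", "A")])
    print(flag)
    print(fasta_record("toy_gene_variant", protein), end="")
```

Returning a flag alongside the sequence keeps the ambiguous cases (lost start, lost stop, frameshift running off the end of the CDS) visible, so they can be reviewed or excluded before the database is searched rather than silently resolved.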

proteomics vcf fasta • 1.9k views
7.4 years ago by
United States
Zev.Kronenberg11k wrote:

Have you thought about annotating the variants using VAT, which is part of the VAAST suite? VCF->GVF->annotation is relatively easy.


You could then use these annotations to create protein sequences.

7.6 years ago by
Larry_Parnell16k
Boston, MA USA
Larry_Parnell16k wrote:

Because your genome encodes mostly non-spliced or single-exon protein-coding genes, I think that the analysis approach would be rather straightforward. Thus, what comes to mind is the analysis pipeline followed by those looking into pathogenic outbreaks such as EHEC/EAEC O104:H4 in Germany last summer. While the focus of that and similar studies was genome sequencing without proteomic data, they likely employed a rapid screen to identify protein-based differences between a standard, benign strain and the one (or several) isolated during the outbreak.

This topic is not my forte. Just an idea that comes to mind.
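A rapid screen of this kind can be approximated by comparing two protein FASTA files record by record. The sketch below is only an illustration of the idea, not the pipeline used in the outbreak studies: it assumes both files use the same protein identifiers, and the function names `parse_fasta` and `compare_proteomes` are hypothetical. The sequences in the example are made up.

```python
def parse_fasta(text):
    """Parse FASTA text into {identifier: sequence}; identifier is the first header word."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def compare_proteomes(reference, sample):
    """Report proteins that differ between two {identifier: sequence} dicts."""
    report = {}
    for name, ref_seq in reference.items():
        alt_seq = sample.get(name)
        if alt_seq is None:
            report[name] = "missing in sample"
        elif alt_seq == ref_seq:
            continue  # identical proteins are not reported
        elif len(alt_seq) != len(ref_seq):
            report[name] = f"length {len(ref_seq)} -> {len(alt_seq)}"
        else:
            # Same length: list substitutions as <ref aa><1-based position><alt aa>.
            subs = [
                f"{a}{i}{b}"
                for i, (a, b) in enumerate(zip(ref_seq, alt_seq), start=1)
                if a != b
            ]
            report[name] = ",".join(subs)
    return report


if __name__ == "__main__":
    reference_text = ">geneA reference\nMAW\n>geneB\nMKT\n>geneC\nMSS\n"
    sample_text = ">geneA\nMDW\n>geneB\nMK\n>geneC\nMSS\n"
    differences = compare_proteomes(parse_fasta(reference_text), parse_fasta(sample_text))
    for name, change in differences.items():
        print(name, change)
```

Proteins whose length changes are only flagged, not aligned; those are the cases (frameshifts, lost or gained stop codons) that would need a closer look.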
