Derive Protein Domain Sequence From Personal Exome Data
12.5 years ago

I am interested in a particular protein domain and I would like to extract all instances of this domain from our whole-exome data. Is this theoretically possible? Are there any protocols or references on generating protein sequences of specific coding regions from exome data?

Thanks in advance!

Tags: exome, next-gen sequencing

Khader, is your input CHROM, POS, REF, ALT? I've got some Java code to build an mRNA and to find the domains.

I am a bit puzzled by this question; it is very unclear to me what you mean by generating protein sequences from exome data. Is it only me? If you do whole-exome sequencing, you have the reference genome, correct? Further, you sequenced genomic DNA enriched for exons, correct? Do you mean you want to infer the 'real' coding sequence and translate it (what extra information would the exome sequence provide then, since you must know all the exons to do the enrichment)? Or do you want to detect variants (e.g. SNPs) and infer their effect in terms of amino-acid changes?

@Pierre Yes, I have that input. Please let me know about your approach.

@Michael: This is more of a conceptual approach. We found a missense mutation in a very well-known protein domain in our exome data. This domain is part of multiple proteins, and those copies could carry other, non-deleterious mutations. All I am trying to do is derive protein sequences from the exome so that I can align them at the protein level and see how this particular domain is affected in a personal exome.
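Because a missense SNV does not change the length of the protein, the reference and personal versions of a domain can be compared position by position, without a gapped alignment. A minimal sketch, assuming both domain sequences have already been extracted; `domain_differences` and `percent_identity` are hypothetical helpers, and `domain_start` is the 1-based protein position of the domain's first residue:

```python
def domain_differences(ref_domain: str, alt_domain: str, domain_start: int) -> list[str]:
    """List residue changes between two equal-length domain sequences.

    Changes are reported as <ref><protein position><alt>, e.g. 'V8N'.
    """
    if len(ref_domain) != len(alt_domain):
        # Indels shift the register; a gapped aligner would be needed instead.
        raise ValueError("domain sequences differ in length")
    return [
        f"{ref}{domain_start + i}{alt}"
        for i, (ref, alt) in enumerate(zip(ref_domain, alt_domain))
        if ref != alt
    ]


def percent_identity(ref_domain: str, alt_domain: str) -> float:
    """Ungapped percent identity of two non-empty, equal-length sequences."""
    if not ref_domain or len(ref_domain) != len(alt_domain):
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(r == a for r, a in zip(ref_domain, alt_domain))
    return 100.0 * matches / len(ref_domain)
```

For example, `domain_differences("ELIVKGTE", "ELINKGTE", 5)` returns `["V8N"]`. If indels are present, a global aligner such as EMBOSS `needle` would be the safer choice.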

I see, but then I would just run the sequence through transeq in all 6 frames and then maybe use Pfam to see how much of the domain is intact.
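For reference, a six-frame translation like the one transeq produces can be sketched in a few lines of Python. This assumes the standard genetic code (NCBI table 1) and simply drops trailing partial codons; transeq's own frame numbering on the reverse strand and its handling of ambiguity codes may differ:

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (NCBI table 1), built from the TCAG codon ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame(seq: str) -> dict[str, str]:
    """Translations of frames +1..+3 (forward) and -1..-3 (reverse complement)."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

With this sketch, `six_frame("ATGGCCTAA")["+1"]` is `"MA*"` and `["-1"]` is `"LGH"`. Each frame would still need a domain search (e.g. Pfam HMMs with HMMER's `hmmscan`), which is the extra step discussed in the next reply.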

Thanks Michael. I thought of a similar idea, but I don't want to go through the 6-frame translation and the selection (or assumption) of a coding transcript. IMHO, the best approach could be similar to what Pierre suggested: merge data from existing annotations with the variants specific to the personal exome.
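Merging the annotation with personal variants amounts to taking the annotated coding sequence of a transcript, substituting the individual's alleles, and translating the result. A minimal sketch restricted to SNVs, assuming the variants have already been converted to 1-based CDS positions with bases given on the transcript strand; `apply_snvs` and `personal_protein` are hypothetical names:

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (NCBI table 1), built from the TCAG codon ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate complete codons with the standard code; unknown codons become 'X'."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def apply_snvs(cds: str, snvs: list[tuple[int, str, str]]) -> str:
    """Substitute SNVs given as (1-based CDS position, ref base, alt base)."""
    bases = list(cds.upper())
    for pos, ref, alt in snvs:
        if not 1 <= pos <= len(bases):
            raise ValueError(f"c.{pos} is outside the CDS")
        if bases[pos - 1] != ref.upper():
            # A mismatch usually means a strand or coordinate error upstream.
            raise ValueError(f"c.{pos}: expected {ref}, CDS has {bases[pos - 1]}")
        bases[pos - 1] = alt.upper()
    return "".join(bases)


def personal_protein(cds: str, snvs: list[tuple[int, str, str]]) -> str:
    """Translate the reference CDS after applying one individual's SNVs."""
    return translate(apply_snvs(cds, snvs))
```

For example, `personal_protein("ATGGATTGGTAA", [(4, "G", "A")])` returns `"MNW*"` (reference `"MDW*"`). Indels are not handled: they change the reading frame downstream, so the sequence would have to be rebuilt and re-translated from the variant onward. Heterozygous sites would call for one sequence per allele or haplotype.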

12.5 years ago

I would use the knownGene table from UCSC to map the SNP onto the protein. For an algorithm, see:

How To Calculate The Protein Change And Codon Position Within A Nucleotide Sequence Of A Single Nucleotide Substitution?

(see here for an implementation: https://github.com/lindenb/jsandbox/blob/master/src/sandbox/VCFAnnotator.java)
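The core of that mapping is converting a genomic coordinate into an offset within the spliced CDS, then into a codon number and a position inside the codon. The sketch below is not Pierre's VCFAnnotator code; it is an independent illustration assuming UCSC knownGene conventions (0-based, half-open `exonStarts`/`exonEnds` and `cdsStart`/`cdsEnd`), a hypothetical `Transcript` container, and a spliced CDS sequence already extracted in transcript orientation. VCF `POS` is 1-based, so pass `POS - 1`, and give ALT as written in the VCF (forward strand):

```python
from typing import NamedTuple, Optional

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (NCBI table 1), built from the TCAG codon ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Transcript(NamedTuple):
    """Hypothetical container for the knownGene fields used here."""
    strand: str             # "+" or "-"
    cds_start: int          # knownGene cdsStart (0-based)
    cds_end: int            # knownGene cdsEnd (exclusive)
    exon_starts: list[int]  # knownGene exonStarts (0-based)
    exon_ends: list[int]    # knownGene exonEnds (exclusive)
    cds_seq: str            # spliced CDS in transcript orientation (assumed available)


def cds_offset(tx: Transcript, pos: int) -> Optional[int]:
    """0-based genomic position -> 0-based CDS offset, or None if not coding."""
    if not tx.cds_start <= pos < tx.cds_end:
        return None
    coding_len = 0
    hit = None
    for start, end in zip(tx.exon_starts, tx.exon_ends):
        s, e = max(start, tx.cds_start), min(end, tx.cds_end)  # coding part of exon
        if s >= e:
            continue  # exon is entirely UTR
        if s <= pos < e:
            hit = coding_len + (pos - s)
        coding_len += e - s
    if hit is None:
        return None  # intronic
    # On the minus strand the CDS runs right to left in genome coordinates.
    return hit if tx.strand == "+" else coding_len - 1 - hit


def snv_effect(tx: Transcript, pos: int, alt: str) -> Optional[str]:
    """Amino-acid change of a forward-strand SNV, e.g. 'D2N', or None."""
    offset = cds_offset(tx, pos)
    if offset is None:
        return None
    alt = alt.upper()
    if tx.strand == "-":
        alt = alt.translate(COMPLEMENT)  # express ALT on the transcript strand
    codon_no, frame = divmod(offset, 3)
    ref_codon = tx.cds_seq[3 * codon_no:3 * codon_no + 3].upper()
    alt_codon = ref_codon[:frame] + alt + ref_codon[frame + 1:]
    return f"{CODON_TABLE[ref_codon]}{codon_no + 1}{CODON_TABLE[alt_codon]}"
```

With a made-up two-exon transcript `Transcript("+", 100, 206, [100, 200], [106, 206], "ATGGATTGGTAA")`, a G>A change at 0-based position 103 gives `"D2N"`, and position 150 (intronic) gives `None`.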

Then use the MySQL table kgXref to map the knownGene transcripts to Swiss-Prot.
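If you download kgXref as a tab-separated dump instead of querying MySQL, the mapping is a simple lookup. A minimal sketch, assuming the first columns of kgXref are `kgID`, `mRNA` and `spID` (check the table schema for the assembly you use); `kg_to_swissprot` is a hypothetical helper:

```python
from typing import Iterable


def kg_to_swissprot(lines: Iterable[str]) -> dict[str, str]:
    """Map knownGene IDs (kgXref column 1, kgID) to UniProt accessions (column 3, spID)."""
    mapping: dict[str, str] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and a Table Browser style header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or not fields[2]:
            continue  # no protein accession recorded for this transcript
        mapping[fields[0]] = fields[2]
    return mapping
```

Transcripts without an accession are skipped, so a missing key means there is no Swiss-Prot record to parse for that transcript.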

Then parse the Swiss-Prot record for this protein (see my code in A: How To Retrieve Human Proteins Sequence Containing A Given Domain).
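Once the UniProt accession is known, the domain boundaries are in the `FT   DOMAIN` lines of the UniProt flat-file record and the sequence is in the `SQ` block. The sketch below is not Pierre's code; it assumes the current flat-file layout (`FT   DOMAIN          start..end` followed by a `/note="..."` qualifier), skips boundaries marked uncertain with `?`, and reads only single-line notes. `parse_domains` and `domain_sequence` are hypothetical helpers:

```python
import re

# Feature line such as "FT   DOMAIN          84..146" (< and > mark open-ended boundaries).
DOMAIN_RE = re.compile(r"^FT   DOMAIN\s+<?(\d+)\.\.>?(\d+)")
NOTE_RE = re.compile(r'^FT\s+/note="([^"]*)"')


def parse_domains(record: str) -> tuple[str, list[tuple[str, int, int]]]:
    """Return (sequence, [(name, start, end), ...]) for one UniProt flat-file entry."""
    domains = []
    seq_parts = []
    current = None  # DOMAIN feature whose qualifiers are being read
    in_sequence = False
    for line in record.splitlines():
        if line.startswith("//"):
            break  # end of entry
        if in_sequence:
            seq_parts.append("".join(line.split()))
            continue
        if line.startswith("SQ   "):
            in_sequence = True
            continue
        match = DOMAIN_RE.match(line)
        if match:
            current = {"name": "", "start": int(match.group(1)), "end": int(match.group(2))}
            domains.append(current)
            continue
        note = NOTE_RE.match(line)
        if note and current is not None and not current["name"]:
            current["name"] = note.group(1)
        elif line.startswith("FT   ") and not line.startswith("FT    "):
            current = None  # a different feature key starts here
    sequence = "".join(seq_parts)
    return sequence, [(d["name"], d["start"], d["end"]) for d in domains]


def domain_sequence(sequence: str, start: int, end: int) -> str:
    """Slice a domain using UniProt's 1-based, inclusive coordinates."""
    return sequence[start - 1:end]
```

Comparing this slice with the same slice of the personal protein shows whether the missense change falls inside the domain.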

Thanks Pierre, you've got mail!
