Question

Derive Protein Domain Sequence From Personal Exome Data

12.5 years ago

Khader Shameer 18k

I am interested in a particular protein domain and would like to extract all instances of this domain from our whole-exome data. Is this theoretically possible? Is there any protocol or reference for generating protein sequences of specific coding regions from exome data?

Thanks in advance!

exome next-gen sequencing

Khader, is your input CHROM, POS, REF, ALT? I've got some Java code to build the mRNA and find the domains.

Reply by Pierre Lindenbaum 161k · 12.5 years ago

I am a bit puzzled by this question; it is unclear to me what you mean by generating protein sequences from exome data. Is it only me? If you do whole-exome sequencing, you have the reference genome, correct? Furthermore, you sequenced genomic DNA enriched for exons, correct? Do you mean you want to infer the 'real' coding sequence and translate it (what extra information would the exome sequence provide then, since you must know all the exons to do the enrichment)? Or do you want to detect variations (e.g. SNPs) and infer their effect in terms of amino-acid changes?

Reply by Michael 54k · 12.5 years ago

@Pierre Yes, I have that input. Please let me know about your approach.

Reply by Khader Shameer 18k · 12.5 years ago

@Michael: This is more of a conceptual approach. We found a missense mutation in a very well-known protein domain in our exome data. This domain is part of multiple proteins and could carry other, non-deleterious mutations. All I am trying to do is derive the protein sequence from the exome, so that I can run a protein sequence alignment and see how this particular domain is affected in a personal exome.

Reply by Khader Shameer 18k · 12.5 years ago
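
Once a reference and a personal version of the domain peptide are in hand, the comparison is simple as long as only substitutions are involved, because both peptides then have the same length. A minimal Python sketch under that assumption (the function name `list_substitutions` is hypothetical); insertions or deletions would instead need a proper pairwise alignment such as Needleman-Wunsch.

```python
def list_substitutions(ref_peptide: str, personal_peptide: str, first_residue: int = 1) -> list[str]:
    """List amino-acid substitutions between two equal-length peptides.

    first_residue is the protein coordinate of the first residue (e.g. the
    domain start), so reported positions are in protein coordinates.
    Insertions/deletions are NOT handled: they require a real alignment.
    """
    if len(ref_peptide) != len(personal_peptide):
        raise ValueError("peptides differ in length; use a pairwise alignment instead")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(ref_peptide, personal_peptide), start=first_residue)
        if ref != alt
    ]


# Hypothetical domain spanning residues 3-8 with one missense change.
print(list_substitutions("TAYIAK", "TAYIEK", first_residue=3))  # ['A7E']
```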

I see. But then I would just run the sequence through transeq for all six frames, and then maybe use Pfam to see how much of the domain is intact?

Reply by Michael 54k · 12.5 years ago
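
For readers without EMBOSS at hand, the six-frame translation that transeq performs can be sketched in plain Python with the standard genetic code. This is an illustration, not transeq itself: the frame labels here are simply offsets 0-2 into the sequence and into its reverse complement, which may not match how transeq numbers the reverse frames.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; a trailing partial codon is dropped, ambiguous codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict[str, str]:
    """Frames +1..+3 start at offsets 0..2 of seq; -1..-3 at offsets 0..2 of its reverse complement."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for frame, peptide in six_frame_translation("ATGGCCTAA").items():
    print(frame, peptide)
```

Each of the six peptides could then be scanned for the domain, which is exactly the step of choosing (or assuming) the right frame that the next reply wants to avoid.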

Thanks, Michael. I thought of a similar idea, but I want to avoid the six-frame translation step and the selection (or assumption) of the coding transcript. IMHO, the best approach would be similar to what Pierre suggested: merge data from existing annotations with the variants specific to the personal exome.

Reply by Khader Shameer 18k · 12.5 years ago
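
Along the lines of merging existing annotations with personal variants, a minimal Python sketch: substitute the individual's SNVs into the chromosome sequence, splice out the coding part of one annotated transcript, and translate it. It assumes UCSC knownGene-style fields (strand, cdsStart, cdsEnd, exonStarts, exonEnds; 0-based, half-open), VCF-style 1-based POS, and SNVs only; indels and the phasing of multiple heterozygous variants are ignored. The function names and the toy locus are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def apply_snvs(chrom_seq: str, snvs: list[tuple[int, str, str]]) -> str:
    """Apply VCF-style SNVs, given as (1-based POS, REF, ALT), to a chromosome sequence."""
    bases = list(chrom_seq.upper())
    for pos, ref, alt in snvs:
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError(f"only SNVs are handled in this sketch: {pos} {ref}>{alt}")
        if bases[pos - 1] != ref.upper():
            raise ValueError(f"REF {ref} does not match reference base {bases[pos - 1]} at {pos}")
        bases[pos - 1] = alt.upper()
    return "".join(bases)


def build_cds(chrom_seq: str, strand: str, cds_start: int, cds_end: int,
              exon_starts: list[int], exon_ends: list[int]) -> str:
    """Splice the coding bases of one transcript (0-based, half-open, UCSC-style coordinates)."""
    pieces = []
    for start, end in zip(exon_starts, exon_ends):
        lo, hi = max(start, cds_start), min(end, cds_end)  # clip the UTRs away
        if lo < hi:
            pieces.append(chrom_seq[lo:hi])
    cds = "".join(pieces).upper()
    # Minus-strand transcripts are read on the reverse complement.
    return cds.translate(COMPLEMENT)[::-1] if strand == "-" else cds


def translate(cds: str) -> str:
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


# Hypothetical toy locus: two exons, CDS from 2 to 18, and one SNV at POS 6 (A>G).
chrom = "CCATGAAATTTTCCCTAGGG"
transcript = dict(strand="+", cds_start=2, cds_end=18, exon_starts=[0, 12], exon_ends=[8, 20])
print(translate(build_cds(chrom, **transcript)))                               # MKP*
print(translate(build_cds(apply_snvs(chrom, [(6, "A", "G")]), **transcript)))  # MEP*
```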

Ram · Answer 1 · 2011-10-11

12.5 years ago

Pierre Lindenbaum 161k

I would use the knownGene table from UCSC to map the SNP to the protein. For an algorithm, see:

How To Calculate The Protein Change And Codon Position Within A Nucleotide Sequence Of A Single Nucleotide Substitution?

(see here for an implementation: https://github.com/lindenb/jsandbox/blob/master/src/sandbox/VCFAnnotator.java )
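
A rough Python sketch of the coordinate arithmetic behind such an algorithm (not a port of VCFAnnotator.java): find the offset of a genomic position within the spliced CDS, taking the strand into account, then derive the codon number, the position within the codon and the amino-acid change. It assumes knownGene-style 0-based, half-open coordinates, a single-base substitution whose ALT is reported on the genomic + strand (as in VCF), and a CDS string already spliced in mRNA orientation; the function names and the toy transcript are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def genomic_to_cds_offset(pos0, strand, cds_start, cds_end, exon_starts, exon_ends):
    """0-based offset of genomic position pos0 within the CDS (mRNA 5'->3'), or None if not coding."""
    forward_offset = None
    cds_length = 0
    for start, end in zip(exon_starts, exon_ends):
        lo, hi = max(start, cds_start), min(end, cds_end)
        if lo >= hi:
            continue  # exon lies entirely outside the CDS
        if lo <= pos0 < hi:
            forward_offset = cds_length + (pos0 - lo)
        cds_length += hi - lo
    if forward_offset is None:
        return None  # intronic, UTR or intergenic
    # On the minus strand the mRNA starts at the highest genomic coordinate.
    return forward_offset if strand == "+" else cds_length - 1 - forward_offset


def protein_change(cds, cds_offset, alt, strand="+"):
    """Describe one substitution as e.g. 'p.K2E'; alt is given on the genomic + strand."""
    alt = alt.upper()
    if strand == "-":
        alt = alt.translate(COMPLEMENT)
    codon_index, codon_pos = divmod(cds_offset, 3)
    ref_codon = cds[3 * codon_index:3 * codon_index + 3].upper()
    alt_codon = ref_codon[:codon_pos] + alt + ref_codon[codon_pos + 1:]
    return f"p.{CODON_TABLE[ref_codon]}{codon_index + 1}{CODON_TABLE[alt_codon]}"


# Hypothetical plus-strand toy transcript with CDS "ATGAAACCCTAG".
tx = dict(cds_start=2, cds_end=18, exon_starts=[0, 12], exon_ends=[8, 20])
cds = "ATGAAACCCTAG"
off = genomic_to_cds_offset(5, "+", **tx)
print(off, protein_change(cds, off, "G", "+"))  # 3 p.K2E
```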

Then use the MySQL table kgXref to map the knownGene entry to Swiss-Prot.
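
When working from a downloaded table dump rather than the MySQL server, the kgXref step boils down to a lookup from knownGene ID to UniProt accession. A minimal Python sketch; the column order in `KGXREF_COLUMNS` is an assumption about the table layout that should be checked against the schema for your assembly, `spID` may hold TrEMBL as well as Swiss-Prot accessions, and the function names are hypothetical.

```python
import csv
import gzip
from typing import Iterable

# ASSUMED leading columns of the UCSC kgXref table; verify against the
# table schema for your assembly before relying on this order.
KGXREF_COLUMNS = ["kgID", "mRNA", "spID", "spDisplayID", "geneSymbol"]


def kg_to_swissprot(rows: Iterable[list[str]]) -> dict[str, str]:
    """Map knownGene IDs to UniProt accessions, skipping transcripts without one."""
    mapping = {}
    for row in rows:
        record = dict(zip(KGXREF_COLUMNS, row))  # extra trailing columns are ignored
        if record.get("spID"):
            mapping[record["kgID"]] = record["spID"]
    return mapping


def read_kgxref_dump(path: str) -> dict[str, str]:
    """Read a tab-separated (optionally gzipped) dump such as a downloaded kgXref.txt.gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        return kg_to_swissprot(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))
```

The resulting dictionary can then be queried with the knownGene ID of the transcript that carries the variant.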

Then parse the record for this protein from Swiss-Prot (see my code for A: How To Retrieve Human Proteins Sequence Containing A Given Domain).
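
As a rough stand-in for that parsing step (not Pierre's code), a Python sketch that reads one UniProtKB/Swiss-Prot flat-file record and cuts out the sequence of each DOMAIN feature. It assumes the current feature-table layout, where locations are written as `start..end` and the domain name sits in a `/note="..."` qualifier; fuzzy locations are skipped, only the first line of a multi-line note is kept, and the function name is hypothetical.

```python
import re

FT_KEY = re.compile(r"^FT   (\S+)\s+(\S+)")      # feature key line, e.g. "FT   DOMAIN   25..105"
FT_NOTE = re.compile(r'^FT\s+/note="([^"]*)')    # qualifier on a continuation line
SIMPLE_RANGE = re.compile(r"(\d+)\.\.(\d+)")


def extract_domains(lines):
    """Return (name, start, end, subsequence) for each DOMAIN feature of one flat-file record."""
    domains, sequence_lines = [], []
    current, in_sequence = None, False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("//"):
            break  # end of record
        if in_sequence:
            sequence_lines.append(line.replace(" ", ""))
        elif line.startswith("SQ   "):
            in_sequence = True  # sequence lines follow the SQ header
        elif (key := FT_KEY.match(line)):
            current = None
            location = SIMPLE_RANGE.fullmatch(key.group(2))  # fuzzy ends like "<1..20" are skipped
            if key.group(1) == "DOMAIN" and location:
                current = [None, int(location.group(1)), int(location.group(2))]
                domains.append(current)
        elif current is not None and current[0] is None and (note := FT_NOTE.match(line)):
            current[0] = note.group(1)
    sequence = "".join(sequence_lines)
    # Feature coordinates are 1-based and inclusive.
    return [(name, start, end, sequence[start - 1:end]) for name, start, end in domains]
```

It accepts any iterable of lines, such as an open file handle for one downloaded entry; the extracted domain peptides can then be compared with the translated personal sequence.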

Updated 4.6 years ago by Ram 43k

Thanks Pierre, you've got mail!

Reply by Khader Shameer 18k · 12.5 years ago