Question: determine reading frame of snp
6.4 years ago by
United Kingdom
arronslacey280 wrote:

Hi, I have downloaded a FASTA file from dbSNP containing all SNPs for a certain gene. Unfortunately the sequences in that file are nucleotide sequences, whereas I need the peptide sequences (I would have thought there would be an option to specify this when downloading). I see there are programs such as transeq that can translate them, but they require the reading frame for each SNP. I assume that the sequences in my FASTA file will be spread across different reading frames. How can I obtain this information? Ideally I'd like to pass the entire file to transeq, but I guess I will have to split the file into individual SNPs and feed each one into transeq once I know its reading frame, unless anyone has any other suggestions for getting the protein sequence. I will not be able to use UCSC, unfortunately.

snp • 1.8k views
6.4 years ago by
Devon Ryan97k
Freiburg, Germany
Devon Ryan97k wrote:

Just download the GTF/GFF annotation file from UCSC and determine the nucleotide's position within the coding sequence from that. You can then determine the affected amino acid and/or get the appropriate reading frame in R or Biopython/BioPerl.
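
As a rough illustration of that idea, here is a minimal Python sketch that maps a genomic SNP position onto the spliced coding sequence and reports which codon, and which position within that codon, it falls on. The function names (`cds_offset`, `codon_info`) and the example coordinates are hypothetical. It assumes the CDS intervals are 1-based and inclusive, as in GTF/GFF, and that the first CDS segment starts a complete codon (phase 0); an annotation with a partial CDS would need the phase column taken into account.

```python
def cds_offset(snp_pos, cds_intervals, strand="+"):
    """Return the 0-based offset of snp_pos within the spliced CDS, or None.

    cds_intervals: list of (start, end) tuples, 1-based inclusive (GTF/GFF style).
    Assumes the first CDS segment in transcription order has phase 0.
    """
    # Walk the CDS segments in the direction of transcription.
    ordered = sorted(cds_intervals, reverse=(strand == "-"))
    offset = 0
    for start, end in ordered:
        if start <= snp_pos <= end:
            if strand == "+":
                return offset + (snp_pos - start)
            return offset + (end - snp_pos)
        offset += end - start + 1
    return None  # the SNP is not inside any CDS segment


def codon_info(snp_pos, cds_intervals, strand="+"):
    """Return the 1-based codon number and position in the codon (1, 2 or 3)."""
    off = cds_offset(snp_pos, cds_intervals, strand)
    if off is None:
        return None
    return {"codon_number": off // 3 + 1, "codon_position": off % 3 + 1}


# Made-up example: a two-exon CDS on the plus strand.
example_cds = [(101, 109), (201, 206)]
print(codon_info(203, example_cds, "+"))
```

The codon number is also the number of the affected amino acid, so once the CDS sequence is in hand the reference and alternate codons can be translated directly.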

5.1 years ago by
Ibrahim Tanyalcin1.1k wrote:

I am not exactly sure of your goal, but as far as I can see you can use a program I have recently published called I-PV. I-PV takes 4 files as input: a FASTA file of your amino acid sequence (NP_... etc.), an mRNA corresponding to that sequence (NM_... etc.), a single array of conservation scores (you can supply an array of random numbers if you are not interested in it), and lastly a variation file, which is in nucleotide format and can be downloaded from BioMart (second option from the top). The program looks at your protein sequence, determines which frame encodes it within the mRNA, and crops the rest, leaving only the coding sequence. Then, keeping in mind the strand information and the transcript of your choice (there are usually multiple SNPs for each transcript in the variation file), it plots the SNPs. While plotting, the software will tell you which frame matches your protein sequence. There are 3 possible frames, so you will get 1, 2 or 3. Here is a video of making an example graph:

Since the entire protein is coded within one of 3 possible frames, the SNPs residing in that protein also inherit that frame. But if by 'reading frame' you mean the codon that codes for that SNP, then you can use the graph you generated to inspect it as well. Here is an example from MYOSIN2:

Here I select a SNP, for instance p.R1682H, then I turn on only the arginines in the sequence. Then I mouse over residue 1682 and open the codon view. I can now see the possible point mutations in that codon. One of them (the fourth one) results in histidine. The reading frame (codon) of that SNP is shown at the center, which is "CGC".
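
For readers without the graph at hand, the same check can be approximated in a few lines of Python. This is only a sketch, not I-PV's own code: it builds the standard genetic code and lists every single-nucleotide substitution of a codon together with the amino acid it would encode. The function name `point_mutations` is hypothetical, and the order in which the mutants are listed is arbitrary and not necessarily the order I-PV displays.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def point_mutations(codon):
    """Return (mutant_codon, amino_acid) for all 9 single-base substitutions."""
    codon = codon.upper()
    results = []
    for i, ref in enumerate(codon):
        for alt in "ACGT":
            if alt != ref:
                mutant = codon[:i] + alt + codon[i + 1:]
                results.append((mutant, CODON_TABLE[mutant]))
    return results


# The reference codon from the example above codes for arginine (R).
print("CGC ->", CODON_TABLE["CGC"])
for mutant, aa in point_mutations("CGC"):
    print(mutant, aa)
```

Among the nine mutants of CGC, CAC is the one that encodes histidine, which is consistent with p.R1682H.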

To use I-PV, you will need Circos and Perl installed locally.
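
If installing those is not an option, the frame-matching step described earlier (finding which of the 3 forward frames of the mRNA encodes the protein, then cropping to the coding sequence) can be sketched in plain Python. This is an illustrative approximation under stated assumptions, not I-PV's implementation: the names `translate` and `find_coding_frame` are hypothetical, the standard genetic code is assumed, and the protein is assumed to match the translation exactly.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nt):
    """Translate a nucleotide string codon by codon; unknown codons become 'X'."""
    nt = nt.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3)
    )


def find_coding_frame(mrna, protein):
    """Return (frame, cds) with frame in {1, 2, 3}, or None if nothing matches."""
    for frame in (1, 2, 3):
        translated = translate(mrna[frame - 1:])
        idx = translated.find(protein.upper())
        if idx != -1:
            # Convert the amino acid index back to an mRNA coordinate.
            start = frame - 1 + 3 * idx
            return frame, mrna[start:start + 3 * len(protein)]
    return None


# Made-up example: two 5' UTR bases, then a short coding sequence.
print(find_coding_frame("GGATGCGCCACTAATT", "MRH"))
```

Every SNP in that transcript shares the frame returned here, so its offset into the cropped coding sequence, taken modulo 3, gives its position within the codon.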

I hope this helps,

Good luck with your research,
