Question: How to extract full sequences for low-quality (predicted) protein sequences from whole-genome data
11 months ago by
India/Hyderabad/University of Hyderabad
kkumarreddy • 0 wrote:


Can anyone suggest methods for recovering the complete sequence of a low-quality predicted protein reported in the RefSeq or NCBI Protein database?

1) I have whole-genome data at more than 50X coverage. When I run a BLAST search (using the human ortholog as the query) against the SRA data, I get many hits, because my gene of interest has 4 other similar protein sequences with approximately 40% sequence identity.

2) The available assembly has missing residues in the exon regions.
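One way to separate hits to the gene of interest from hits to the roughly 40%-identity relatives mentioned in point 1 is to filter the BLAST results by percent identity. The following is only a minimal sketch, assuming the search was saved in the standard 12-column tabular format (`-outfmt 6`). The function name `filter_hits`, the 80% cutoff, and the example rows are hypothetical; a sensible cutoff depends on how diverged the two species are.

```python
import csv
import io

# Standard 12 columns of BLAST tabular output (-outfmt 6)
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def filter_hits(handle, min_identity=80.0):
    """Yield hits (as dicts) whose percent identity is at least min_identity.

    The 80.0 default is an illustrative assumption, not a recommended value.
    """
    reader = csv.reader(handle, delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        if float(hit["pident"]) >= min_identity:
            yield hit

# Two made-up rows: one high-identity hit and one paralog-like hit
example = io.StringIO(
    "humanGene\tread_1\t92.5\t40\t3\t0\t10\t49\t1\t120\t1e-20\t85.0\n"
    "humanGene\tread_2\t41.0\t39\t23\t0\t10\t48\t1\t117\t1e-05\t35.0\n"
)
kept = [hit["sseqid"] for hit in filter_hits(example, min_identity=80.0)]
print(kept)
```

With real data, `example` would be replaced by an open file handle on the tabular BLAST output.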

My aim is to find the cDNA sequence so I can clone the gene and characterize the protein experimentally.

Thank you for your help. Kumar

modified 11 months ago • written 11 months ago by kkumarreddy • 0

You can use something like backtranseq from EMBOSS. There is a web interface for the tool, and you can also run it from the command line by installing EMBOSS.
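To illustrate what back-translation does, here is a minimal Python sketch. It is not EMBOSS code and does not use EMBOSS's codon usage tables: the `CODON_CHOICE` table below is an assumption that simply picks one valid standard-code codon per amino acid, and `back_translate` is a hypothetical helper name. The output is only one of many possible coding sequences for a protein, so it will not necessarily match the actual cDNA in the genome.

```python
# One illustrative codon per amino acid (standard genetic code).
# These choices are assumptions for the example, not a real codon usage table.
CODON_CHOICE = {
    "A": "GCC", "C": "TGC", "D": "GAC", "E": "GAG", "F": "TTC",
    "G": "GGC", "H": "CAC", "I": "ATC", "K": "AAG", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCC", "Q": "CAG", "R": "AGA",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAC",
    "*": "TGA",
}

def back_translate(protein):
    """Return one possible DNA sequence encoding the given protein.

    Residues missing from the table (for example X) become NNN.
    """
    return "".join(CODON_CHOICE.get(residue, "NNN") for residue in protein.upper())

print(back_translate("MKW"))
```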

written 11 months ago by genomax • 89k

Thanks for your suggestion. I actually used tblastn to search for the sequences. The problem is the missing residues in the sequence. I am 100% sure that my gene of interest is present in the other species. Out of 650 amino acids, I mostly recover regions covering about 600. But this is not sufficient for generating the clone. What I don't understand is why, even at 50X coverage, there are still "NNNNNNNN" regions in the assemblies.
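To pin down exactly which residues are missing, the query coordinates of the tblastn alignments can be merged to list the stretches of the 650-residue query that no alignment covers. This is only a sketch: `uncovered_regions` is a hypothetical helper, and the alignment coordinates below are invented to reproduce the "about 600 of 650" situation, not taken from a real search.

```python
def uncovered_regions(query_length, hsps):
    """Return 1-based inclusive (start, end) query ranges not covered by any HSP.

    hsps is a list of (qstart, qend) pairs in query (protein) coordinates.
    """
    gaps = []
    next_uncovered = 1  # first query position not yet known to be covered
    for start, end in sorted(hsps):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= query_length:
        gaps.append((next_uncovered, query_length))
    return gaps

# Hypothetical alignments covering 600 of 650 residues
hsps = [(1, 210), (200, 380), (431, 650)]
print(uncovered_regions(650, hsps))
```

The ranges it reports are the parts of the protein to look for in the raw reads, since those are the regions where the assembly presumably contains the N runs.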

written 11 months ago by kkumarreddy • 0