Question: Filter all repeat-derived sequences from UniProtKB/Swiss-Prot
Ole Kristian Tørresen (Oslo) wrote, 5.5 years ago:

Hi,

I'd like to remove all repeat-derived sequences (such as transposon proteins) from a UniProtKB/Swiss-Prot file (for instance ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz). Is there an easy way to do it? For instance, is there somewhere I can find all GO terms associated with transposon proteins?

Thank you.

Siva (United States) wrote, 5.5 years ago:

You might want to have a look at this thread: "How To Identify Genes In Transposable Elements"


Thank you Siva.

I'd expect that there are certain GO terms associated with transposons, but I have not been able to find out whether all transposon-derived proteins in UniProtKB/Swiss-Prot are consistently annotated with the right GO term (nor which GO term would be the correct one). I'm unsure whether the post you are referring to would help. I could download RepBase and compare everything there to the UniProtKB/Swiss-Prot FASTA file, but there should be an easier way to do it.

Reply written 5.5 years ago by Ole Kristian Tørresen

I agree that using already existing annotation is easier than searching RepBase or Pfam domains against UniProt. I am not familiar with the GO annotation in UniProt. Can you try the UniProt keywords?

Searching the UniProt data with keyword "transposable element"

There also seems to be an entry called "TRANSPOSON" in the optional Reference Comment (RC) line in the sequence entry.

I am not sure whether these two options return the same entries, or whether either of them finds all the proteins encoded in transposons.
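The keyword approach could be sketched roughly as follows, in Python with only the standard library. This is an illustration, not a tested pipeline: it assumes you also download the Swiss-Prot flat file (uniprot_sprot.dat.gz, which carries the KW lines that the FASTA file lacks), that the keyword is spelled "Transposable element" there, and that FASTA headers follow the usual ">sp|ACCESSION|NAME" layout. The function names (accessions_with_keyword, filter_fasta, write_filtered) are made up for this example.

```python
import gzip
from typing import Iterable, Iterator, List, Optional, Set


def accessions_with_keyword(dat_lines: Iterable[str], keyword: str) -> Set[str]:
    """Return primary accessions of flat-file entries whose KW lines list `keyword`."""
    wanted = keyword.lower()
    hits: Set[str] = set()
    primary: Optional[str] = None
    kw_chunks: List[str] = []
    for line in dat_lines:
        code, body = line[:2], line[5:].strip()
        if code == "AC" and primary is None:
            # The first accession on the first AC line is the primary one.
            primary = body.split(";")[0].strip()
        elif code == "KW":
            kw_chunks.append(body)
        elif code == "//":
            # End of entry: keywords are ';'-separated, the last ends with '.'.
            joined = " ".join(kw_chunks).rstrip(".")
            # Drop optional evidence tags such as "{ECO:...}" after a keyword.
            names = [k.split("{")[0].strip().lower() for k in joined.split(";")]
            if primary and wanted in names:
                hits.add(primary)
            primary, kw_chunks = None, []
    return hits


def filter_fasta(fasta_lines: Iterable[str], excluded: Set[str]) -> Iterator[str]:
    """Yield FASTA lines, skipping records whose accession is in `excluded`."""
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            # Assumed header layout: ">sp|P12345|NAME_SPECIES description"
            parts = line[1:].split("|")
            if len(parts) > 2:
                accession = parts[1]
            else:
                accession = (parts[0].split() or [""])[0]
            keep = accession not in excluded
        if keep:
            yield line


def write_filtered(dat_path: str, fasta_path: str, out_path: str) -> None:
    """Read gzipped flat file and FASTA, write the FASTA without keyword hits."""
    with gzip.open(dat_path, "rt") as dat:
        excluded = accessions_with_keyword(dat, "Transposable element")
    with gzip.open(fasta_path, "rt") as fasta, open(out_path, "w") as out:
        out.writelines(filter_fasta(fasta, excluded))
```

A call such as write_filtered("uniprot_sprot.dat.gz", "uniprot_sprot.fasta.gz", "sprot_no_te.fasta") would then write the filtered file (the output name is arbitrary). Whether the keyword catches every repeat-derived protein is a separate question that this sketch does not answer.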

Reply written 5.5 years ago by Siva

The keyword "transposable element" is a great suggestion. It's the most comprehensive way of attacking this problem that I've come across.

Thank you.

Reply written 5.5 years ago by Ole Kristian Tørresen