Filter all repeat-derived sequences from UniProtKB/Swiss-Prot
9.5 years ago

Hi,

I'd like to remove all repeat-derived sequences (such as transposon proteins) from a UniProtKB/Swiss-Prot file (for instance ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz). Is there an easy way to do it? For instance, is there somewhere I can find all GO terms associated with transposon proteins?

Thank you.

9.4 years ago
Siva ★ 1.9k

You might want to have a look at this thread.


Thank you Siva.

I'd expect that there are certain GO terms associated with transposons, but I have not been able to find out whether all transposon-derived proteins in UniProtKB/Swiss-Prot are consistently annotated with the right GO term (nor which GO term would be the correct one). I'm unsure whether the post you are referring to would help. I could download RepBase and compare everything there to the UniProtKB/Swiss-Prot fasta file, but there should be an easier way to do it.


I agree that using already existing annotation is easier than searching RepBase or Pfam domains against UniProt. I am not familiar with the GO annotation in UniProt. Can you try the UniProt keywords?

For example, you could search the UniProt data with the keyword "transposable element".

There also seems to be an entry called "TRANSPOSON" in the optional Reference Comment (RC) line in the sequence entry.

I am not sure whether these two options give the same results, or whether either of them finds all the proteins encoded in transposons.
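In case it helps, here is a rough sketch of how the keyword approach could be scripted. It assumes you also download the Swiss-Prot flat file (uniprot_sprot.dat.gz, from the same directory as the fasta file), that keywords sit on `KW` lines separated by semicolons, and that fasta headers look like `>sp|ACCESSION|NAME ...`. The function names `accessions_with_keyword` and `filter_fasta` are made up for this example, and I have not run it against a full release.

```python
import re


def accessions_with_keyword(dat_lines, keyword="Transposable element"):
    """Collect accessions of flat-file entries whose KW lines list `keyword`.

    Assumes the UniProtKB flat-file layout: 'AC   ' lines hold accessions,
    'KW   ' lines hold keywords, and '//' terminates each entry.
    """
    hits = set()
    accessions, keywords = [], []
    for line in dat_lines:
        line = line.rstrip("\n")
        if line.startswith("AC   "):
            accessions += [a.strip() for a in line[5:].split(";") if a.strip()]
        elif line.startswith("KW   "):
            # Drop evidence tags such as {ECO:...} before splitting.
            text = re.sub(r"\{[^}]*\}", "", line[5:])
            keywords += [
                k.strip().rstrip(".").strip() for k in text.split(";") if k.strip()
            ]
        elif line.startswith("//"):
            # End of entry: record it if the keyword was seen, then reset.
            if keyword in keywords:
                hits.update(accessions)
            accessions, keywords = [], []
    return hits


def filter_fasta(fasta_lines, excluded):
    """Yield fasta lines, skipping records whose accession is in `excluded`."""
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split("|")
            # Assumed header shape: >sp|ACCESSION|ENTRY_NAME description
            acc = parts[1] if len(parts) >= 3 else line[1:].strip()
            keep = acc not in excluded
        if keep:
            yield line
```

Both functions take any iterable of text lines, so for the real files you could pass handles from `gzip.open(path, "rt")` and write the lines yielded by `filter_fasta` to a new file. This only removes entries that carry the keyword, so it is only as complete as the keyword annotation itself.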


The keyword "transposable element" is a great suggestion. It's the most comprehensive way of attacking this problem that I've come across.

Thank you.
