How To Identify Genes In Transposable Elements
8.1 years ago

Hi!

I would like to compare protein domains from transposable elements in human and other primates (for example, I would like to compare gag and pol from all LTR retrotransposons in human). So far I have thought of three ways:

1. The hard way: find ORFs and then run something like a conserved domain search.

2. Run LTR_Finder, LTRdigest (I am having problems adding the HMM functionality) or similar software, which might be less sensitive (LTR_Finder in particular did not identify protein domains satisfactorily).

3. Download already-annotated elements (does something like this exist?).
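For option 1, the ORF-finding step can be sketched in a few lines. This is a minimal illustration, not a replacement for dedicated ORF finders: it assumes the standard genetic code, ORFs that begin at ATG and end at a stop codon (ORFs running off the end of the sequence are not reported), and a hypothetical default minimum length of 100 amino acids. `find_orfs` and the other names are illustrative.

```python
from typing import Iterator, Tuple

BASES = "TCAG"
# NCBI standard code (table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(nt: str) -> str:
    # Codons containing ambiguous bases (e.g. N) are translated as 'X'.
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def find_orfs(seq: str, min_aa: int = 100) -> Iterator[Tuple[str, int, int, str]]:
    """Yield (strand, start, end, protein) for ATG...stop ORFs in all six frames.

    start/end are 0-based, half-open forward-strand coordinates that include
    the stop codon; the protein string excludes the stop.
    """
    seq = seq.upper()
    n = len(seq)
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            prot = translate(s[frame:])
            start = None
            for idx, aa in enumerate(prot):
                if aa == "M" and start is None:
                    start = idx  # first ATG after the previous stop
                elif aa == "*" and start is not None:
                    if idx - start >= min_aa:
                        nt_start = frame + 3 * start
                        nt_end = frame + 3 * (idx + 1)
                        if strand == "-":
                            # map reverse-complement coordinates back to the forward strand
                            nt_start, nt_end = n - nt_end, n - nt_start
                        yield strand, nt_start, nt_end, prot[start:idx]
                    start = None
```

The proteins it yields could then be written to a FASTA file and searched against Pfam domain models with HMMER.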

To sum up, I would like to obtain the sequences of all gag/pol genes from human (and other primates).

Thank you very much for your help.


What specific problems are you having with the HMM functionality in LTRdigest?


Well, I had installation problems (the unused-variables issue, which has already been reported), but on another computer it works just fine. Right now I am stuck on how to get the HMMER2-format model from http://pfam.sanger.ac.uk/family/PF03732 (the current version is HMMER3).


You can either 1) try compiling genometools with make with-hmmer=yes, or 2) download a recent version of HMMER and unpack it in the /src/external/ subdirectory of genometools. If you want to use HMMER v2.3.3, you will need HMMER2 models, which you should be able to download from the Pfam site (via the FTP link at the top of the page).
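Since the version of HMMER that LTRdigest is built against determines which model format it accepts, it may help to check the format of each downloaded model before handing a directory of HMMs to LTRdigest. A minimal sketch, assuming ASCII profile files whose first line starts with "HMMER2" (e.g. "HMMER2.0  [2.3.2]") or "HMMER3" (e.g. "HMMER3/f"); the `*.hmm` file pattern and the function names are hypothetical.

```python
from pathlib import Path


def header_format(first_line: str) -> str:
    """Classify an HMM profile by its header line."""
    if first_line.startswith("HMMER2"):
        return "HMMER2"
    if first_line.startswith("HMMER3"):
        return "HMMER3"
    # e.g. a binary (pressed) HMMER3 file, or not a profile at all
    return "unknown"


def classify_hmm_dir(directory: str, pattern: str = "*.hmm") -> dict:
    """Map each profile file in `directory` to its detected format."""
    result = {}
    for path in sorted(Path(directory).glob(pattern)):
        with open(path, errors="replace") as fh:
            result[path.name] = header_format(fh.readline())
    return result
```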


I had already done step 1), and LTRdigest still forces me to use HMMER2. However, if I understand 2) correctly, I can actually use HMMER3 models with LTRdigest by installing the newest HMMER into /src/external/. That's great news. Thank you.

8.1 years ago
SES 8.5k

The most direct way would be to download elements from RepBase or to select a repeat track from a genome browser. However, the transposons may not be annotated to the level of detail you need (they may have been identified only by masking), so you will likely want to identify the genes directly.
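For the genome-browser route, a repeat track such as the UCSC RepeatMasker (rmsk) table can be filtered for LTR elements. The sketch below assumes the tab-separated rmsk.txt dump with the 17-column layout given in the comment (check the table schema for your assembly); the function names and the optional family filter are illustrative. Because these are masking-based annotations, expect many intervals to be fragments without intact gag/pol ORFs.

```python
import csv
import gzip
from typing import Iterable, Iterator, Optional, Set, Tuple

# Assumed rmsk.txt column layout (0-based positions):
# bin swScore milliDiv milliDel milliIns genoName genoStart genoEnd genoLeft
# strand repName repClass repFamily repStart repEnd repLeft id
CHROM, START, END, STRAND, NAME, CLASS, FAMILY = 5, 6, 7, 9, 10, 11, 12


def ltr_elements(rows: Iterable[list], families: Optional[Set[str]] = None
                 ) -> Iterator[Tuple[str, int, int, str, str, str]]:
    """Yield (chrom, start, end, name, family, strand) for LTR-class rows."""
    for row in rows:
        if row[CLASS] != "LTR":
            continue
        if families is not None and row[FAMILY] not in families:
            continue
        yield row[CHROM], int(row[START]), int(row[END]), row[NAME], row[FAMILY], row[STRAND]


def rmsk_to_bed(path: str, out_path: str, families: Optional[Set[str]] = None) -> None:
    """Write LTR elements as BED6 (UCSC starts are already 0-based)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh, open(out_path, "w") as out:
        rows = csv.reader(fh, delimiter="\t")
        for chrom, start, end, name, family, strand in ltr_elements(rows, families):
            # BED score column is unused here, so it is set to 0
            out.write(f"{chrom}\t{start}\t{end}\t{name}|{family}\t0\t{strand}\n")
```

The resulting BED file could be used to extract the element sequences before looking for ORFs in them.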

  1. You could download a set of TE-related domain models from Pfam and search your translated ORFs against them. This is essentially what LTRdigest does.

  2. LTR_Finder runs a program called ps_scan to identify coding domains, and it is less sensitive than HMMER. In fact, I worked with the person who wrote LTR_Finder, and he acknowledged this and recommended using HMMER. One way to do this is to run LTR_Finder or LTRharvest, then give those annotations and a directory of HMMs to LTRdigest, which will run HMMER for you and produce more fine-grained annotations.

  3. There have been many studies comparing TEs in mammals, so it should be easy to find annotations in RepBase or a genome browser, or by searching the literature to see which data sets others have used.
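Approach #1 (which LTRdigest automates in #2) ends with a table of domain hits. A minimal sketch for summarizing them, assuming the translated ORFs were searched with HMMER3, e.g. `hmmsearch --domtblout hits.domtbl TE_models.hmm orfs.faa`, so that the target column is the ORF and the query column is the Pfam model. The fixed E-value cutoff and the small Pfam-name-to-region mapping are hypothetical, illustrative choices; Pfam's curated gathering thresholds (`--cut_ga`) are an alternative to a fixed cutoff.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class DomainHit:
    seq: str        # target name (the ORF, when using hmmsearch)
    model: str      # query name (the Pfam model)
    accession: str
    i_evalue: float
    score: float
    ali_from: int
    ali_to: int


def parse_domtblout(lines: Iterable[str], max_i_evalue: float = 1e-5) -> List[DomainHit]:
    """Keep per-domain hits whose independent E-value passes the cutoff."""
    hits = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        # 22 fixed columns followed by a free-text description
        f = line.split(maxsplit=22)
        hit = DomainHit(seq=f[0], model=f[3], accession=f[4],
                        i_evalue=float(f[12]), score=float(f[13]),
                        ali_from=int(f[17]), ali_to=int(f[18]))
        if hit.i_evalue <= max_i_evalue:
            hits.append(hit)
    return hits


# Illustrative, incomplete mapping from Pfam model names to gene regions (assumption).
REGION = {"Retrotrans_gag": "gag", "RVT_1": "pol", "rve": "pol", "RNase_H": "pol"}


def regions_per_orf(hits: Iterable[DomainHit]) -> Dict[str, Set[str]]:
    """Collect which regions (gag, pol, other) were detected on each ORF."""
    out = defaultdict(set)
    for h in hits:
        out[h.seq].add(REGION.get(h.model, "other"))
    return dict(out)
```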

I can tell you that, in general, the gag region will be more variable because it interacts directly with host-encoded factors, whereas the pol region has more conserved domains associated with replication. Since these regions contain numerous domains, I think following #1 or #2 above would be the most accurate approach, but you should also try to find this data by one of the routes listed in #3.
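If you want to check the gag-versus-pol variability on your own elements, one simple and fairly crude summary is the mean pairwise identity of each aligned region. A sketch, assuming the sequences have already been aligned with an external tool (gaps as "-"); ignoring gapped columns is a design choice, and the function names are illustrative.

```python
from itertools import combinations
from typing import Sequence


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not cols:
        return 0.0
    return sum(x == y for x, y in cols) / len(cols)


def mean_pairwise_identity(aligned: Sequence[str]) -> float:
    """Average identity over all sequence pairs; lower values mean more variable."""
    pairs = list(combinations(aligned, 2))
    if not pairs:
        raise ValueError("need at least two aligned sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)
```

Comparing this value for a gag alignment and a pol alignment from the same element family gives a rough, alignment-dependent indication of which region is more variable.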


Thank you, I'll go for #1 and write back how it goes.
