Question: LTR-Harvest false positives (Maker repeat library construction)
gravatar for timo.metz
17 months ago by
timo.metz0 wrote:

Hey there,

currently I want to to a genome annotation using maker. For the repeats I followed the following protocol:

In there they use LTR-Harvest to find LTRs and then want to remove false positives. For this they provide perl scripts. I came up to point 2.1.3 where they run CRL-Step2 and should get as output two different files. In the one file, they want to have the upstream or downstream 50bp flanking sequences of LTRs having gaps. However, I do not get this file. Has anybody encountered this problem before?

If not, is there another way to filter out false positives introduced by LTR-Harvest?

kind regards

maker ltr-harvest annotation • 641 views
ADD COMMENTlink modified 12 days ago by vflorelo0 • written 17 months ago by timo.metz0
gravatar for prateekshettys
7 months ago by
prateekshettys0 wrote:

Hello there, I understand this is an old thread, but i was wondering if you managed to solve this problem? I face the same problem with a list of error and no flanking sequence file.

Thank you.

ADD COMMENTlink modified 7 months ago • written 7 months ago by prateekshettys0
gravatar for vflorelo
12 days ago by
Mexico / Irapuato / LANGEBIO
vflorelo0 wrote:

I checked the code from Meg Bowman, and apparently, at least in my case it had to do with whitespaces in the fasta deflines. In the original script there were these lines

my $repid = $seqobj1->display_id();
my $repdesc = $seqobj1->desc();
my $new_desc = $repdesc; 
$new_desc =~ s/ /_/g;
my $new_id = "$repid" . "_" . "$new_desc";
my $blank_desc = " "; 
$repdesc = $seqobj1->desc($blank_desc);
$repid = $seqobj1->display_id($new_id);

and then the script parsed $repid to construct $silly_index and $ltr_start using

my ($silly_index, $ltr_start, $seq_id, $remid, $artificial_key, %artificial_key_hash, $i2, $key3);
my $removed = Bio::SeqIO->new (-format => 'fasta', -file => $removed_repeats);
while (my $seqobj2 = $removed->next_seq()) {
      $remid = $seqobj2->display_id();
      if ($remid =~ /^(.+)_\(/) {
        $seq_id = $1;
      else {
      if ($remid =~ /\(dbseq-nr_(\d+)\)_\[/) {
        $silly_index = $1;
      if ($remid =~ /\[(\d+),/) {
        $ltr_start = $1;

If you replace the first group of lines lines with

my $repid      = $seqobj1->display_id();
my $repdesc    = $seqobj1->desc();
my $new_desc   = $repdesc;
$new_desc      =~ s/.*\(dbseq-nr //g;
$new_desc      =~ s/\) \[/ /g;
$new_desc      =~ s/\,.*//g;
$repdesc       = $seqobj1->desc($new_desc);
$repid         = $seqobj1->display_id($repid);

and the second group of lines with

my ($silly_index, $ltr_start, $seq_id, $remid, $remdesc, $artificial_key, %artificial_key_hash, $i2, $key3);
my $removed = Bio::SeqIO->new (-format => 'fasta', -file => $removed_repeats);
while (my $seqobj2 = $removed->next_seq()) {
      $remid   = $seqobj2->display_id();
      $remdesc = $seqobj2->desc();
      my @remdesc_arr = split " ", $remdesc;
      $seq_id = $remid;
      $silly_index = $remdesc_arr[0];
      $ltr_start = $remdesc_arr[1];

It should work fine. I'm no perl expert and there should be a better fix, but in the meantime this patch works. In my case there are 9 repeats, after step2 I have 19 additional files 1 with the filtered repeat sequences, 9 with the upstream sequences and 9 with the downstream sequences. Hope it helps

ADD COMMENTlink written 12 days ago by vflorelo0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 823 users visited in the last hour