When I compare the alignment from RepeatMasker (cross_match engine, Dfam library) with one from ssearch (fasta36), I get different results.
- ssearch v36.3.8g
- repeatmasker v4.1.0
- cross_match v1.090518
- dfam v3.1
My question is: why are the results so different? Did I misunderstand something, is it a performance trade-off, or is ssearch simply not suited for this?
RepeatMasker
   SW  perc  perc  perc position in query matching position in repeat
score  div.  del.  ins.  begin    end     repeat begin   end (left)
 2373  25.9   9.0   3.7  38256  39464  +  MLT1E1A  119  1371    (0)
  917  27.4   9.2   6.4  39465  39623  +  MLT1E1A    1   172  (388)
 2138  12.0   0.0   0.0  39624  39924  +  AluSx      1   301   (11)
  923  26.0  12.8   3.8  39925  40294  +  MLT1E1A  173   666   (14)
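To work with the table above programmatically, here is a minimal Python sketch that parses rows in the trimmed 11-column layout shown here. A real RepeatMasker .out file has extra columns (for example the query name and the repeat class/family), so the column indices would need adjusting; the class and function names are illustrative, not part of RepeatMasker. It also reports the span each hit covers on the query and on the repeat, assuming 1-based inclusive coordinates.

```python
from dataclasses import dataclass


@dataclass
class RMHit:
    """One row of the trimmed 11-column RepeatMasker table above."""
    score: int
    perc_div: float
    perc_del: float
    perc_ins: float
    q_begin: int   # position in query, 1-based inclusive
    q_end: int
    strand: str
    repeat: str
    r_begin: int   # position in repeat consensus, 1-based inclusive
    r_end: int
    r_left: int    # "(left)" column with the parentheses removed

    @property
    def query_span(self) -> int:
        return self.q_end - self.q_begin + 1

    @property
    def repeat_span(self) -> int:
        return self.r_end - self.r_begin + 1


def parse_rm_row(line: str) -> RMHit:
    """Parse one whitespace-separated row in the trimmed layout."""
    f = line.split()
    if len(f) != 11:
        raise ValueError(f"expected 11 columns, got {len(f)}: {line!r}")
    return RMHit(int(f[0]), float(f[1]), float(f[2]), float(f[3]),
                 int(f[4]), int(f[5]), f[6], f[7],
                 int(f[8]), int(f[9]), int(f[10].strip("()")))


rows = """\
2373 25.9 9.0 3.7 38256 39464 + MLT1E1A 119 1371 (0)
917 27.4 9.2 6.4 39465 39623 + MLT1E1A 1 172 (388)
2138 12.0 0.0 0.0 39624 39924 + AluSx 1 301 (11)
923 26.0 12.8 3.8 39925 40294 + MLT1E1A 173 666 (14)"""

hits = [parse_rm_row(line) for line in rows.splitlines()]
for h in hits:
    print(f"{h.repeat:8} query {h.q_begin}-{h.q_end} ({h.query_span} bp), "
          f"repeat {h.r_begin}-{h.r_end} ({h.repeat_span} bp)")
```

For the first row the query span is 1209 bp and the repeat span is 1253 bp; the two differ because the alignment contains insertions and deletions. Rows 2 and 4 cover consecutive parts of MLT1E1A (1-172, then 173-666) on either side of the AluSx hit.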
MLT1E1A sequence
>MLT1E1A 680 BP; 181 A; 162 C; 189 G; 143 T; 5 other;
TGTGGTAGGCAGAATTCTAAGATGGCCCCCAAGATTCCCGCCCCCTGGTGTACACGCCCT
GTATAATCCCCTCCCCTTGAGTGTGGGCGGGACCTGTGAATATGATGGGATATCACTCCC
GTGATTAGGTTACATTATATGGCAAAGGTGAAGGGATTTTGCAGATGTAATTAAGGTCCC
TAATCAGTTGACTTTGAGTTAATCAAAAGGGAGATTATCCTGGGTGGGCCTGACCTAATC
AGGTGAGCCCTTAAAAGAGGCATGGGCCCTCCAGAGAGAAGAACAGAGAGATTCTCCTGC
TGGCCTTGAAGAAGCAAGCTGCCATGTTGTGAGAGGGCCTATGGAGAGGGCCACGTGGCA
AGGANCTGNGGGCGGCCTCTAGGAGCTGAGAGCGGCCCCCGGCCGACAGCCAGCAAGAAA
ACGGGGACCTCAGTCCTACAGCCGCAAGGAANTGAATTCTGCCAACAACCTGAATGAGCT
TGGAAGNGGACCCTAAGCCTCAGATGAGAACGCAGCCCCGGCCGACACCTTGATTNCAGC
CTTGTGAGACCCTGAGCAGAGGACCCAGCTAAGCCGTGCCCGGACTCCTGACCCACAGAA
ACTGTGAGATAATAAATGTGTGTTGTTTTAAGCCGCTAAGTTTGTGGTAATTTGTTACGC
AGCAATAGAAAACTAATACA
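Before aligning, it may be worth checking that a pasted copy of the consensus agrees with its header (680 bp; 181 A, 162 C, 189 G, 143 T, 5 other). A small sketch, assuming the record is available as a string; the helper names are illustrative, and the example uses a toy record rather than MLT1E1A. The "other" bucket is where ambiguity codes such as N end up, and different aligners may score those positions differently.

```python
from __future__ import annotations

from collections import Counter


def read_single_fasta(text: str) -> tuple[str, str]:
    """Return (header, sequence) for a single-record FASTA string."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a header line starting with '>'")
    return lines[0][1:], "".join(lines[1:]).upper()


def composition(seq: str) -> dict[str, int]:
    """Count A/C/G/T; every other character (e.g. N) is counted as 'other'."""
    counts = Counter(seq.upper())
    result = {base: counts[base] for base in "ACGT"}
    result["other"] = len(seq) - sum(result.values())
    return result


# Toy record; paste the MLT1E1A record here to check it against its header.
record = """>toy 10 BP
ACGTN
ACGTA
"""
header, seq = read_single_fasta(record)
print(header, len(seq), composition(seq))
```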
First alignment
Based on the first row of the RepeatMasker output, I took the sequence at positions 38256-39464 and aligned it against MLT1E1A with ssearch.
I expected the alignment to span the entire extracted sequence (with MLT1E1A covering positions 119-1371).
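RepeatMasker reports 1-based, inclusive coordinates, while Python slices are 0-based and half-open, so an off-by-one slip shifts every position. A minimal sketch, assuming the chromosome is already loaded as a plain string (`chrom` below is a made-up stand-in, not the real chr1). `to_chrom` is a hypothetical helper that maps a position inside the extracted fragment back to the chromosome, assuming ssearch numbers the fragment from 1, as the second alignment suggests.

```python
def extract_1based(seq: str, begin: int, end: int) -> str:
    """Return seq[begin..end] using 1-based, inclusive coordinates."""
    if not 1 <= begin <= end <= len(seq):
        raise ValueError(f"{begin}-{end} is outside 1-{len(seq)}")
    return seq[begin - 1:end]


def to_chrom(frag_pos: int, frag_begin: int) -> int:
    """Map a 1-based position inside the extracted fragment to the chromosome."""
    return frag_begin + frag_pos - 1


# Hypothetical stand-in for chr1 (48,000 bp); in practice read it from the FASTA file.
chrom = "ACGT" * 12_000
region = extract_1based(chrom, 38256, 39464)
print(len(region))            # 1209
print(to_chrom(1100, 38256))  # 39355
```

For example, fragment position 1100 in the first ssearch alignment corresponds to chromosome position 38256 + 1100 - 1 = 39355.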
ssearch gave me the following (the extracted sequence doesn't start at position 1, and MLT1E1A doesn't start at 119):
60 70 80 90 100 110
MLT1E1 CCTGTATAATCCCCTCCCCTTGAGTGTGGGCGGGACCTGTGAATATGATGGGATATCACT
                                     :::::: :::::  :::: ::::: :::
chromo TCTACACCTAAACCCGATTTAGATGAGATTCGGGAC-TGTGAGCATGAAGGGATCTCAAG
1100 1110 1120 1130 1140 1150
120 130 140 150 160 170
MLT1E1 CCCG-TGATTAGGTTACATTATATGGCAAAGGTGAAGGGATTTTGCAGATGTAATTAAGG
          : ::: :  :::   ::  :::   ::::   :::  : :::  ::
chromo AGGGGTGAATGTGTT---TTGCATGCACAAGGGACAGGAGTCTTGGGGACAGAGGACAGG
1160 1170 1180 1190 1200
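One detail in this output is easy to miss: the ':' markers only begin partway through the first block, even though some earlier columns are identical (for example columns 2-3). That pattern suggests the leading columns are flanking context printed around the local alignment rather than part of it, so the aligned region is shorter than the printed rows. The sketch below rebuilds a FASTA-style identity line from two aligned rows and can restrict marking to a column range; the function names and the 0-based `start`/`end` convention are illustrative assumptions, not fasta36 options.

```python
from __future__ import annotations


def markup(top: str, bottom: str, start: int = 0, end: int | None = None) -> str:
    """FASTA-style identity line: ':' where both rows share a base, ' ' elsewhere.

    Only 0-based columns in [start, end) are marked; anything outside is
    treated as unaligned flanking context and left blank.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    stop = len(top) if end is None else end
    return "".join(
        ":" if start <= i < stop and a == b and a != "-" else " "
        for i, (a, b) in enumerate(zip(top, bottom))
    )


def percent_identity(top: str, bottom: str) -> float:
    """Identical columns / all columns (gap columns included), as a percentage."""
    if len(top) != len(bottom) or not top:
        raise ValueError("rows must be non-empty and of equal length")
    same = sum(1 for a, b in zip(top, bottom) if a == b and a != "-")
    return 100.0 * same / len(top)


# First block of the first ssearch alignment, copied from the output above.
mlt = "CCTGTATAATCCCCTCCCCTTGAGTGTGGGCGGGACCTGTGAATATGATGGGATATCACT"
chromo = "TCTACACCTAAACCCGATTTAGATGAGATTCGGGAC-TGTGAGCATGAAGGGATCTCAAG"
print(mlt)
print(markup(mlt, chromo, start=30))
print(chromo)
```

With `start=30` this reproduces the marker pattern in the first block. Percent identity here divides by all columns, gaps included; programs differ in the denominator they use, so it should not be expected to match RepeatMasker's perc div. directly.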
Second alignment
This one is better (the sequence starts at position 1), but the alignment still doesn't cover the entire sequence.
               10        20        30        40        50        60
MLT1E1 TGTGGTAGGCAGAATTCTAAGATGGCCCCCAAGATTCCCGCCCCCTGGTGTACACGCCCT
       :::::: :::::: : ::::: :: ::::::  :  :::  :: :::   : ::: ::::
chromo TGTGGT-GGCAGA-TACTAAGGTGACCCCCAC-AACCCCCACCTCTGCCATTCACACCCT
                10         20        30         40        50
                70        80        90       100       110
MLT1E1 -GTATAATCCCCTCCCCTTGAGTGTGGGCGGGACCTGTGAATATGATGGGATATCACTCC
        : :::::::::: : :: :  :::  :: : :::::::
chromo TGAATAATCCCCTTCTCTGGT-TGTAAGCAGAACCTGTGGCTTGCTTATGAAGGAGGCGG
        60        70         80        90       100       110
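Both alignments stop where they do because ssearch reports a local (Smith-Waterman) alignment: it keeps only the highest-scoring segment, and how far that segment extends into diverged flanks depends on the match/mismatch scores and gap penalties. As far as I know, RepeatMasker runs cross_match with its own scoring matrices and gap penalties chosen for diverged repeats, which are probably not the same as ssearch's DNA defaults, so the two tools can reasonably draw different boundaries on a copy that is about 26% diverged. The toy sketch below is a plain Smith-Waterman with a linear gap penalty (real tools use affine gaps), and the scoring values are arbitrary illustrations, not either program's settings.

```python
def smith_waterman(a: str, b: str, match: int, mismatch: int, gap: int):
    """Best local alignment with a linear gap penalty.

    Returns (score, (a_begin, a_end), (b_begin, b_end)) using 1-based,
    inclusive coordinates. Scoring values are supplied by the caller.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best cell until the score drops to zero.
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i + 1, bi), (j + 1, bj)


# Toy pair: an identical 8-bp core flanked by ~50% identical sequence.
a = "ACACAC" + "GGGGGGGG" + "ACACAC"
b = "ATATAT" + "GGGGGGGG" + "ATATAT"
print(smith_waterman(a, b, match=1, mismatch=-3, gap=-5))  # (9, (7, 15), (7, 15))
print(smith_waterman(a, b, match=2, mismatch=-1, gap=-5))  # (23, (1, 19), (1, 19))
```

Under the harsh mismatch penalty the alignment covers only the core plus one flanking base (positions 7-15); with the milder scoring it extends through most of both flanks (1-19). The same effect on a larger scale could help explain why an ssearch alignment covers only part of a region that cross_match aligned end to end.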
Try this, and let me know if it doesn't work:
./RepeatMasker -pa 8 -species "species_name" -dir temp_species results/chr1.fa