How to deal with masked sequences when analyzing orthologous gene pairs using OrthoMCL?
9.7 years ago
liuhui ▴ 20

Dear all,

Recently, I used Trinity to assemble my Illumina reads into a transcriptome and masked the repeat elements using RepeatMasker.

Then I predicted the coding regions using TransDecoder to generate the protein sequence set.

I would like to use these protein sequences, excluding those shorter than 21 amino acids, to analyze orthologous gene pairs using OrthoMCL.

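But after running the scripts orthomclAdjustFasta and orthomclFilterFasta (min_length 21, max_percent_stops 20), the sequences that are shorter than 21 aa are still there. For more detail, please see the following lines (the TransDecoder records first, then the same records after running the OrthoMCL scripts).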

Could you please tell me what I should do?

Any advice would be great.

Thank you very much in advance.

>TRINITY_DN0_c0_g2_i1_orf1 type:internal len:106
KGXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXTKSSN
>TRINITY_DN2826_c0_g1_i1_orf1 type:3prime_partial len:112
MVRDDHXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>TRINITY_GG_2915_c1_g1_i1_orf1 type:internal len:133
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXEAVEFEAATAEVKIDLRGLEDLLFGTA
>TRINITY_GG_3876_c0_g1_i1_orf1 type:internal len:120
RMQASGVQYGMADVSQFMVGRGPSTRVQNIFQVSPSSDHQQQQYSSQTXXXXXXXXXXXXXXXXXXLLRQQEHRKDQMVAAAEKVGEGSAYNSPCKHLEPSPTPAHQAAQAGNISTDKA

###########

>pde|TRINITY_DN0_c0_g2_i1_orf1
KGXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXTKSSN
>pde|TRINITY_DN2826_c0_g1_i1_orf1
MVRDDHXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>pde|TRINITY_GG_2915_c1_g1_i1_orf1
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXEAVEFEAATAEVKIDLRGLEDLLFGTA
>pde|TRINITY_GG_3876_c0_g1_i1_orf1
RMQASGVQYGMADVSQFMVGRGPSTRVQNIFQVSPSSDHQQQQYSSQTXXXXXXXXXXXXXXXXXXLLRQQEHRKDQMVAAAEKVGEGSAYNSPCKHLEPSPTPAHQAAQAGNISTDKA
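
The records above suggest that the length check counts the masked X residues as ordinary amino acids, so a mostly masked protein of 106 aa passes a 21 aa cutoff. One possible workaround, sketched here in Python, is to pre-filter the protein set on the number of unmasked residues before running OrthoMCL. This is only a sketch under that assumption: the function names (`read_fasta`, `unmasked_length`, `filter_masked`), the cutoff default, and the toy records are hypothetical and not part of OrthoMCL or TransDecoder.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unmasked_length(seq: str) -> int:
    """Count residues that are neither the mask character X nor a stop '*'."""
    return sum(1 for aa in seq.upper() if aa not in "X*")


def filter_masked(
    records: Iterable[Tuple[str, str]], min_unmasked: int = 21
) -> List[Tuple[str, str]]:
    """Keep records with at least `min_unmasked` unmasked residues."""
    return [(h, s) for h, s in records if unmasked_length(s) >= min_unmasked]


if __name__ == "__main__":
    # Toy records for illustration only (not real data).
    example = [
        ">pde|mostly_masked",
        "KG" + "X" * 30 + "TKSSN",
        ">pde|long_enough",
        "RMQASGVQYGMADVSQFMVGRGPSTRVQ" + "X" * 10 + "LLRQQEHRK",
    ]
    for header, seq in filter_masked(read_fasta(example)):
        print(f">{header}\n{seq}")
```

To apply this to a real file, the lines of the adjusted FASTA could be passed to `read_fasta` through an open file handle, and the surviving records written to a new file that then goes through orthomclFilterFasta as usual. Whether masked residues should count toward the cutoff at all is a judgment call that depends on the analysis.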

RNA-Seq masked orthomcl • 1.5k views