extracting longest unigene from de novo RNA-Seq assemblies
7.3 years ago
bioming • 0

Hello everyone, after running blastx on Unigenes.fa (from a de novo RNA-Seq assembly) against a reference proteome, I found that multiple contigs map to the same gene. I would like to make my data uniform by removing the shorter contigs and keeping only the longest contig to represent each gene. An example is below:

From the original data:

ENSDARP00000143232.1 GGCTCCTCTTTTTCAACTGGACATCCTTAAAACTGTATGAAAGGGGCGGAGCCTTTTGCTACTTGCATACTTAAGCTCCTTCACATTCCTCTAGCCCTTTACGAA
ENSDARP00000143232.1 GGCTCCTCTTTTTCAACTGGACATCCTTAAAACTGTATGAAAGGGGCGGAGCCTTTTGCTACTTGCATACTTAAGCTCCTTCAC
ENSDARP00000143232.1 GGCTCCTCTTTTTCAACTGGACATCCTTAAAACTGTATGAAAGGGGCGGAGCCTTTTGC

To what I want:

ENSDARP00000143232.1 GGCTCCTCTTTTTCAACTGGACATCCTTAAAACTGTATGAAAGGGGCGGAGCCTTTTGCTACTTGCATACTTAAGCTCCTTCACATTCCTCTAGCCCTTTACGAA

Could you give me some suggestions or scripts to help with this? Thanks!
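For data in the "ID SEQUENCE" layout shown above, a minimal Python sketch is to group records by ID and keep the longest sequence for each. It assumes one whitespace-separated record per line; the function names and file names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_two_column(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Parse 'ID SEQUENCE' lines (whitespace-separated), skipping blank lines."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            yield fields[0], fields[1]


def keep_longest(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Return one sequence per ID: the longest seen (the first one wins on ties)."""
    longest: Dict[str, str] = {}
    for seq_id, seq in records:
        if seq_id not in longest or len(seq) > len(longest[seq_id]):
            longest[seq_id] = seq
    return longest


def filter_file(in_path: str, out_path: str) -> None:
    """Read 'ID SEQUENCE' records from in_path and write the longest per ID."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for seq_id, seq in keep_longest(read_two_column(src)).items():
            dst.write(f"{seq_id} {seq}\n")
```

For example, `filter_file("contigs.txt", "longest.txt")` would reduce the three ENSDARP00000143232.1 records above to the single 105-nt one. Output IDs appear in the order they were first seen, because Python dicts preserve insertion order.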

Have you tried CD-HIT? You could also use the query length to filter the BLAST output. How did you obtain the unigenes?
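The query-length filter suggested here could look roughly like the Python sketch below. It assumes the blastx output was written as tab-separated BLAST+ tabular output with exactly the columns `-outfmt "6 qseqid sseqid qlen"`, and that the unigene FASTA headers start with the same IDs as `qseqid`. For each subject protein it keeps the query (unigene) with the largest `qlen`, then pulls that unigene's sequence from the FASTA. The function names and unigene IDs are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def longest_query_per_subject(blast_lines: Iterable[str]) -> Dict[str, Tuple[str, int]]:
    """Map each subject ID to (query ID, query length) of its longest query.

    Assumes tab-separated lines with columns: qseqid, sseqid, qlen.
    """
    best: Dict[str, Tuple[str, int]] = {}
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid, qlen_str = line.split("\t")[:3]
        qlen = int(qlen_str)
        if sseqid not in best or qlen > best[sseqid][1]:
            best[sseqid] = (qseqid, qlen)
    return best


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA into {ID: sequence}; the ID is the first word after '>'."""
    chunks: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}


def pick_sequences(best: Dict[str, Tuple[str, int]], seqs: Dict[str, str]) -> Dict[str, str]:
    """Return {subject ID: sequence of its longest query}, skipping missing queries."""
    return {subj: seqs[query] for subj, (query, _) in best.items() if query in seqs}
```

Note that a unigene hitting several proteins can be selected for more than one of them; keeping only each query's best hit first (for example, the top line per `qseqid`) would avoid that.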

