Hello-
I am trying to figure out which transcripts in my Trinity.fasta file are coding and which are non-coding. My plan was to filter out every transcript that appears in the TransDecoder output, since in theory those are the coding ones.
However, the sequence names in Trinity.fasta.transdecoder.pep are not found in my Trinity.fasta file. Can anyone explain why this is, and do you have any suggestions for finding the non-coding transcripts in the FASTA file?
Here is an example of the Trinity.fasta.transdecoder.pep file
[Linux@vaughan Trinotate]$ head -11 Trinity.fasta.transdecoder.pep
>TRINITY_DN0_c0_g1_i1.p1 TRINITY_DN0_c0_g1~~TRINITY_DN0_c0_g1_i1.p1 ORF type:3prime_partial len:113 (-),score=25.63 TRINITY_DN0_c0_g1_i1:3-338(-)
MLRSRGMLKSRCCVLLGDLRVLLLGPPAPPTPLPPLTPMSGQDSESHDASVTVPDNNNTL
TRSRQRAGDRTESSAAGAAGQTGGERSGAGWVNAAEPKSNQSPPPRLSVNSL
>TRINITY_DN0_c0_g1_i2.p1 TRINITY_DN0_c0_g1~~TRINITY_DN0_c0_g1_i2.p1 ORF type:3prime_partial len:113 (-),score=25.63 TRINITY_DN0_c0_g1_i2:3-338(-)
MLRSRGMLKSRCCVLLGDLRVLLLGPPAPPTPLPPLTPMSGQDSESHDASVTVPDNNNTL
TRSRQRAGDRTESSAAGAAGQTGGERSGAGWVNAAEPKSNQSPPPRLSVNSL
>TRINITY_DN0_c0_g1_i3.p1 TRINITY_DN0_c0_g1~~TRINITY_DN0_c0_g1_i3.p1 ORF type:3prime_partial len:113 (-),score=25.63 TRINITY_DN0_c0_g1_i3:3-338(-)
MLRSRGMLKSRCCVLLGDLRVLLLGPPAPPTPLPPLTPMSGQDSESHDASVTVPDNNNTL
TRSRQRAGDRTESSAAGAAGQTGGERSGAGWVNAAEPKSNQSPPPRLSVNSL
>TRINITY_DN0_c0_g1_i4.p1 TRINITY_DN0_c0_g1~~TRINITY_DN0_c0_g1_i4.p1 ORF type:3prime_partial len:113 (-),score=25.63 TRINITY_DN0_c0_g1_i4:3-338(-)
MLRSRGMLKSRCCVLLGDLRVLLLGPPAPPTPLPPLTPMSGQDSESHDASVTVPDNNNTL
Here is an example of the contents of the Trinity.fasta file:
>TRINITY_DN526290_c0_g1_i1 len=297 path=[0:0-296]
CTTCAACAGTTTAAATTTCTTCAGTTTTAAATCATTTTTTTCAAAGGAATGAATGAACACTATGGCCCTGATCTGGCTGCAGCTCCATACCTAGGAGGGTAATCCCTCACTGAGTTCCAAACAGAAGCAGGTCTTATCCACAGACGGTGGACAGGACAGACATAAGAAGCCCAAAGTGTTGGAGCAAAAACAGAACAGGCTCCGTTAGGTAAAAACATCTGAGCTGTGAAAAGGTTCGTCCAAACGGCGTTAAACCGATCGGTCCGTTCCCTTTCACAAAGCCTGTGACCCGAGAGA
>TRINITY_DN526244_c0_g1_i1 len=256 path=[0:0-255]
CGTTTGCATTGTCAAGAATTTTTAAATGTTCAAATGTTCAAATCTATATTGAGATGTACAATTATTTCTTCTAATTTTTTTTCTTTTTTTTGTCCACTTTTTAGTCACAAAGCAGGGACAATACACCTTTTACGTAAGTTACAGCAAACACAAAGTCATTTCAGCCCTCTTTTCACGCAGTCCTAAAGGGTCAGGCGCTCTGCCTGTTTGTTTTTCCCCGTCTCCGGGCTGTAGGGTCAGCATCCAGAAGAGCAAA
>TRINITY_DN526268_c0_g1_i1 len=215 path=[0:0-214]
GAGTCGTCCTCCTCCAGACTCTCTTCCAATGAGTCGGTGGTGTCCTCAAAGAACCGGATGCACTCTTGTTCCTCCTGGCTCAGAAACTGCAGACCGTCATCCTCCTATGAAAGCCATAGGAGAGAAAGAAAGATGCATTAAAGCGATTCTAATTTTGACCTAATAAATTTCTTTATGAATTGTAATAGCAAAGAAACATCTGAGGCTAAATCCTG
>TRINITY_DN526204_c0_g1_i1 len=224 path=[0:0-223]
AGAAACACAGATGGGCATGCGTTTCCTTTCACTGCCCACAGCGACTCCTTCCAAACTTGTGTCTTCCACGAATTACGTATATGATAATTACATCCTATTTTCACCTCTTTATAATGAAGGCAGGAAAAGGGAGGAAGAATAGACAGAAACGAGAGAAATGCCTCCTCCTCATTTTCCCCTCTTCCTTTTGCTGAGCAACCTTTCATGTTGGCAGAGCAGGCCTG
>TRINITY_DN526278_c0_g1_i1 len=221 path=[0:0-186 1:187-220]
CCACATGCCTGAATCTGTGCTGGCAAAACGCGGGTTTTTGGGGCTCGGTTTTTGGGCTGCTGGAGGGTGGAGGGTTCCTCGCTGGTTTTGAATGATGCAGTCATGTTCATAGCACAGTTCTTGACCAGATCAGCTCTACCGATGGTTTTGACCTTTATGATGCGGGTTGCATTGTGAAGAGATTCATCTCTCTCTCTCGCTCTCTCTCTCTCTCTGTCCGC
Thank you so much in advance!!
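
A note on the ID mismatch, in case it helps: in the .pep headers shown above, each ORF name looks like the transcript name plus a ".p1" suffix (TRINITY_DN0_c0_g1_i1.p1), and the last field of the header (TRINITY_DN0_c0_g1_i1:3-338(-)) names the source transcript and the ORF coordinates. So the names probably do not match simply because of that suffix. Below is a minimal Python sketch, not a definitive method, that strips the suffix and splits the transcripts into those with and without a predicted ORF. The function names and the output file name are made up for illustration, and it assumes the suffix is always ".p" followed by a number and that Trinity.fasta has normal ">" header lines. Also keep in mind that a transcript with no reported ORF is only a candidate non-coding transcript, since short or fragmented coding sequences can be missed.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def coding_transcript_ids(pep_lines):
    """Collect transcript IDs that have at least one predicted ORF.

    Assumption: the first word of each .pep header is the transcript ID
    followed by '.p<number>', as in the example headers in this post.
    """
    ids = set()
    for header, _ in read_fasta(pep_lines):
        orf_id = header.split()[0]
        ids.add(re.sub(r"\.p\d+$", "", orf_id))
    return ids


def split_transcripts(fasta_lines, coding_ids):
    """Split transcripts into (with ORF, without ORF) lists of records."""
    coding, noncoding = [], []
    for header, seq in read_fasta(fasta_lines):
        transcript_id = header.split()[0]
        target = coding if transcript_id in coding_ids else noncoding
        target.append((header, seq))
    return coding, noncoding


def write_noncoding(pep_path, fasta_path, out_path):
    """Write transcripts without a predicted ORF to out_path (hypothetical helper)."""
    with open(pep_path) as pep:
        coding_ids = coding_transcript_ids(pep)
    with open(fasta_path) as fasta:
        _, noncoding = split_transcripts(fasta, coding_ids)
    with open(out_path, "w") as out:
        for header, seq in noncoding:
            out.write(f">{header}\n{seq}\n")
```

It could be run as write_noncoding("Trinity.fasta.transdecoder.pep", "Trinity.fasta", "Trinity.no_orf.fasta"), where the last file name is just a placeholder.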