I have two FASTA files with the same sequence names but slightly different sequences, as follows:
File1:

    >trinity5_comp_3_0_1
    TTATGCATAT
    >trinity5_comp_9735_1_5
    AAAATGATGA
    >trinity5_comp_645_0_2
    TCGAATGCGA
    >trinity5_comp_3169_0_1
    AGGATATTAC

File2:

    >trinity5_comp_3_0_1
    TTATGCATAT
    >trinity5_comp_9735_1_5
    AAAATGCCGA
    >trinity5_comp_645_0_2
    TAGAATGCGA
    >trinity5_comp_3169_0_1
    AGGATATTAC
I would like to compare each sequence in File1 with the corresponding sequence (same name) in File2 and compute the percentage similarity for each pair, as follows:
    trinity5_comp_3_0_1     100%
    trinity5_comp_9735_1_5  80%
    trinity5_comp_645_0_2   90%
    trinity5_comp_3169_0_1  100%
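To make the intended calculation concrete, here is a minimal Python sketch of a per-name, position-by-position comparison. It rests on assumptions not stated above: that paired sequences have equal length (so no alignment is needed) and that "similarity" means the percentage of identical positions. The function names `read_fasta`, `percent_identity` and `compare_fasta` are hypothetical helpers, not part of any existing tool. Pairs of unequal length would need a pairwise alignment first.

```python
from io import StringIO

def read_fasta(handle):
    """Return {name: sequence} from an open FASTA file or file-like object."""
    parts, name = {}, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep only the ID before any whitespace
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}

def percent_identity(a, b):
    """Ungapped identity in percent; assumes both sequences have equal length."""
    if len(a) != len(b):
        raise ValueError("sequences differ in length; align them first")
    if not a:
        raise ValueError("empty sequence")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)

def compare_fasta(handle1, handle2):
    """Compare only sequences that share a name in both files."""
    seqs1, seqs2 = read_fasta(handle1), read_fasta(handle2)
    return {name: percent_identity(seq, seqs2[name])
            for name, seq in seqs1.items() if name in seqs2}

# Example data copied from the two files shown above.
FILE1 = """>trinity5_comp_3_0_1
TTATGCATAT
>trinity5_comp_9735_1_5
AAAATGATGA
>trinity5_comp_645_0_2
TCGAATGCGA
>trinity5_comp_3169_0_1
AGGATATTAC
"""
FILE2 = """>trinity5_comp_3_0_1
TTATGCATAT
>trinity5_comp_9735_1_5
AAAATGCCGA
>trinity5_comp_645_0_2
TAGAATGCGA
>trinity5_comp_3169_0_1
AGGATATTAC
"""

result = compare_fasta(StringIO(FILE1), StringIO(FILE2))
for name, pct in result.items():
    print(f"{name}\t{pct:.0f}%")
```

With real files, the `StringIO` objects would be replaced by `open("File1.fasta")` and `open("File2.fasta")` (file names assumed here).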
I tried cd-hit-est-2d, but it also compares each sequence against the other sequences in File2 rather than only against the corresponding sequence with the same name. Kindly guide me.
Thanks in advance