Hello all! I have two files, each containing tens of thousands of transcript sequences in FASTA format, and the IDs differ between the files. I am interested in finding the sequences common to both files. Could somebody please help? This is holding up my work. Thank you in advance.
If you are interested in IDENTICAL sequences, you can simply write a very short script that compares the sequences themselves (not the IDs, since those differ) and extracts the ones present in both files.
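Something along these lines might work as a starting point. It is only a sketch using the Python standard library: the function names are made up for illustration, sequences are compared case-insensitively, and reverse-complement matches are NOT detected, so adapt it if your transcripts may be on opposite strands.

```python
from collections import defaultdict


def read_fasta(handle):
    """Yield (record_id, sequence) pairs from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the ID: the header text up to the first whitespace
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def index_by_sequence(path):
    """Map each upper-cased sequence to the list of IDs that carry it."""
    index = defaultdict(list)
    with open(path) as handle:
        for record_id, seq in read_fasta(handle):
            index[seq.upper()].append(record_id)
    return index


def shared_sequences(path_a, path_b):
    """Return {sequence: (ids_in_a, ids_in_b)} for sequences found in both files."""
    index_a = index_by_sequence(path_a)
    index_b = index_by_sequence(path_b)
    common = index_a.keys() & index_b.keys()
    return {seq: (index_a[seq], index_b[seq]) for seq in common}
```

Calling shared_sequences("a.fasta", "b.fasta") (placeholder file names) gives you every exact match together with the IDs it has in each file. Both files are held in memory, which should be fine for tens of thousands of transcripts.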
If you instead want to apply a "similarity" score, I strongly suggest using an alignment tool (BLAST for pairwise searches or MUSCLE for multiple alignment, for instance) and then parsing the results to get a global overview of the similarity between the sequences in your two files.
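For the BLAST route, the parsing step could look roughly like the sketch below. It assumes you ran BLAST+ with tabular output (-outfmt 6) using the default columns. The file names in the comments, the function name and the identity/length cut-offs are placeholders I picked for illustration, so tune them to your data.

```python
import csv

# Assumed upstream commands (file names are placeholders):
#   makeblastdb -in b.fasta -dbtype nucl -out b_db
#   blastn -query a.fasta -db b_db -outfmt 6 -out hits.tsv

# Default column order of BLAST+ tabular output (-outfmt 6)
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tsv_path, min_identity=95.0, min_length=200):
    """Return {query_id: (subject_id, pident, length, bitscore)}.

    Only hits passing both thresholds are considered, and for each query
    the hit with the highest bit score is kept.
    """
    best = {}
    with open(tsv_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < len(COLUMNS):
                continue  # skip blank or malformed lines
            hit = dict(zip(COLUMNS, row))
            pident = float(hit["pident"])
            length = int(hit["length"])
            bitscore = float(hit["bitscore"])
            if pident < min_identity or length < min_length:
                continue
            previous = best.get(hit["qseqid"])
            if previous is None or bitscore > previous[3]:
                best[hit["qseqid"]] = (hit["sseqid"], pident, length, bitscore)
    return best
```

The returned dictionary pairs each transcript from the first file with its most similar transcript in the second file, which you can then summarise however you like (counts, identity distribution, and so on).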
EDIT, following @genomax's comment: an assembly merging tool is another option.