* Thanks for your answers, my dear colleagues, it seems that I can't click that reply button *
I have two sets of FASTA sequences: two genes from each of 9 species. I put the sequences of one gene (all 9 species) into one folder, and those of the other gene into another folder. Now I want to concatenate the two genes for each species, but the first line of each FASTA file looks like this:
>HM357896.1 Persicaria lapathifolia voucher CPU:X. H. Meng 0945 ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (rbcL)
>JF953049.1 Acorus calamus voucher WH1 maturase K (matK) gene, partial cds; chloroplast
I think regular expressions must be useful here, but how? Thank you.
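To make the regular-expression idea concrete, here is a minimal Python sketch that pulls the species name out of header lines like the two above. It assumes every header has the layout ">ACCESSION Genus species ...", which holds for these two examples but may not hold for every record (hybrids, "sp." entries, or subspecies would need a different pattern). The names HEADER_RE and species_from_header are made up for illustration.

```python
import re

# Assumed header layout: ">ACCESSION Genus species free text..."
# Group 1 = accession, group 2 = genus, group 3 = species epithet.
HEADER_RE = re.compile(r"^>(\S+)\s+([A-Z][a-z]+)\s+([a-z-]+)")

def species_from_header(header):
    """Return 'Genus species' from a FASTA header line, or None if it does not match."""
    match = HEADER_RE.match(header)
    if match is None:
        return None
    return f"{match.group(2)} {match.group(3)}"

headers = [
    ">HM357896.1 Persicaria lapathifolia voucher CPU:X. H. Meng 0945 "
    "ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (rbcL)",
    ">JF953049.1 Acorus calamus voucher WH1 maturase K (matK) gene, partial cds; chloroplast",
]

for h in headers:
    print(species_from_header(h))
```

Headers that do not fit the assumed layout return None, so they can be listed and checked by hand rather than being matched silently to the wrong species.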
UPDATE:: Sorry about my misleading description. To be specific: say I have five species, A B C D E, and two genes, rbcL and matK. For each species I have two sequences, one rbcL and one matK, so I have 10 sequences in total (5 x 2). I then combine all rbcL sequences (of the five species) into one FASTA file, say all_rbcL.fasta, and do the same with the matK sequences to make all_matK.fasta. However, the header lines of these sequences are messy: they do contain the species name and the gene name, but along with a lot of other information.
How can I concatenate the two genes so that the species names match each other?
UPDATE2:: (How could I enter code blocks?)
all_rbcL.fasta:
>sp1 rbcL
sequence
>sp2 rbcL
sequence
>sp3 rbcL
sequence
>sp4 rbcL
sequence
>sp5 rbcL
sequence

all_matK.fasta:
>sp1 matK
sequence
>sp2 matK
sequence
>sp3 matK
sequence
>sp4 matK
sequence
>sp5 matK
sequence
I mean something like this, and what I expect is:
concatenated.fasta:
>sp1 matK rbcL
sequence sequence
>sp2 matK rbcL
sequence sequence
>sp3 matK rbcL
sequence sequence
>sp4 matK rbcL
sequence sequence
>sp5 matK rbcL
sequence sequence
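The matching step could be sketched in plain Python roughly as follows. This is only an illustration under stated assumptions: the function names (read_fasta, first_token, genus_species, concatenate) are hypothetical, the sequences in the demo are toy placeholders rather than real data, and the species key is taken from the header either as the first token (as in the ">sp1 rbcL" layout) or as the second and third tokens (as in ">ACCESSION Genus species ..." headers).

```python
def read_fasta(lines):
    """Parse an iterable of FASTA lines into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def first_token(header):
    # Assumption: headers look like "sp1 rbcL", so the species is the first token.
    return header.split()[0]

def genus_species(header):
    # Assumption: headers look like "ACCESSION Genus species ...".
    return " ".join(header.split()[1:3])

def concatenate(first_gene, second_gene, key=first_token):
    """Join sequences of the two genes for every species present in both."""
    second_by_species = {key(h): seq for h, seq in second_gene}
    merged = {}
    for header, seq in first_gene:
        species = key(header)
        if species in second_by_species:
            merged[species] = seq + second_by_species[species]
    return merged

# Toy placeholder data; with real files, pass open("all_matK.fasta") etc. instead.
matk = read_fasta(">sp1 matK\nAAAA\n>sp2 matK\nCCCC\n".splitlines())
rbcl = read_fasta(">sp2 rbcL\nGGGG\n>sp1 rbcL\nTTTT\n".splitlines())

merged = concatenate(matk, rbcl)
for species, seq in merged.items():
    print(f">{species} matK rbcL")
    print(seq)
```

Because the records are matched by species key and not by position, the order of the species in the two files does not need to agree. Species found in only one file are skipped in this sketch, so it is worth comparing the number of merged records with the number of input records. For the long GenBank-style headers, key=genus_species could be passed instead of the default.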
These two genes are both from the chloroplast. I want to use the concatenated sequences to build a phylogenetic tree of these 9 species. Is this impossible or improper? I consulted a professor, who told me it is OK, but I would like to hear your opinions. Thank you.