I am using UCSC's multiz 100 species vertebrate multiple alignment FASTA for hg19. The file is refGene.exonAA.fa, available here: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/multiz46way/alignments/
The sequences seem to be broken up into fragments. For example the first sequence is:
>NM_152486.2_hg19_1_13 24 0 0 chr1:861322-861393+
MSKGILQVHPPICDCPGCRISSPV
In this example, NM_152486.2_hg19_1_13 is fragment 1 of 13. Further down there are 2_13, 3_13, etc.
I would like to concatenate all 13 fragments into one sequence for each RefSeq ID.
Is there existing software or a script that can perform this task?
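To make the goal concrete, here is a rough Python sketch of the kind of script I have in mind. It is not an existing tool, and it rests on assumptions I have not verified against the whole file: that every header's first field has the form accession_assembly_index_total (as in NM_152486.2_hg19_1_13), that the assembly name contains no underscore, and that each header sits on its own line with the sequence on the following line(s). The function name concat_fragments and the second record in the demo are made up for illustration.

```python
from collections import defaultdict


def concat_fragments(lines):
    """Join per-fragment FASTA records into one sequence per (accession, assembly).

    Assumes the first header field looks like ACCESSION_ASSEMBLY_INDEX_TOTAL,
    e.g. NM_152486.2_hg19_1_13, and that ASSEMBLY contains no underscore.
    """
    # (accession, assembly) -> {fragment index: sequence}
    fragments = defaultdict(dict)
    key = index = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            # Split from the right, because the accession itself contains "_".
            accession, assembly, idx, _total = name.rsplit("_", 3)
            key, index = (accession, assembly), int(idx)
            fragments[key][index] = ""
        elif key is not None:
            fragments[key][index] += line
    # Concatenate fragments in numeric index order, not file order.
    return {
        k: "".join(seq for _, seq in sorted(parts.items()))
        for k, parts in fragments.items()
    }


# Small demo: the first record is from the file, the second is a made-up placeholder.
demo = [
    ">NM_152486.2_hg19_1_13 24 0 0 chr1:861322-861393+",
    "MSKGILQVHPPICDCPGCRISSPV",
    ">NM_152486.2_hg19_2_13 (hypothetical header fields)",
    "AAAA",
]
for (accession, assembly), seq in concat_fragments(demo).items():
    print(f">{accession}_{assembly}")
    print(seq)
```

For the real file, passing an open file handle instead of the demo list should work the same way. Keying on the assembly as well as the accession keeps records from different species apart, if the file holds one record per species for each fragment.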