Concatenate Sequence Fragments in a Multiple Alignment FASTA File
5.0 years ago
bhanratt

I am using UCSC's multiz 100-species vertebrate multiple alignment FASTA for hg19, specifically refGene.exonAA.fa, available here: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/multiz46way/alignments/

The sequences are broken up into fragments (one per exon, as the file name suggests). For example, the first record is:

>NM_152486.2_hg19_1_13 24 0 0 chr1:861322-861393+
MSKGILQVHPPICDCPGCRISSPV

In this example, NM_152486.2_hg19_1_13 is fragment 1 of 13. Further down the file are fragments 2_13, 3_13, etc.
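
The first token of each header appears to encode transcript ID, species/assembly, fragment number and total fragment count, joined by underscores. Because RefSeq IDs such as NM_152486.2 themselves contain an underscore, a minimal sketch splits from the right. It assumes (not confirmed by UCSC here) that species/assembly names never contain an underscore; parse_header is a hypothetical helper name.

```python
from __future__ import annotations


def parse_header(header: str) -> tuple[str, str, int, int]:
    """Split '>NM_152486.2_hg19_1_13 24 0 0 chr1:...' into
    (transcript, species, fragment, total).

    Assumption: the species/assembly name (e.g. 'hg19') has no underscore,
    while the transcript ID may, so we split from the right at most 3 times.
    """
    name = header.lstrip(">").split()[0]  # e.g. 'NM_152486.2_hg19_1_13'
    transcript, species, frag, total = name.rsplit("_", 3)
    return transcript, species, int(frag), int(total)


print(parse_header(">NM_152486.2_hg19_1_13 24 0 0 chr1:861322-861393+"))
# → ('NM_152486.2', 'hg19', 1, 13)
```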

I would like to concatenate all 13 fragments into one sequence for each RefSeq ID.

Is there existing software or a script that can perform this task?

5.0 years ago
Chun-Jie Liu

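You may try sed -n '/NM_152486\.2_hg19_/{n;p;}' refGene.exonAA.fa | tr -d '\n'

This prints the line following each matching hg19 header (that fragment's sequence) and strips the newlines so the fragments are joined. It assumes each sequence sits on a single line and that the fragments appear in order in the file.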

Thanks for your response. I guess I didn't explain it very well. I need to do this for all IDs and all species, and I'm just asking whether anyone knows of an existing method. Otherwise I can write one myself.

Thanks though!

You can handle all IDs with a bash loop (using grep to pull out the IDs) around the sed one-liner; I only gave the sed part.
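
As an alternative to looping sed once per ID, a single pass can handle every transcript and species at once. This is a minimal sketch, not an existing tool: the script name concat_exons.py and the function names are hypothetical, and it assumes the header layout from the question (first token transcript_species_fragment_total) with no underscore in species names. It groups records by (transcript, species), sorts fragments numerically so it does not depend on file order, and warns on stderr if a fragment is missing.

```python
import gzip
import sys
from collections import defaultdict


def read_fasta(handle):
    """Yield (header, sequence) pairs, skipping blank lines and joining
    sequences that span several lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def concatenate_fragments(handle):
    """Join fragments per (transcript, species) in numeric fragment order.

    Assumes the first header token looks like 'NM_152486.2_hg19_1_13'
    and that species names contain no underscore.
    """
    frags = defaultdict(dict)  # (transcript, species) -> {fragment: seq}
    totals = {}                # (transcript, species) -> expected fragment count
    for header, seq in read_fasta(handle):
        name = header[1:].split()[0]
        transcript, species, frag, total = name.rsplit("_", 3)
        key = (transcript, species)
        frags[key][int(frag)] = seq
        totals[key] = int(total)
    for key, parts in frags.items():
        order = sorted(parts)
        if order != list(range(1, totals[key] + 1)):
            print(f"warning: {key} has fragments {order} of {totals[key]}",
                  file=sys.stderr)
        yield key, "".join(parts[i] for i in order)


if __name__ == "__main__":
    # Usage (hypothetical script name):
    #   python concat_exons.py refGene.exonAA.fa.gz > refGene.concat.fa
    for path in sys.argv[1:]:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as fh:
            for (transcript, species), seq in concatenate_fragments(fh):
                print(f">{transcript}_{species}\n{seq}")
```

Output headers take the form >NM_152486.2_hg19, and gap characters are kept as-is. The whole file is held in memory, which may be a lot for the full multi-species alignment; if each transcript's records are contiguous, flushing one transcript at a time would reduce that.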