Add species name to a multiple alignment format (MAF) file
21 months ago
shinken123 ▴ 150

Hi, I have MAF files like this:

##maf version=1
a       score=-1274
s       Chr10                                            34972197            2927       +       190919061         AACCTTGGGG
s       Chr11                                            36777315            2442       +       244384623         AACCTTGGGG

a       score=-60687
s       Chr1                                             81897274           61972       +       159217232          CGTTTTCCCGG
s       Chr1                                             33997294           32248       +       200980605          CGTTTTCCCGG

Is there a tool that automatically adds species names to files in this format, to get something like this?

##maf version=1
a       score=-1274
s       species1.Chr10                                            34972197            2927       +       190919061         AACCTTGGGG
s       species2.Chr11                                            36777315            2442       +       244384623         AACCTTGGGG

a       score=-60687
s       species1.Chr1                                             81897274           61972       +       159217232          CGTTTTCCCGG
s       species2.Chr1                                             33997294           32248       +       200980605          CGTTTTCCCGG

Many thanks

21 months ago
cmdcolin ★ 3.8k

It is likely better to do this before creating your MAF file; it may not be possible to add species names unambiguously after the fact. Specifically, a MAF alignment block can contain more than two sequence lines. For example, here is a block from a MAF file:


a
s   elegans.X   17703901    60  +   17718942    CTATATCCGCAAAGTTGGGACGGACGGGCTCTGCGGAGCCCAAGTGACAACACTCCGGGG
s   elegans.I   3625790 60  -   15072434    CTATATCCGCAAAGTTGGGACGGACGGGCTCTGCGGAGCCCAAGTGACAACACTCCGGGG
s   elegans.IV  9548578 60  +   17493829    ATACATCCGCAAAGTTGGGACGGACGGGCTCTGCGGAGCCCAAGTGACAACACTCCGGGG
s   elegans.V   791349  60  +   20924180    ATACATCAGCAAAGTTGGGACGAATGGGCTCTGAGGGGCCCAAGTCACAACACTCCGGGG
s   elegans.V   19270529    60  +   20924180    TATAATCCGCAAAGTTGGGGCGGAGGACCTCTACGGAGGCGAAGTCACAACATTCCGGGG
s   elegans_vc2010.V    786777  60  +   20182852    ATACATCAGCAAAGTTGGGACGAATGGGCTCTGAGGGGCCCAAGTCACAACACTCCGGGG
s   elegans_vc2010.X    17522306    60  +   17537347    CTATATCCGCAAAGTTGGGACGGACGGGCTCTGCGGAGCCCAAGTGACAACACTCCGGGG

Without the species prefix already in place, you could not tell which of those rows belong to elegans and which to elegans_vc2010 if both assemblies use similar chromosome names.
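
That said, if every block in your files is strictly pairwise and the first `s` line always comes from one genome and the second from the other (an assumption you would need to verify for your aligner), the prefix can be added by row position. Here is a minimal Python sketch under that assumption. The function name `add_species` and the names `species1`/`species2` are placeholders, not part of any existing tool, and the code raises an error rather than guessing when a block has more `s` lines than there are names:

```python
import re

# Matches the leading "s" plus whitespace, then the src field (second column).
SRC_FIELD = re.compile(r"^(s\s+)(\S+)")


def add_species(lines, species):
    """Yield MAF lines with the src field of each 's' line prefixed by a
    species name chosen by the line's position within its block.

    ASSUMPTION: the i-th 's' line of every block belongs to species[i].
    """
    row = 0  # index of the next 's' line within the current block
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "a":
            row = 0  # an 'a' line starts a new alignment block
        elif fields and fields[0] == "s":
            if row >= len(species):
                # More rows than names: assignment by position is ambiguous.
                raise ValueError(
                    f"block has more than {len(species)} 's' lines: {line!r}"
                )
            name = species[row]
            # Rewrite only the src field; the rest of the line is untouched.
            line = SRC_FIELD.sub(
                lambda m: f"{m.group(1)}{name}.{m.group(2)}", line, count=1
            )
            row += 1
        yield line
```

It accepts any iterable of lines, so an open file handle can be passed directly and the result written out with `writelines`. Header, comment, and blank lines pass through unchanged. The columns after the name shift right by the length of the prefix, which MAF parsers generally tolerate because fields are whitespace-separated.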
