Convert detailed alignment format to FASTA
2.7 years ago
sapuizait ▴ 10

Dear all,

Apologies if this has been asked before, but I could not find a useful answer.

I am using kma to align reads against a set of genes, and I am looking for a method, script, or software to convert its detailed alignment format to FASTA format.

Example input:

# VFG000863(gb|BAA94855) (astA)
template:   ATGCCATCAACACAGTATATCCGAAGGCCCGCATCCAGTTATGCATCGTGCATATGGTGC
            ||||||||||||||||||||_||_||_|||_|||||||||||||||||||||||||||||
query:      ATGCCATCAACACAGTATATTCGGAGACCCACATCCAGTTATGCATCGTGCATATGGTGC

template:   GCAACAGCCTGCGCTTCGTGTCATGGAAGGACTACAAAGCCGTCACTCGCGACCTGA
            |||||||_|||||||||||||||||||||||||||||||||||||||||||||||||
query:      GCAACAGTCTGCGCTTCGTGTCATGGAAGGACTACAAAGCCGTCACTCGCGACCTGA

# VFG000924(gb|NP_752610) (fepB)
template:   GTGAGACTCGCCCCGCTCTACCGCAACGCCCTTCTATTAACAGGACTTTTGCTTTCAGGA
            ||||||||||||||||||||||||||||||||||||||||||||||||||_|||||||||
query:      GTGAGACTCGCCCCGCTCTACCGCAACGCCCTTCTATTAACAGGACTTTTACTTTCAGGA

template:   ATAGCCGCAGTTCAGGCCGCCGACTGGCCGCGTCAGATTACTGACAGCCGTGGCACTCAT
            ||||||||||||||||||||_|||||||||||||||||||||||||||||||||||_|||
query:      ATAGCCGCAGTTCAGGCCGCTGACTGGCCGCGTCAGATTACTGACAGCCGTGGCACACAT

thanks


An example of the expected output is needed.


Sorry, the output would look like this:

>VFG000863(gb|BAA94855) (astA) template
ATGCCATCAACACAGTATATCCGAAGGCCCGCATCCAGTTATGCATCGTGCATATGGTGCGCAACAGCCTGCGCTTCGTGTCATGGAAGGACTACAAAGCCGTCACTCGCGACCTGA

>VFG000863(gb|BAA94855) (astA) query
ATGCCATCAACACAGTATATTCGGAGACCCACATCCAGTTATGCATCGTGCATATGGTGCGCAACAGTCTGCGCTTCGTGTCATGGAAGGACTACAAAGCCGTCACTCGCGACCTGA

>VFG000924(gb|NP_752610) (fepB) template
GTGAGACTCGCCCCGCTCTACCGCAACGCCCTTCTATTAACAGGACTTTTGCTTTCAGGAATAGCCGCAGTTCAGGCCGCCGACTGGCCGCGTCAGATTACTGACAGCCGTGGCACTCAT

>VFG000924(gb|NP_752610) (fepB) query
GTGAGACTCGCCCCGCTCTACCGCAACGCCCTTCTATTAACAGGACTTTTACTTTCAGGAATAGCCGCAGTTCAGGCCGCTGACTGGCCGCGTCAGATTACTGACAGCCGTGGCACACAT

With seqkit and sed:

$ cat <(sed -r '/template|#/!d; s/#/>/; s/^\w+\W\s+//;/>/ s/$/ template/' test.fa) <(sed -r '/query|#/!d; s/#/>/; s/^\w+\W\s+//;/>/ s/$/ query/' test.fa) | seqkit -w 0 sort --quiet -n

>VFG000863(gb|BAA94855) (astA) query
ATGCCATCAACACAGTATATTCGGAGACCCACATCCAGTTATGCATCGTGCATATGGTGCGCAACAGTCTGCGCTTCGTGTCATGGAAGGACTACAAAGCCGTCACTCGCGACCTGA
>VFG000863(gb|BAA94855) (astA) template
ATGCCATCAACACAGTATATCCGAAGGCCCGCATCCAGTTATGCATCGTGCATATGGTGCGCAACAGCCTGCGCTTCGTGTCATGGAAGGACTACAAAGCCGTCACTCGCGACCTGA
>VFG000924(gb|NP_752610) (fepB) query
GTGAGACTCGCCCCGCTCTACCGCAACGCCCTTCTATTAACAGGACTTTTACTTTCAGGAATAGCCGCAGTTCAGGCCGCTGACTGGCCGCGTCAGATTACTGACAGCCGTGGCACACAT
>VFG000924(gb|NP_752610) (fepB) template
GTGAGACTCGCCCCGCTCTACCGCAACGCCCTTCTATTAACAGGACTTTTGCTTTCAGGAATAGCCGCAGTTCAGGCCGCCGACTGGCCGCGTCAGATTACTGACAGCCGTGGCACTCAT

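seqkit sorts the records by header (sort -n) and prints each sequence on a single line (-w 0). The sort is also what puts each gene's two records next to each other, because cat emits all template records first and all query records second. As the output shows, sorting by name places "query" before "template" for each gene.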
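If seqkit is not available, or the template-then-query order from the question matters, the same conversion can be sketched with awk alone. This is only a sketch under a few assumptions: every gene block starts with a line beginning with `#`, sequence lines begin with `template:` or `query:` followed by whitespace and the sequence, and any gap characters in the alignment are kept as they are. The function name `aln2fasta` is made up for this example and is not part of kma or seqkit.

```shell
# Sketch: convert kma-style alignment blocks to FASTA using awk only.
# aln2fasta is a hypothetical helper name chosen for this example.
aln2fasta() {
    awk '
        # Print the template and query records collected for the current gene.
        function emit() {
            if (name != "") {
                printf ">%s template\n%s\n", name, tpl
                printf ">%s query\n%s\n", name, qry
            }
        }
        /^#/ {
            # A header line closes the previous gene and starts a new one.
            emit()
            name = $0
            sub(/^#[ \t]*/, "", name)   # drop the leading "# "
            tpl = ""
            qry = ""
            next
        }
        # Append each alignment chunk to the running sequence.
        $1 == "template:" { tpl = tpl $2; next }
        $1 == "query:"    { qry = qry $2; next }
        # Match lines ("||||_|||") and blank lines fall through and are ignored.
        END { emit() }
    ' "$1"
}
```

Usage would be something like `aln2fasta test.fa > converted.fasta`, where `converted.fasta` is just a placeholder output name. Records come out in input order, template first and then query for each gene, so no sorting step is needed.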


Thanks! Very cool use of sed!
