How can I convert this pairwise format to fasta?
14 months ago
SaltedPork ▴ 170

Hi, I'm not sure whether this is a known file type in bioinformatics, but how could I go about converting it to FASTA format?

M03972:384:000000000-KDHHW:1:1101:8022:20849    83  Consensus_7-INT.v2_threshold_0_quality_20   4766    42  153M    =   4766    -153
GCAGTATTCATTCACAATTTTAAAAGAAAAGGGGGGATTGGGGGGTACAGTGCAGGGGAAAGAATAATAGACATAATAGCAACAGACATACAAACTAAAGAACTACAAAAACAAATTACAAAAATTCAAAATTTTCGGGTTTATTACAGGGGC
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |
GCAGTATTCATTCACAATTTTAAAAGAAAAGGGGGGATTGGGGGGTACAGTGCAGGGGAAAGAATAATAGACATAATAGCAACAGACATACAAACTAAAGAACTACAAAAACAAATTACAAAAATTCAAAATTTTCGGGTTTATTACAGGGAC

M03972:384:000000000-KDHHW:1:1101:8022:20849    163 Consensus_7-INT.v2_threshold_0_quality_20   4766    42  128M    =   4766    153
GCAGTATTCATTCACAATTTTAAAAGAAAAGGGGGGATTGGGGGGTACAGTGCAGGGGAAAGAATAATAGACATAATAGCAACAGACATACAAACTAAAGAACTACAAAAACAAATTACAAAAATTCA
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
GCAGTATTCATTCACAATTTTAAAAGAAAAGGGGGGATTGGGGGGTACAGTGCAGGGGAAAGAATAATAGACATAATAGCAACAGACATACAAACTAAAGAACTACAAAAACAAATTACAAAAATTCA

Each entry in the file is 4 lines: line 1 is the ID line, line 2 is a sequence, line 3 is a row of match characters, and line 4 is a second sequence. Has anyone seen this before?

fasta python bash • 648 views
14 months ago
GenoMax 141k

This appears to be output from the sam2pairwise program: https://github.com/mlafave/sam2pairwise

A quick and dirty way might be:

$ grep -A 1 ^M039 file.fa --no-group-separator | sed 's/^M03972/>M03972/g' | cut -f1 -d ' '
>M03972:384:000000000-KDHHW:1:1101:8022:20849
GCAGTATTCATTCACAATTTTAAAAGAAAAGGGGGGATTGGGGGGTACAGTGCAGGGGAAAGAATAATAGACATAATAGCAACAGACATACAAACTAAAGAACTACAAAAACAAATTACAAAAATTCAAAATTTTCGGGTTTATTACAGGGGC
>M03972:384:000000000-KDHHW:1:1101:8022:20849
GCAGTATTCATTCACAATTTTAAAAGAAAAGGGGGGATTGGGGGGTACAGTGCAGGGGAAAGAATAATAGACATAATAGCAACAGACATACAAACTAAAGAACTACAAAAACAAATTACAAAAATTCA
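
Since the question is also tagged python, here is a minimal Python sketch of the same idea, not a tested converter for every sam2pairwise output. It assumes entries are separated by blank lines, that the first line of each entry is the header with the read name in field 1 and the SAM flag in field 2, and that the second line is the read sequence. The function name pairwise_to_fasta is hypothetical, and the /1 and /2 suffixes (taken from the SAM flag bits 0x40 and 0x80) are my own addition so that the two mates do not end up with identical FASTA headers. Stripping '-' assumes gaps in the read line are padded with dashes.

```python
import sys
from itertools import chain


def pairwise_to_fasta(lines):
    """Yield (header, sequence) pairs from sam2pairwise-style text.

    Assumes each entry is: header line, read sequence, match line,
    reference sequence, with entries separated by blank lines.
    """
    block = []
    # The trailing "" makes sure the last entry is flushed.
    for raw in chain(lines, [""]):
        line = raw.rstrip("\r\n")
        if line.strip():
            block.append(line)
            continue
        if len(block) >= 2:
            fields = block[0].split()  # works for tabs or spaces
            name = fields[0]
            flag = int(fields[1]) if len(fields) > 1 and fields[1].isdigit() else 0
            # SAM flag 0x40 = first mate, 0x80 = second mate.
            suffix = "/1" if flag & 0x40 else "/2" if flag & 0x80 else ""
            # Assumption: '-' in the read line is gap padding, not sequence.
            seq = block[1].replace("-", "")
            yield name + suffix, seq
        block = []


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        for header, seq in pairwise_to_fasta(handle):
            print(f">{header}\n{seq}")
```

Saved as, say, pairwise2fasta.py (a made-up name), it would be run as `python pairwise2fasta.py file.fa > reads.fasta`. Like the grep one-liner, it keeps only the read sequence (line 2 of each entry) and ignores the reference line.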