Converting a single-line FASTA transcriptome (no separator between name and sequence) to conventional FASTA format
5.6 years ago
Roelof ▴ 10

Hi there,

I've been handed a transcriptome containing consensus sequence data in a Word document. For some reason, this person also placed each contig entry in the FASTA file on a single line, with no character separating the name from the sequence.

The resulting file looks as follows. I've tried to solve this with SeqIO in Biopython, but to no avail, and my regex just isn't good enough to separate the two. Later on in the file the contig names have vivid descriptions, so matching a pattern on the names seems error-prone.

>rmi3_Contig1AAAATATGACAGTCTTTATTCCAGTCTATTTTAGCCAATACAGCACACACAGCTCAAGAAGTTCTTAAAATGACAGTCTAGAAATGACTACAAAGCATTTTCTTTTGCTGGAACTTTGTACAAATACAAGTAATTGTCTAAACAACTAGTTAAATATTGGCTCAAGAATCTGCCGTTCATCATAAAAAACTAGTGACAGAAAGTTGCAGTAAACCGAATAAATTTTCTGACATGCACTACCGCATCAAACGGCGCATGGATTTTATTCTGGAACAATGAGAGCATTGCACACGGCATCTGCTTCTCATGGCGTTCCTGCAAACGTTCGTTTGTGCTACAATTTCGACATTCTCATGCGTGCACTTCAAACTCTTTGTATATTATGCTTGCATATACTCTAAGGAAGCAGCATGTTCGCAGAATGATTTCTAGGAATATGTTCTGAAGGCGAGACATGACAGCGCATCGTATGCTACACACAGATCTGCTTCTGATACTGATTTACCCGGTGACCTTAGTGCTAAATCCCCTTGATATTTCTGCACTGCTCCACAAAAATGCGGGTATATTTTAGCGAGAAGTTGGATTCTGGAACAGAATTTGTGTGCACAGAAGTGTGCGTAGGAACATATGAAAAAAACTAATCTTGTGCGAATACTCGAACGAGTAATACGATATTCGAATTTGTTTCAATTAGAATTTAAATTATTGAAGATTTTGAAGAATCAAAATGAGTGAATAGCTGTGTATGAATCTGAATGTAACCTCCTGTAGAGATAGTTTGGTGGCAGTATAGAAGTATTAAGTTGTGAAAACACCTAACTAGAGGACATCTGCACTGGCACAAAACCTCGCTTCAATTTTAAATGAAATTATCACCCTCACACCGATTAATAATTTTTTTAAGTTTAAAAGCTTGTTACACCGGTTTATATGTCTGACGTATAGAAACTTTTTAAGAGTAACATACTTTACAGGTTATAACTTGCATTTTACCGAAAGTCAAATCCTGCTACTGTTCAAAGTTGTTTTCACTTCTTTTGAACCAATAAAACAAAACAAAAAAGGCACTTGCACGGGCTTTGTTTCAATTTAATAAGATAAATATTGTGCACATTCGTTAGGTGATGACAAATATGCCTATGTTTAAAATAATATTTATAATAAGTGAGATATAATATTTTATTTGATTCAAAATAATTTGACCAAAATTGCTATTCGTTTCAAATTCGCTACGAGCCTAAAATTAACTATTCGCACAAGCGTAAAAAAAAAGTTATTGTTAGATGCATATTGCTATACTGGAGATAGTAAAGCCTGCTTATAAAATATAACACATACTTCCTGGGAACAGGTATATTTTTGTGAAGATCGTGTAGCCTTATATGGTAAAATTTTGGCAAGTGCAGGCTCACATTCACGGTTATTCTTTGAAGCAAAGTAACAAATCCCAACTTGCTGAGCTTGAGCTTCTTCTTTTCTTATTGCACGTCCACATTGGGATCAGAGATGTGATTAGGGGGTTCCTAAGTTCACTTGAAAACTGTCAGGCTCTGGCCATATAATACCTGCTATGGCCAGACCTGTTATATTGTATACCTGTTATTTCCTTTGAGGCACGAAACATCAAATTAACACTAGATGTCCAATAATGCTCAATGCGCAATCTAAAAAACAAAGTTGCAAGAACACTCGCACAGCAAAACGATTTTACTAGGATACTGCGACCAATACAGTTGTCCAAATGCTTTTATACTTTTCTGATAGTTCGAGAAGCGCGCTTTAAGTAGGGAAAGGAATAACAGTAGATTGCAAGATGCGCCTATGGAATTAGGAAGTTCACTTTGCCGTATTTCTTGGTCCATTACCAGCATACCATATGTTTATAAAAAATGCATTGATAATTATGATTTTGCTTGAAAAATTATGACCTAGAAAGATGAAATGGAAGCCGCATATTGTGGGATTCCTAAGAACTGGTTTGCTA
CGATTCAAAGCATTCTGATGATTTAAAAAATGAAGTGAAAAAAGAATCACGAGAAGGACTAGATGCTTGAACATAATTATGATTTTCGAAACAAGATGCATGCATTGAAAACCCGTGAAATCATTTGTCATGATTTAACAATTTTTTTTCGCAAAATGAAAATGAAGCGTGGACATCAGGACGAGGAGACAAACAATATTCAGATAATATATGTCATCCGATGCTATCCCTGCGATAGATGCCGGCAACATCAGACTTCGGGGTGCCATATTTTGACTTAGTGGGAACGGCAGTTGGGCGGCATCCGGTGAATGCTGCATCATAACGAGATGTTTTTCCAGCAGAACAGGATGACCCCGATGGAGGCGAGCACGGCGGCTGCTGAAAAATCAACCTTGTTGGCGAATGATGGCACGCCCATGTAGTCTTGGACGTTGGCCTTGTCCCATCCGACTACCTCATTTTTGATGCGCTCGTCAAGCCATTTCTCGAGCGGCTCGTAGTACTTCTTTAAGGATGAAGCTGACATCTGCCGCGTGCCAGCCATGATCTCGAGGACATCGGGCCAAGGCTTAGAGCGACCCAGCGACAGTCCCTTCTTGAGGACGTCGCCAGCGTTCTTCTCTCCGTAAATGTCGCATTCATGGAATGGATGATGTTCGTCCACCTTCTTTGCAACTGTGCATAAATGCTCGTGAAACTGGAACTGAAGGATGAACGCCACGAAATATCTCAGGTACGGAACGTGCAAAGCCACGTGGTACTTAGCTCCGCCGTCAAAGAAGGACTCGTTGCGCTTTACCGGAGGTGACACGCCTTGGTATTTTATCCTGTATTCCCAGAACTTCTCGTTCATCTTGTCGAACGGCGTCTCGCCGGTGAATATGGTCCAGCGCCACTTGTCCAACAGGTACCCGAATGGCAAGAAGGCGATCTTGTCAAGTGCCGACATAAGCAGGAGGTCAACAGCATTGTATTTATCCGTTGGTTTCAGCAAGCTAAGCTTTCCGTAATGTGTTTTTGTGGCAACTGAAAGGGCTATCAGATCTCCGACGGCCTCATGGAAACCTTCGTTGGCTCCCTCTTGCAGCAGGACGTGCAGGTGCTTGTACTGCATGTAATACTCGATGTGGCCCATCTCGTGGTGGACAGTGCGCAGTTCCTCGACGCTGGGGTCGGTGCACATCTTTATTCTGAAGTCGTCGCCGTTGTACATGTTCCAGGCGGAGGCGTGACACTGAATCTCTCGGTCTTCGGGCTTTGTAAGGATGGACTTGCTCCAAAACTCGCTGGTCATGTTGTCCAGGCCCAGACTCGTAAAGAAGTCCTCCGCTGCGTGGAACATCTTTTGGGCATCCCATTTCTGTTCCACCATTGTCTTCGAGATATCCAAAGGTTTGTCTTCCATCGTTAGGTGAGGGTATAGTGTGCCCCACTCTTGCGCCCACATGTTCCCTAACAGATGGGCTGGTATCGTGCCATCTTCGGGCAGGCGTCCAGGATAAATCTCCCTGAGCTTCATTCTCACGTAGGCGTGCAGCTTTTTGTACAGCGGAGACAGATCTTCCCACAGTTTGTCGACGATCTCGGTCATGTTTTCCGTCTCGTAGTCACTGAGCCAGGCGCTCTTGATATTGTCGTAACCGTCCAGAGACGCTGCTTCATTGGACAGCTTGATGTAGGGAATGTAGTACTGTTTTATAGCCGGACCAACTGCGTTATGCCATGCCAGCCAAGTTTGCAGTAATTTATCGTAATTGCCAACTTCCTTCATATTCCGGGTGAGATCTGGTTCCAGAGGAAGGTCCTTATCCTTGCCAACGGTCACCTTGGTCGATCCATATATGGCGGCCATCTTCGAACTGAGACTGGTCGCATTCTCAAGCTTGTCGTCGGGAAGAGCTGCCAGGCCAATGGTAGCGACGTGCCTAAATAGTCGTTTGAGCGAATCATTTTTGAAATTGTGCCAGTCGAAACGCTTCGCCGTAATTCCAAACTGC
CGCTCCATTTTTGAGACTTCGGTGGAGACCTTATTCGACATGTTTTGGTTGTAATCGGTGATGTTGGAAGCATAGTCCCAACTAGAAGATGAGTCCACGTTGTTAATCGTTGTATATGGGTCATTCAAGCCTTCTATAAAGGCAACGCCCATTGCTTCATCTTTTATCAAGGCTGATACATTCGATAAAGTTGCCAAGTAGGTGTCGAAGTTGTCTGCGGCTGCTGTCGCGTACAGCGCGGTGGCCAGGAGAGCGACGGCCACGAAGCGATCGGCGGCCGACGATCCCGATCGAGCAGCCATGTCGGGCCGTTTCGGTGCGGGTGAAGCTCCGCAGCTGCTCTGGTTTGTTGAGGATGTTGCGCGCGCTCTTCGCTGCTCACCGACGAGAGCGCACTCCG
>rmi3_Contig2CAAGAGCGCATCTGAGCATGCGCACTGGTATGTTTGCAACCCTCTTTACTAGGCCTAGTGCATTTTAACATGGACCCAGAGGGAAGCCGTGAAAGATCCTGAAACTATTTAATTTAGTGCAAAGTTTATTGATTTAGTGTTGTTGCGAGGTGCCTGCAGTTGGCTACAAGCACATTTAGGATCCATGGACAGTACGTCCATAATTACTCAAGTGAACAGAGAAGAGGAACAACTAACAAATTTTCCTCCTGCTGATCGAGTGCCGCCCTCCAGCAGGAAGCCCCGGCAGCGCGGCTTCCTCAACACGCTGCTCTGCTGCTTCGGGAGCAACAACCAAGGCAACAACCCCGTGATTGCCGAGGAAAATGGCCAGTACTCGCCCAAGCTCCAGGGCAAGTACCTGCTGCCACCCGTGCGGCATCAGGATATTCGCAAGATATGCCTCATCATTGACCTCGATGAGACATTGGTCCATAGCTCATTCAAGCCCATCAACAATGCTGACTTCGTGGTGCCTGTAGAGATAGATGGCACGGTGCACCAGGTGTATGTCTTGAAGAGGCCTTATGTGGACGAGTTCTTGCAGCGAGTTGGCGATGCCTACGAATGTGTCTTATTCACAGCCAGCCTTGCAAAGTATGCTGACCCTGTGGCTGACCTGCTGGACAAGTGGGGTGTCTTCCGGGCACGGTTATTCCGAGAGTCTTGTGTCTTCTACCGAGGAAACTATGTCAAGGACCTTGGTAGGCTGGGCCGGGATCTGCGCAGAGTGGTCATAATAGACAACTCACCGGCCTCGTACATCTTCCATCCTGACAATGCAGTACCTGTCAACTCGTGGTTCGATGACATGTCAGACACGGAGCTGCGGGACCTGATGCCATTGTTCGACGAACTGAGCCGTGTCGAGGACGTGTACACGGTGCTGCGCAACTCCAACAACGCGGCTGGCGGTGGCGGCGGCTCTCCGGCCTTCCCTGCACCACTCCTGATGAACGGCAGCGCGGTGGCTTTGCACAACAGCGGTTCCTAGCATTCCGCACAGTGCGGCTTGTGCAATAGCCCCTTCTCCGCCGGCAGTACAAAAGCGCTTACGGGTCCCGTGCTAGTCTCGCCGGCCTACTTAACGTCGGAGGGGGGCTGCCCCTTGTGCCTTGTCTCTTCCGCTCTGGACGAGAGTTTGTATAATAACGGTGTTCCATAATCTCGCCTGTATCATAGATTAAAGACGACTATTTCAGCCTGCAAAA

Does anyone know how to convert this into the following?

>rmi3_Contig1
AAAATATGACAGTCTTTATTCCAGTCTATTTTAGCCAATACAGCACACACAGCTCAAGAAGTTCTTAAAATGACAGTCTAGAAATGACTACAAAGCATTTTCTTTTGCTGGAACTTTGTACAAATACAAGTAATTGTCTAAACAACTAGTTAAATATTGGCTCAAGAATCTGCCGTTCATCATAAAAAACTAGTGACAGAAAGTTGCAGTAAACCGAATAAATTTTCTGACATGCACTACCGCATCAAACGGCGCATGGATTTTATTCTGGAACAATGAGAGCATTGCACACGGCATCTGCTTCTCATGGCGTTCCTGCAAACGTTCGTTTGTGCTACAATTTCGACATTCTCATGCGTGCACTTCAAACTCTTTGTATATTATGCTTGCATATACTCTAAGGAAGCAGCATGTTCGCAGAATGATTTCTAGGAATATGTTCTGAAGGCGAGACATGACAGCGCATCGTATGCTACACACAGATCTGCTTCTGATACTGATTTACCCGGTGACCTTAGTGCTAAATCCCCTTGATATTTCTGCACTGCTCCACAAAAATGCGGGTATATTTTAGCGAGAAGTTGGATTCTGGAACAGAATTTGTGTGCACAGAAGTGTGCGTAGGAACATATGAAAAAAACTAATCTTGTGCGAATACTCGAACGAGTAATACGATATTCGAATTTGTTTCAATTAGAATTTAAATTATTGAAGATTTTGAAGAATCAAAATGAGTGAATAGCTGTGTATGAATCTGAATGTAACCTCCTGTAGAGATAGTTTGGTGGCAGTATAGAAGTATTAAGTTGTGAAAACACCTAACTAGAGGACATCTGCACTGGCACAAAACCTCGCTTCAATTTTAAATGAAATTATCACCCTCACACCGATTAATAATTTTTTTAAGTTTAAAAGCTTGTTACACCGGTTTATATGTCTGACGTATAGAAACTTTTTAAGAGTAACATACTTTACAGGTTATAACTTGCATTTTACCGAAAGTCAAATCCTGCTACTGTTCAAAGTTGTTTTCACTTCTTTTGAACCAATAAAACAAAACAAAAAAGGCACTTGCACGGGCTTTGTTTCAATTTAATAAGATAAATATTGTGCACATTCGTTAGGTGATGACAAATATGCCTATGTTTAAAATAATATTTATAATAAGTGAGATATAATATTTTATTTGATTCAAAATAATTTGACCAAAATTGCTATTCGTTTCAAATTCGCTACGAGCCTAAAATTAACTATTCGCACAAGCGTAAAAAAAAAGTTATTGTTAGATGCATATTGCTATACTGGAGATAGTAAAGCCTGCTTATAAAATATAACACATACTTCCTGGGAACAGGTATATTTTTGTGAAGATCGTGTAGCCTTATATGGTAAAATTTTGGCAAGTGCAGGCTCACATTCACGGTTATTCTTTGAAGCAAAGTAACAAATCCCAACTTGCTGAGCTTGAGCTTCTTCTTTTCTTATTGCACGTCCACATTGGGATCAGAGATGTGATTAGGGGGTTCCTAAGTTCACTTGAAAACTGTCAGGCTCTGGCCATATAATACCTGCTATGGCCAGACCTGTTATATTGTATACCTGTTATTTCCTTTGAGGCACGAAACATCAAATTAACACTAGATGTCCAATAATGCTCAATGCGCAATCTAAAAAACAAAGTTGCAAGAACACTCGCACAGCAAAACGATTTTACTAGGATACTGCGACCAATACAGTTGTCCAAATGCTTTTATACTTTTCTGATAGTTCGAGAAGCGCGCTTTAAGTAGGGAAAGGAATAACAGTAGATTGCAAGATGCGCCTATGGAATTAGGAAGTTCACTTTGCCGTATTTCTTGGTCCATTACCAGCATACCATATGTTTATAAAAAATGCATTGATAATTATGATTTTGCTTGAAAAATTATGACCTAGAAAGATGAAATGGAAGCCGCATATTGTGGGATTCCTAAGAACTGGTTTGCTACGATTCAAAGCAT
TCTGATGATTTAAAAAATGAAGTGAAAAAAGAATCACGAGAAGGACTAGATGCTTGAACATAATTATGATTTTCGAAACAAGATGCATGCATTGAAAACCCGTGAAATCATTTGTCATGATTTAACAATTTTTTTTCGCAAAATGAAAATGAAGCGTGGACATCAGGACGAGGAGACAAACAATATTCAGATAATATATGTCATCCGATGCTATCCCTGCGATAGATGCCGGCAACATCAGACTTCGGGGTGCCATATTTTGACTTAGTGGGAACGGCAGTTGGGCGGCATCCGGTGAATGCTGCATCATAACGAGATGTTTTTCCAGCAGAACAGGATGACCCCGATGGAGGCGAGCACGGCGGCTGCTGAAAAATCAACCTTGTTGGCGAATGATGGCACGCCCATGTAGTCTTGGACGTTGGCCTTGTCCCATCCGACTACCTCATTTTTGATGCGCTCGTCAAGCCATTTCTCGAGCGGCTCGTAGTACTTCTTTAAGGATGAAGCTGACATCTGCCGCGTGCCAGCCATGATCTCGAGGACATCGGGCCAAGGCTTAGAGCGACCCAGCGACAGTCCCTTCTTGAGGACGTCGCCAGCGTTCTTCTCTCCGTAAATGTCGCATTCATGGAATGGATGATGTTCGTCCACCTTCTTTGCAACTGTGCATAAATGCTCGTGAAACTGGAACTGAAGGATGAACGCCACGAAATATCTCAGGTACGGAACGTGCAAAGCCACGTGGTACTTAGCTCCGCCGTCAAAGAAGGACTCGTTGCGCTTTACCGGAGGTGACACGCCTTGGTATTTTATCCTGTATTCCCAGAACTTCTCGTTCATCTTGTCGAACGGCGTCTCGCCGGTGAATATGGTCCAGCGCCACTTGTCCAACAGGTACCCGAATGGCAAGAAGGCGATCTTGTCAAGTGCCGACATAAGCAGGAGGTCAACAGCATTGTATTTATCCGTTGGTTTCAGCAAGCTAAGCTTTCCGTAATGTGTTTTTGTGGCAACTGAAAGGGCTATCAGATCTCCGACGGCCTCATGGAAACCTTCGTTGGCTCCCTCTTGCAGCAGGACGTGCAGGTGCTTGTACTGCATGTAATACTCGATGTGGCCCATCTCGTGGTGGACAGTGCGCAGTTCCTCGACGCTGGGGTCGGTGCACATCTTTATTCTGAAGTCGTCGCCGTTGTACATGTTCCAGGCGGAGGCGTGACACTGAATCTCTCGGTCTTCGGGCTTTGTAAGGATGGACTTGCTCCAAAACTCGCTGGTCATGTTGTCCAGGCCCAGACTCGTAAAGAAGTCCTCCGCTGCGTGGAACATCTTTTGGGCATCCCATTTCTGTTCCACCATTGTCTTCGAGATATCCAAAGGTTTGTCTTCCATCGTTAGGTGAGGGTATAGTGTGCCCCACTCTTGCGCCCACATGTTCCCTAACAGATGGGCTGGTATCGTGCCATCTTCGGGCAGGCGTCCAGGATAAATCTCCCTGAGCTTCATTCTCACGTAGGCGTGCAGCTTTTTGTACAGCGGAGACAGATCTTCCCACAGTTTGTCGACGATCTCGGTCATGTTTTCCGTCTCGTAGTCACTGAGCCAGGCGCTCTTGATATTGTCGTAACCGTCCAGAGACGCTGCTTCATTGGACAGCTTGATGTAGGGAATGTAGTACTGTTTTATAGCCGGACCAACTGCGTTATGCCATGCCAGCCAAGTTTGCAGTAATTTATCGTAATTGCCAACTTCCTTCATATTCCGGGTGAGATCTGGTTCCAGAGGAAGGTCCTTATCCTTGCCAACGGTCACCTTGGTCGATCCATATATGGCGGCCATCTTCGAACTGAGACTGGTCGCATTCTCAAGCTTGTCGTCGGGAAGAGCTGCCAGGCCAATGGTAGCGACGTGCCTAAATAGTCGTTTGAGCGAATCATTTTTGAAATTGTGCCAGTCGAAACGCTTCGCCGTAATTCCAAACTGCCGCTCCATTTTTG
AGACTTCGGTGGAGACCTTATTCGACATGTTTTGGTTGTAATCGGTGATGTTGGAAGCATAGTCCCAACTAGAAGATGAGTCCACGTTGTTAATCGTTGTATATGGGTCATTCAAGCCTTCTATAAAGGCAACGCCCATTGCTTCATCTTTTATCAAGGCTGATACATTCGATAAAGTTGCCAAGTAGGTGTCGAAGTTGTCTGCGGCTGCTGTCGCGTACAGCGCGGTGGCCAGGAGAGCGACGGCCACGAAGCGATCGGCGGCCGACGATCCCGATCGAGCAGCCATGTCGGGCCGTTTCGGTGCGGGTGAAGCTCCGCAGCTGCTCTGGTTTGTTGAGGATGTTGCGCGCGCTCTTCGCTGCTCACCGACGAGAGCGCACTCCG
>rmi3_Contig2
CAAGAGCGCATCTGAGCATGCGCACTGGTATGTTTGCAACCCTCTTTACTAGGCCTAGTGCATTTTAACATGGACCCAGAGGGAAGCCGTGAAAGATCCTGAAACTATTTAATTTAGTGCAAAGTTTATTGATTTAGTGTTGTTGCGAGGTGCCTGCAGTTGGCTACAAGCACATTTAGGATCCATGGACAGTACGTCCATAATTACTCAAGTGAACAGAGAAGAGGAACAACTAACAAATTTTCCTCCTGCTGATCGAGTGCCGCCCTCCAGCAGGAAGCCCCGGCAGCGCGGCTTCCTCAACACGCTGCTCTGCTGCTTCGGGAGCAACAACCAAGGCAACAACCCCGTGATTGCCGAGGAAAATGGCCAGTACTCGCCCAAGCTCCAGGGCAAGTACCTGCTGCCACCCGTGCGGCATCAGGATATTCGCAAGATATGCCTCATCATTGACCTCGATGAGACATTGGTCCATAGCTCATTCAAGCCCATCAACAATGCTGACTTCGTGGTGCCTGTAGAGATAGATGGCACGGTGCACCAGGTGTATGTCTTGAAGAGGCCTTATGTGGACGAGTTCTTGCAGCGAGTTGGCGATGCCTACGAATGTGTCTTATTCACAGCCAGCCTTGCAAAGTATGCTGACCCTGTGGCTGACCTGCTGGACAAGTGGGGTGTCTTCCGGGCACGGTTATTCCGAGAGTCTTGTGTCTTCTACCGAGGAAACTATGTCAAGGACCTTGGTAGGCTGGGCCGGGATCTGCGCAGAGTGGTCATAATAGACAACTCACCGGCCTCGTACATCTTCCATCCTGACAATGCAGTACCTGTCAACTCGTGGTTCGATGACATGTCAGACACGGAGCTGCGGGACCTGATGCCATTGTTCGACGAACTGAGCCGTGTCGAGGACGTGTACACGGTGCTGCGCAACTCCAACAACGCGGCTGGCGGTGGCGGCGGCTCTCCGGCCTTCCCTGCACCACTCCTGATGAACGGCAGCGCGGTGGCTTTGCACAACAGCGGTTCCTAGCATTCCGCACAGTGCGGCTTGTGCAATAGCCCCTTCTCCGCCGGCAGTACAAAAGCGCTTACGGGTCCCGTGCTAGTCTCGCCGGCCTACTTAACGTCGGAGGGGGGCTGCCCCTTGTGCCTTGTCTCTTCCGCTCTGGACGAGAGTTTGTATAATAACGGTGTTCCATAATCTCGCCTGTATCATAGATTAAAGACGACTATTTCAGCCTGCAAAA

Hopelessly stuck, R

genome RNA-Seq next-gen sequence fasta

Try (requires GNU sed):

   $ sed 's/\(.*[0-9]\+\)/\1\n/g' test.fa > new.fa

or

$ sed 's/\(.*Contig[0-9]\+\)/\1\n/g' test.fa
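Non-GNU seds (for example the BSD sed on macOS) may not understand \+ or turn \n in the replacement into a newline. Below is a portable sketch of the same split-after-the-last-digit idea in POSIX awk. Like the sed commands, it assumes every name ends in a digit and the sequences contain no digits; it only touches lines starting with >, and the function name split_after_last_digit is made up for this example.

```shell
#!/bin/sh
# Sketch: put a newline after the last digit of each header line.
# Assumes names end in a digit and sequences contain no digits.
split_after_last_digit() {
  awk '
    /^>/ && match($0, /^.*[0-9]/) {
      # RLENGTH spans ">" up to and including the last digit on the line
      print substr($0, 1, RLENGTH)
      rest = substr($0, RLENGTH + 1)
      if (rest != "") print rest   # do not emit an empty sequence line
      next
    }
    { print }   # sequence lines and headers without digits pass through
  ' "$@"
}

# Usage: split_after_last_digit test.fa > new.fa
```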

With grep -P (Perl-compatible regex mode): the regex works only if the name contains a six-letter word (Contig in this example) preceded by an underscore (_) and followed by a single digit (the contig number), so it probably works with the OP's data only:

$ grep -Po '\w*\W\w*[0-9]+(?=[ATGCN]*)|(?<=_\w{6}[0-9]).*' test.fa

>rmi3_Contig1
AAAATATGACAGTCTTTATTCCAGTCTATTTTAGCCAATACAGCACACACAGCTCAAGAAGTTCTTAAAATGACAGTCTAGAAATGACTACAAAGCATTTTCTTTTGCTGGAACTTTGTACAAATACAAGTAATTGTCTAAACAACTAGTTAAATATTGGCTCAAGAATCTGCCGTTCAT............................................................GCTGATACATTCGATAAAGTTGCCAAGTAGGTGTCGAAGTTGTCTGCGGCTGCTGTCGCGTACAGCGCGGTGGCCAGGAGAGCGACGGCCACGAAGCGATCGGCGGCCGACGATCCCGATCGAGCAGCCATGTCGGGCCGTTTCGGTGCGGGTGAAGCTCCGCAGCTGCTCTGGTTTGTTGAGGATGTTGCGCGCGCTCTTCGCTGCTCACCGACGAGAGCGCACTCCG
>rmi3_Contig2
CAAGAGCGCATCTGAGCATGCGCACTGGTATGTTTGCAACCCTCTTTACTAGGCCTAGTGCATTTTAACATGGACCCAGAGGGAAGCCGTGAAAGATCCTGAAACTATTTAATTTAGTGCAAAGTTTATTGATTTAGTGTTGTTGCGAGGTGCCTGCAGTTGGCTACAAGCACATTTAGGATCCATGGACAGTACGTCCATAATTACTCAAGTGAACAGAGAAGAGGAACAACTAACAAATTTTCCTCCTGCTGATCGAGTGCCGCCCTCCAGCAGGAAGCCCCGGCAGCGCGGCTTCCTCAACACGCTGCTCTGCTGCTTCGGGAGCAACAACCAAGGCAACAACCCCGTGATTGCCGAGGAAAATGGCCAGTACTCGCCCAAGCTCCAGGGCAAGTACCTGCTGCCACCCGTGCGGCATCAGGATATTCGCAAGATATGCCTCATCATTGACCTCGATGAGACATTGGTCCATAGCTCATTCAAGCCCATCAACAATGCTGACTTCGTGGTGCCTGTAGAGATAGATGGCACGGTGCACCAGGTGTATGTCTTGAAGAGGCCTTATGTGGACGAGTTCTTGCAGCGAGTTGGCGATGCCTACGAATGTGTCTTATTCACAGCCAGCCTTGCAAAGTATGCTGACCCTGTGGCTGACCTGCTGGACAAGTGGGGTGTCTTCCGGGCACGGTTATTCCGAGAGTCTTGTGTCTTCTACCGAGGAAACTATGTCAAGGACCTTGGTAGGCTGGG

PS: Part of the contig 1 sequence was removed to fit the 5000-character limit.

5.6 years ago
 sed '/^>/s/\([ATGCNatgcn][ATGCNatgcn]*\)$/\n\1/' input.txt > out.fa
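Because every one of these commands rests on an assumption about where the name ends, it may be worth scanning the result for records that were split in the wrong place. The sketch below (check_split is a hypothetical name, and the 20-base threshold is an arbitrary choice) prints, with line numbers, any header that still ends in a long run of A/C/G/T/N, and any non-header line that is empty or contains characters other than nucleotide letters. It cannot catch a name whose last characters are themselves nucleotide letters, since those are indistinguishable from the start of the sequence.

```shell
#!/bin/sh
# Sketch: report lines in a converted FASTA file that look mis-split.
# The 20-base threshold below is an arbitrary assumption.
check_split() {
  # Header lines that still end in a long nucleotide run: the split was missed.
  grep -nE '^>.*[ACGTN]{20,}$' "$1"
  # Non-header lines that are empty or contain non-nucleotide characters:
  # part of a name probably ended up on the sequence line.
  grep -nvE '^>|^[ACGTNacgtn]+$' "$1"
  return 0   # grep exits 1 when nothing matches; that is the good case here
}

# Usage: check_split out.fa   (no output means nothing suspicious was found)
```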