Question: Converting a single-line FASTA transcriptome [no space between sequence and name] to conventional FASTA format
13 months ago by Roelof, Pretoria, South Africa
Roelof wrote:

Hi there,

I've been handed a transcriptome of consensus sequences stored in a Word document. For some reason, this person also placed each contig entry in the FASTA file on a single line, with no separator character between name and sequence.

The resulting file looks as follows. I've tried to solve this using SeqIO in Biopython, but to no avail, and my regex just isn't good enough to separate the two. Later in the file the contig names have vivid descriptions, so matching a pattern on the names seems error-prone.

>rmi3_Contig1AAAATATGACAGTCTTTATTCCAGTCTATTTTAGCCAATACAGCACACACAGCTCAAGAAGTTCTTAAAATGACAGTCTAGAAATGACTACAAAGCATTTTCTTTTGCTGGAACTTTGTACAAATACAAGTAATTGTCTAAACAACTAGTTAAATATTGGCTCAAGAATCTGCCGTTCATCATAAAAAACTAGTGACAGAAAGTTGCAGTAAACCGAATAAATTTTCTGACATGCACTACCGCATCAAACGGCGCATGGATTTTATTCTGGAACAATGAGAGCATTGCACACGGCATCTGCTTCTCATGGCGTTCCTGCAAACGTTCGTTTGTGCTACAATTTCGACATTCTCATGCGTGCACTTCAAACTCTTTGTATATTATGCTTGCATATACTCTAAGGAAGCAGCATGTTCGCAGAATGATTTCTAGGAATATGTTCTGAAGGCGAGACATGACAGCGCATCGTATGCTACACACAGATCTGCTTCTGATACTGATTTACCCGGTGACCTTAGTGCTAAATCCCCTTGATATTTCTGCACTGCTCCACAAAAATGCGGGTATATTTTAGCGAGAAGTTGGATTCTGGAACAGAATTTGTGTGCACAGAAGTGTGCGTAGGAACATATGAAAAAAACTAATCTTGTGCGAATACTCGAACGAGTAATACGATATTCGAATTTGTTTCAATTAGAATTTAAATTATTGAAGATTTTGAAGAATCAAAATGAGTGAATAGCTGTGTATGAATCTGAATGTAACCTCCTGTAGAGATAGTTTGGTGGCAGTATAGAAGTATTAAGTTGTGAAAACACCTAACTAGAGGACATCTGCACTGGCACAAAACCTCGCTTCAATTTTAAATGAAATTATCACCCTCACACCGATTAATAATTTTTTTAAGTTTAAAAGCTTGTTACACCGGTTTATATGTCTGACGTATAGAAACTTTTTAAGAGTAACATACTTTACAGGTTATAACTTGCATTTTACCGAAAGTCAAATCCTGCTACTGTTCAAAGTTGTTTTCACTTCTTTTGAACCAATAAAACAAAACAAAAAAGGCACTTGCACGGGCTTTGTTTCAATTTAATAAGATAAATATTGTGCACATTCGTTAGGTGATGACAAATATGCCTATGTTTAAAATAATATTTATAATAAGTGAGATATAATATTTTATTTGATTCAAAATAATTTGACCAAAATTGCTATTCGTTTCAAATTCGCTACGAGCCTAAAATTAACTATTCGCACAAGCGTAAAAAAAAAGTTATTGTTAGATGCATATTGCTATACTGGAGATAGTAAAGCCTGCTTATAAAATATAACACATACTTCCTGGGAACAGGTATATTTTTGTGAAGATCGTGTAGCCTTATATGGTAAAATTTTGGCAAGTGCAGGCTCACATTCACGGTTATTCTTTGAAGCAAAGTAACAAATCCCAACTTGCTGAGCTTGAGCTTCTTCTTTTCTTATTGCACGTCCACATTGGGATCAGAGATGTGATTAGGGGGTTCCTAAGTTCACTTGAAAACTGTCAGGCTCTGGCCATATAATACCTGCTATGGCCAGACCTGTTATATTGTATACCTGTTATTTCCTTTGAGGCACGAAACATCAAATTAACACTAGATGTCCAATAATGCTCAATGCGCAATCTAAAAAACAAAGTTGCAAGAACACTCGCACAGCAAAACGATTTTACTAGGATACTGCGACCAATACAGTTGTCCAAATGCTTTTATACTTTTCTGATAGTTCGAGAAGCGCGCTTTAAGTAGGGAAAGGAATAACAGTAGATTGCAAGATGCGCCTATGGAATTAGGAAGTTCACTTTGCCGTATTTCTTGGTCCATTACCAGCATACCATATGTTTATAAAAAATGCATTGATAATTATGATTTTGCTTGAAAAATTATGACCTAGAAAGATGAAATGGAAGCCGCATATTGTGGGATTCCTAAGAACTGGTTTGCTA
CGATTCAAAGCATTCTGATGATTTAAAAAATGAAGTGAAAAAAGAATCACGAGAAGGACTAGATGCTTGAACATAATTATGATTTTCGAAACAAGATGCATGCATTGAAAACCCGTGAAATCATTTGTCATGATTTAACAATTTTTTTTCGCAAAATGAAAATGAAGCGTGGACATCAGGACGAGGAGACAAACAATATTCAGATAATATATGTCATCCGATGCTATCCCTGCGATAGATGCCGGCAACATCAGACTTCGGGGTGCCATATTTTGACTTAGTGGGAACGGCAGTTGGGCGGCATCCGGTGAATGCTGCATCATAACGAGATGTTTTTCCAGCAGAACAGGATGACCCCGATGGAGGCGAGCACGGCGGCTGCTGAAAAATCAACCTTGTTGGCGAATGATGGCACGCCCATGTAGTCTTGGACGTTGGCCTTGTCCCATCCGACTACCTCATTTTTGATGCGCTCGTCAAGCCATTTCTCGAGCGGCTCGTAGTACTTCTTTAAGGATGAAGCTGACATCTGCCGCGTGCCAGCCATGATCTCGAGGACATCGGGCCAAGGCTTAGAGCGACCCAGCGACAGTCCCTTCTTGAGGACGTCGCCAGCGTTCTTCTCTCCGTAAATGTCGCATTCATGGAATGGATGATGTTCGTCCACCTTCTTTGCAACTGTGCATAAATGCTCGTGAAACTGGAACTGAAGGATGAACGCCACGAAATATCTCAGGTACGGAACGTGCAAAGCCACGTGGTACTTAGCTCCGCCGTCAAAGAAGGACTCGTTGCGCTTTACCGGAGGTGACACGCCTTGGTATTTTATCCTGTATTCCCAGAACTTCTCGTTCATCTTGTCGAACGGCGTCTCGCCGGTGAATATGGTCCAGCGCCACTTGTCCAACAGGTACCCGAATGGCAAGAAGGCGATCTTGTCAAGTGCCGACATAAGCAGGAGGTCAACAGCATTGTATTTATCCGTTGGTTTCAGCAAGCTAAGCTTTCCGTAATGTGTTTTTGTGGCAACTGAAAGGGCTATCAGATCTCCGACGGCCTCATGGAAACCTTCGTTGGCTCCCTCTTGCAGCAGGACGTGCAGGTGCTTGTACTGCATGTAATACTCGATGTGGCCCATCTCGTGGTGGACAGTGCGCAGTTCCTCGACGCTGGGGTCGGTGCACATCTTTATTCTGAAGTCGTCGCCGTTGTACATGTTCCAGGCGGAGGCGTGACACTGAATCTCTCGGTCTTCGGGCTTTGTAAGGATGGACTTGCTCCAAAACTCGCTGGTCATGTTGTCCAGGCCCAGACTCGTAAAGAAGTCCTCCGCTGCGTGGAACATCTTTTGGGCATCCCATTTCTGTTCCACCATTGTCTTCGAGATATCCAAAGGTTTGTCTTCCATCGTTAGGTGAGGGTATAGTGTGCCCCACTCTTGCGCCCACATGTTCCCTAACAGATGGGCTGGTATCGTGCCATCTTCGGGCAGGCGTCCAGGATAAATCTCCCTGAGCTTCATTCTCACGTAGGCGTGCAGCTTTTTGTACAGCGGAGACAGATCTTCCCACAGTTTGTCGACGATCTCGGTCATGTTTTCCGTCTCGTAGTCACTGAGCCAGGCGCTCTTGATATTGTCGTAACCGTCCAGAGACGCTGCTTCATTGGACAGCTTGATGTAGGGAATGTAGTACTGTTTTATAGCCGGACCAACTGCGTTATGCCATGCCAGCCAAGTTTGCAGTAATTTATCGTAATTGCCAACTTCCTTCATATTCCGGGTGAGATCTGGTTCCAGAGGAAGGTCCTTATCCTTGCCAACGGTCACCTTGGTCGATCCATATATGGCGGCCATCTTCGAACTGAGACTGGTCGCATTCTCAAGCTTGTCGTCGGGAAGAGCTGCCAGGCCAATGGTAGCGACGTGCCTAAATAGTCGTTTGAGCGAATCATTTTTGAAATTGTGCCAGTCGAAACGCTTCGCCGTAATTCCAAACTGC
CGCTCCATTTTTGAGACTTCGGTGGAGACCTTATTCGACATGTTTTGGTTGTAATCGGTGATGTTGGAAGCATAGTCCCAACTAGAAGATGAGTCCACGTTGTTAATCGTTGTATATGGGTCATTCAAGCCTTCTATAAAGGCAACGCCCATTGCTTCATCTTTTATCAAGGCTGATACATTCGATAAAGTTGCCAAGTAGGTGTCGAAGTTGTCTGCGGCTGCTGTCGCGTACAGCGCGGTGGCCAGGAGAGCGACGGCCACGAAGCGATCGGCGGCCGACGATCCCGATCGAGCAGCCATGTCGGGCCGTTTCGGTGCGGGTGAAGCTCCGCAGCTGCTCTGGTTTGTTGAGGATGTTGCGCGCGCTCTTCGCTGCTCACCGACGAGAGCGCACTCCG
>rmi3_Contig2CAAGAGCGCATCTGAGCATGCGCACTGGTATGTTTGCAACCCTCTTTACTAGGCCTAGTGCATTTTAACATGGACCCAGAGGGAAGCCGTGAAAGATCCTGAAACTATTTAATTTAGTGCAAAGTTTATTGATTTAGTGTTGTTGCGAGGTGCCTGCAGTTGGCTACAAGCACATTTAGGATCCATGGACAGTACGTCCATAATTACTCAAGTGAACAGAGAAGAGGAACAACTAACAAATTTTCCTCCTGCTGATCGAGTGCCGCCCTCCAGCAGGAAGCCCCGGCAGCGCGGCTTCCTCAACACGCTGCTCTGCTGCTTCGGGAGCAACAACCAAGGCAACAACCCCGTGATTGCCGAGGAAAATGGCCAGTACTCGCCCAAGCTCCAGGGCAAGTACCTGCTGCCACCCGTGCGGCATCAGGATATTCGCAAGATATGCCTCATCATTGACCTCGATGAGACATTGGTCCATAGCTCATTCAAGCCCATCAACAATGCTGACTTCGTGGTGCCTGTAGAGATAGATGGCACGGTGCACCAGGTGTATGTCTTGAAGAGGCCTTATGTGGACGAGTTCTTGCAGCGAGTTGGCGATGCCTACGAATGTGTCTTATTCACAGCCAGCCTTGCAAAGTATGCTGACCCTGTGGCTGACCTGCTGGACAAGTGGGGTGTCTTCCGGGCACGGTTATTCCGAGAGTCTTGTGTCTTCTACCGAGGAAACTATGTCAAGGACCTTGGTAGGCTGGGCCGGGATCTGCGCAGAGTGGTCATAATAGACAACTCACCGGCCTCGTACATCTTCCATCCTGACAATGCAGTACCTGTCAACTCGTGGTTCGATGACATGTCAGACACGGAGCTGCGGGACCTGATGCCATTGTTCGACGAACTGAGCCGTGTCGAGGACGTGTACACGGTGCTGCGCAACTCCAACAACGCGGCTGGCGGTGGCGGCGGCTCTCCGGCCTTCCCTGCACCACTCCTGATGAACGGCAGCGCGGTGGCTTTGCACAACAGCGGTTCCTAGCATTCCGCACAGTGCGGCTTGTGCAATAGCCCCTTCTCCGCCGGCAGTACAAAAGCGCTTACGGGTCCCGTGCTAGTCTCGCCGGCCTACTTAACGTCGGAGGGGGGCTGCCCCTTGTGCCTTGTCTCTTCCGCTCTGGACGAGAGTTTGTATAATAACGGTGTTCCATAATCTCGCCTGTATCATAGATTAAAGACGACTATTTCAGCCTGCAAAA

Does anyone know how to convert this into the following?

>rmi3_Contig1
AAAATATGACAGTCTTTATTCCAGTCTATTTTAGCCAATACAGCACACACAGCTCAAGAAGTTCTTAAAATGACAGTCTAGAAATGACTACAAAGCATTTTCTTTTGCTGGAACTTTGTACAAATACAAGTAATTGTCTAAACAACTAGTTAAATATTGGCTCAAGAATCTGCCGTTCATCATAAAAAACTAGTGACAGAAAGTTGCAGTAAACCGAATAAATTTTCTGACATGCACTACCGCATCAAACGGCGCATGGATTTTATTCTGGAACAATGAGAGCATTGCACACGGCATCTGCTTCTCATGGCGTTCCTGCAAACGTTCGTTTGTGCTACAATTTCGACATTCTCATGCGTGCACTTCAAACTCTTTGTATATTATGCTTGCATATACTCTAAGGAAGCAGCATGTTCGCAGAATGATTTCTAGGAATATGTTCTGAAGGCGAGACATGACAGCGCATCGTATGCTACACACAGATCTGCTTCTGATACTGATTTACCCGGTGACCTTAGTGCTAAATCCCCTTGATATTTCTGCACTGCTCCACAAAAATGCGGGTATATTTTAGCGAGAAGTTGGATTCTGGAACAGAATTTGTGTGCACAGAAGTGTGCGTAGGAACATATGAAAAAAACTAATCTTGTGCGAATACTCGAACGAGTAATACGATATTCGAATTTGTTTCAATTAGAATTTAAATTATTGAAGATTTTGAAGAATCAAAATGAGTGAATAGCTGTGTATGAATCTGAATGTAACCTCCTGTAGAGATAGTTTGGTGGCAGTATAGAAGTATTAAGTTGTGAAAACACCTAACTAGAGGACATCTGCACTGGCACAAAACCTCGCTTCAATTTTAAATGAAATTATCACCCTCACACCGATTAATAATTTTTTTAAGTTTAAAAGCTTGTTACACCGGTTTATATGTCTGACGTATAGAAACTTTTTAAGAGTAACATACTTTACAGGTTATAACTTGCATTTTACCGAAAGTCAAATCCTGCTACTGTTCAAAGTTGTTTTCACTTCTTTTGAACCAATAAAACAAAACAAAAAAGGCACTTGCACGGGCTTTGTTTCAATTTAATAAGATAAATATTGTGCACATTCGTTAGGTGATGACAAATATGCCTATGTTTAAAATAATATTTATAATAAGTGAGATATAATATTTTATTTGATTCAAAATAATTTGACCAAAATTGCTATTCGTTTCAAATTCGCTACGAGCCTAAAATTAACTATTCGCACAAGCGTAAAAAAAAAGTTATTGTTAGATGCATATTGCTATACTGGAGATAGTAAAGCCTGCTTATAAAATATAACACATACTTCCTGGGAACAGGTATATTTTTGTGAAGATCGTGTAGCCTTATATGGTAAAATTTTGGCAAGTGCAGGCTCACATTCACGGTTATTCTTTGAAGCAAAGTAACAAATCCCAACTTGCTGAGCTTGAGCTTCTTCTTTTCTTATTGCACGTCCACATTGGGATCAGAGATGTGATTAGGGGGTTCCTAAGTTCACTTGAAAACTGTCAGGCTCTGGCCATATAATACCTGCTATGGCCAGACCTGTTATATTGTATACCTGTTATTTCCTTTGAGGCACGAAACATCAAATTAACACTAGATGTCCAATAATGCTCAATGCGCAATCTAAAAAACAAAGTTGCAAGAACACTCGCACAGCAAAACGATTTTACTAGGATACTGCGACCAATACAGTTGTCCAAATGCTTTTATACTTTTCTGATAGTTCGAGAAGCGCGCTTTAAGTAGGGAAAGGAATAACAGTAGATTGCAAGATGCGCCTATGGAATTAGGAAGTTCACTTTGCCGTATTTCTTGGTCCATTACCAGCATACCATATGTTTATAAAAAATGCATTGATAATTATGATTTTGCTTGAAAAATTATGACCTAGAAAGATGAAATGGAAGCCGCATATTGTGGGATTCCTAAGAACTGGTTTGCTACGATTCAAAGCAT
TCTGATGATTTAAAAAATGAAGTGAAAAAAGAATCACGAGAAGGACTAGATGCTTGAACATAATTATGATTTTCGAAACAAGATGCATGCATTGAAAACCCGTGAAATCATTTGTCATGATTTAACAATTTTTTTTCGCAAAATGAAAATGAAGCGTGGACATCAGGACGAGGAGACAAACAATATTCAGATAATATATGTCATCCGATGCTATCCCTGCGATAGATGCCGGCAACATCAGACTTCGGGGTGCCATATTTTGACTTAGTGGGAACGGCAGTTGGGCGGCATCCGGTGAATGCTGCATCATAACGAGATGTTTTTCCAGCAGAACAGGATGACCCCGATGGAGGCGAGCACGGCGGCTGCTGAAAAATCAACCTTGTTGGCGAATGATGGCACGCCCATGTAGTCTTGGACGTTGGCCTTGTCCCATCCGACTACCTCATTTTTGATGCGCTCGTCAAGCCATTTCTCGAGCGGCTCGTAGTACTTCTTTAAGGATGAAGCTGACATCTGCCGCGTGCCAGCCATGATCTCGAGGACATCGGGCCAAGGCTTAGAGCGACCCAGCGACAGTCCCTTCTTGAGGACGTCGCCAGCGTTCTTCTCTCCGTAAATGTCGCATTCATGGAATGGATGATGTTCGTCCACCTTCTTTGCAACTGTGCATAAATGCTCGTGAAACTGGAACTGAAGGATGAACGCCACGAAATATCTCAGGTACGGAACGTGCAAAGCCACGTGGTACTTAGCTCCGCCGTCAAAGAAGGACTCGTTGCGCTTTACCGGAGGTGACACGCCTTGGTATTTTATCCTGTATTCCCAGAACTTCTCGTTCATCTTGTCGAACGGCGTCTCGCCGGTGAATATGGTCCAGCGCCACTTGTCCAACAGGTACCCGAATGGCAAGAAGGCGATCTTGTCAAGTGCCGACATAAGCAGGAGGTCAACAGCATTGTATTTATCCGTTGGTTTCAGCAAGCTAAGCTTTCCGTAATGTGTTTTTGTGGCAACTGAAAGGGCTATCAGATCTCCGACGGCCTCATGGAAACCTTCGTTGGCTCCCTCTTGCAGCAGGACGTGCAGGTGCTTGTACTGCATGTAATACTCGATGTGGCCCATCTCGTGGTGGACAGTGCGCAGTTCCTCGACGCTGGGGTCGGTGCACATCTTTATTCTGAAGTCGTCGCCGTTGTACATGTTCCAGGCGGAGGCGTGACACTGAATCTCTCGGTCTTCGGGCTTTGTAAGGATGGACTTGCTCCAAAACTCGCTGGTCATGTTGTCCAGGCCCAGACTCGTAAAGAAGTCCTCCGCTGCGTGGAACATCTTTTGGGCATCCCATTTCTGTTCCACCATTGTCTTCGAGATATCCAAAGGTTTGTCTTCCATCGTTAGGTGAGGGTATAGTGTGCCCCACTCTTGCGCCCACATGTTCCCTAACAGATGGGCTGGTATCGTGCCATCTTCGGGCAGGCGTCCAGGATAAATCTCCCTGAGCTTCATTCTCACGTAGGCGTGCAGCTTTTTGTACAGCGGAGACAGATCTTCCCACAGTTTGTCGACGATCTCGGTCATGTTTTCCGTCTCGTAGTCACTGAGCCAGGCGCTCTTGATATTGTCGTAACCGTCCAGAGACGCTGCTTCATTGGACAGCTTGATGTAGGGAATGTAGTACTGTTTTATAGCCGGACCAACTGCGTTATGCCATGCCAGCCAAGTTTGCAGTAATTTATCGTAATTGCCAACTTCCTTCATATTCCGGGTGAGATCTGGTTCCAGAGGAAGGTCCTTATCCTTGCCAACGGTCACCTTGGTCGATCCATATATGGCGGCCATCTTCGAACTGAGACTGGTCGCATTCTCAAGCTTGTCGTCGGGAAGAGCTGCCAGGCCAATGGTAGCGACGTGCCTAAATAGTCGTTTGAGCGAATCATTTTTGAAATTGTGCCAGTCGAAACGCTTCGCCGTAATTCCAAACTGCCGCTCCATTTTTG
AGACTTCGGTGGAGACCTTATTCGACATGTTTTGGTTGTAATCGGTGATGTTGGAAGCATAGTCCCAACTAGAAGATGAGTCCACGTTGTTAATCGTTGTATATGGGTCATTCAAGCCTTCTATAAAGGCAACGCCCATTGCTTCATCTTTTATCAAGGCTGATACATTCGATAAAGTTGCCAAGTAGGTGTCGAAGTTGTCTGCGGCTGCTGTCGCGTACAGCGCGGTGGCCAGGAGAGCGACGGCCACGAAGCGATCGGCGGCCGACGATCCCGATCGAGCAGCCATGTCGGGCCGTTTCGGTGCGGGTGAAGCTCCGCAGCTGCTCTGGTTTGTTGAGGATGTTGCGCGCGCTCTTCGCTGCTCACCGACGAGAGCGCACTCCG
>rmi3_Contig2
CAAGAGCGCATCTGAGCATGCGCACTGGTATGTTTGCAACCCTCTTTACTAGGCCTAGTGCATTTTAACATGGACCCAGAGGGAAGCCGTGAAAGATCCTGAAACTATTTAATTTAGTGCAAAGTTTATTGATTTAGTGTTGTTGCGAGGTGCCTGCAGTTGGCTACAAGCACATTTAGGATCCATGGACAGTACGTCCATAATTACTCAAGTGAACAGAGAAGAGGAACAACTAACAAATTTTCCTCCTGCTGATCGAGTGCCGCCCTCCAGCAGGAAGCCCCGGCAGCGCGGCTTCCTCAACACGCTGCTCTGCTGCTTCGGGAGCAACAACCAAGGCAACAACCCCGTGATTGCCGAGGAAAATGGCCAGTACTCGCCCAAGCTCCAGGGCAAGTACCTGCTGCCACCCGTGCGGCATCAGGATATTCGCAAGATATGCCTCATCATTGACCTCGATGAGACATTGGTCCATAGCTCATTCAAGCCCATCAACAATGCTGACTTCGTGGTGCCTGTAGAGATAGATGGCACGGTGCACCAGGTGTATGTCTTGAAGAGGCCTTATGTGGACGAGTTCTTGCAGCGAGTTGGCGATGCCTACGAATGTGTCTTATTCACAGCCAGCCTTGCAAAGTATGCTGACCCTGTGGCTGACCTGCTGGACAAGTGGGGTGTCTTCCGGGCACGGTTATTCCGAGAGTCTTGTGTCTTCTACCGAGGAAACTATGTCAAGGACCTTGGTAGGCTGGGCCGGGATCTGCGCAGAGTGGTCATAATAGACAACTCACCGGCCTCGTACATCTTCCATCCTGACAATGCAGTACCTGTCAACTCGTGGTTCGATGACATGTCAGACACGGAGCTGCGGGACCTGATGCCATTGTTCGACGAACTGAGCCGTGTCGAGGACGTGTACACGGTGCTGCGCAACTCCAACAACGCGGCTGGCGGTGGCGGCGGCTCTCCGGCCTTCCCTGCACCACTCCTGATGAACGGCAGCGCGGTGGCTTTGCACAACAGCGGTTCCTAGCATTCCGCACAGTGCGGCTTGTGCAATAGCCCCTTCTCCGCCGGCAGTACAAAAGCGCTTACGGGTCCCGTGCTAGTCTCGCCGGCCTACTTAACGTCGGAGGGGGGCTGCCCCTTGTGCCTTGTCTCTTCCGCTCTGGACGAGAGTTTGTATAATAACGGTGTTCCATAATCTCGCCTGTATCATAGATTAAAGACGACTATTTCAGCCTGCAAAA

Hopelessly stuck, R

modified 13 months ago by Pierre Lindenbaum • written 13 months ago by Roelof

Try (GNU sed, since `\+` and `\n` in the replacement are GNU extensions):

$ sed 's/\(.*[0-9]\+\)/\1\n/g' test.fa > new.fa

or

$ sed 's/\(.*Contig[0-9]\+\)/\1\n/g' test.fa
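
Since the question mentions Biopython, here is a rough Python sketch of the same idea as the first sed command: break each header line after its last digit. The function name `split_after_last_digit` is made up for illustration, and unlike the sed version it only touches lines that start with `>`. It assumes the sequence itself contains no digits.

```python
import re

# Hypothetical helper, not from any library: split a fused header line
# after the last digit it contains (greedy match, like the sed pattern).
HEADER_RE = re.compile(r"^(>.*[0-9])(.*)$")


def split_after_last_digit(line):
    """Return the list of output lines for one input line."""
    line = line.rstrip("\r\n")
    match = HEADER_RE.match(line)
    if match is None:
        # Not a header, or a header without any digit: pass through unchanged.
        return [line]
    name, sequence = match.groups()
    # Emit the sequence only if something actually follows the name.
    return [name, sequence] if sequence else [name]
```

Applied line by line to the file, this leaves sequence-only lines untouched, so wrapped sequences stay wrapped.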
modified 13 months ago • written 13 months ago by cpad0112

With bash: this regex works only if the header ends in a six-character word ("Contig" in this example) that is preceded by an underscore and followed by a single digit (the contig number in this example), so it probably works with the OP's data only:

$ grep -Po '\w*\W\w*[0-9]+(?=[ATGCN]*)|(?<=_\w{6}[0-9]).*' test.fa

>rmi3_Contig1
AAAATATGACAGTCTTTATTCCAGTCTATTTTAGCCAATACAGCACACACAGCTCAAGAAGTTCTTAAAATGACAGTCTAGAAATGACTACAAAGCATTTTCTTTTGCTGGAACTTTGTACAAATACAAGTAATTGTCTAAACAACTAGTTAAATATTGGCTCAAGAATCTGCCGTTCAT............................................................GCTGATACATTCGATAAAGTTGCCAAGTAGGTGTCGAAGTTGTCTGCGGCTGCTGTCGCGTACAGCGCGGTGGCCAGGAGAGCGACGGCCACGAAGCGATCGGCGGCCGACGATCCCGATCGAGCAGCCATGTCGGGCCGTTTCGGTGCGGGTGAAGCTCCGCAGCTGCTCTGGTTTGTTGAGGATGTTGCGCGCGCTCTTCGCTGCTCACCGACGAGAGCGCACTCCG
>rmi3_Contig2
CAAGAGCGCATCTGAGCATGCGCACTGGTATGTTTGCAACCCTCTTTACTAGGCCTAGTGCATTTTAACATGGACCCAGAGGGAAGCCGTGAAAGATCCTGAAACTATTTAATTTAGTGCAAAGTTTATTGATTTAGTGTTGTTGCGAGGTGCCTGCAGTTGGCTACAAGCACATTTAGGATCCATGGACAGTACGTCCATAATTACTCAAGTGAACAGAGAAGAGGAACAACTAACAAATTTTCCTCCTGCTGATCGAGTGCCGCCCTCCAGCAGGAAGCCCCGGCAGCGCGGCTTCCTCAACACGCTGCTCTGCTGCTTCGGGAGCAACAACCAAGGCAACAACCCCGTGATTGCCGAGGAAAATGGCCAGTACTCGCCCAAGCTCCAGGGCAAGTACCTGCTGCCACCCGTGCGGCATCAGGATATTCGCAAGATATGCCTCATCATTGACCTCGATGAGACATTGGTCCATAGCTCATTCAAGCCCATCAACAATGCTGACTTCGTGGTGCCTGTAGAGATAGATGGCACGGTGCACCAGGTGTATGTCTTGAAGAGGCCTTATGTGGACGAGTTCTTGCAGCGAGTTGGCGATGCCTACGAATGTGTCTTATTCACAGCCAGCCTTGCAAAGTATGCTGACCCTGTGGCTGACCTGCTGGACAAGTGGGGTGTCTTCCGGGCACGGTTATTCCGAGAGTCTTGTGTCTTCTACCGAGGAAACTATGTCAAGGACCTTGGTAGGCTGGG

PS: I removed part of the contig 1 sequence to stay within the 5000-character limit.

modified 13 months ago • written 13 months ago by cpad0112
13 months ago by Pierre Lindenbaum, France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum wrote:
$ sed 's/\([ATGCNatgc]*\)$/ \1/' input.txt | tr " " "\n" > out.fa
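
For anyone who would rather stay in Python, the following is a minimal sketch of the same trick: treat the trailing run of nucleotide letters on a header line as the sequence and everything before it as the name. The names `split_trailing_sequence` and `convert` are hypothetical. It uses the same character class as the sed command and only rewrites lines starting with `>`, so spaces inside descriptive names are left alone. One caveat that is an assumption about your data: a name that itself ends in letters from that class (for example a trailing "a", "t", "g" or "c") would lose those letters to the sequence.

```python
import re

# Lazy name, then the longest run of nucleotide letters reaching end of line.
# Character class copied from the sed command above.
HEADER_RE = re.compile(r"^(>.*?)([ATGCNatgc]*)$")


def split_trailing_sequence(line):
    """Split one fused header line into [name, sequence]."""
    line = line.rstrip("\r\n")
    match = HEADER_RE.match(line)
    if match is None:
        # Sequence-only (continuation) line: keep it as it is.
        return [line]
    name, sequence = match.groups()
    return [name, sequence] if sequence else [name]


def convert(lines):
    """Apply the split to every line and return the flattened result."""
    out = []
    for line in lines:
        out.extend(split_trailing_sequence(line))
    return out
```

To use it on a file, something like `open("out.fa", "w").write("\n".join(convert(open("input.txt"))) + "\n")` should do, after which SeqIO ought to be able to read the result as ordinary FASTA.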