Question: Change the header of a FASTA sequence
harry wrote (13 days ago):
>exon9_ENST00000462434|exon11_ENST00000462434|exon12_ENST00000462434|exon13_ENST00000462434|exon19_ENST00000462434|exon22_ENST00000462434|exon25_ENST00000462434|
GCAAATGAAACACCTGTTGGTCTTCTAATCCATTTGGGGGGTTTTTTCAGGGGAGGTATCAGTGGTGCTTGTGCCACTTGCTCTGGCACCTGCAGTGGTGGGAGAGGCTGGCCTTTGCTGAAGGAAGAGGAGATCTGGGGGGAAAAGACACCTGCATCGCCATCCTAAAGTGGCAGTTTAGTCAGGAACTCCACCTACAAACTCCATTTTGGGAGGAATCCTTGAGACACCCAATTTGACCTAGAAAGGTCAGACTCCCATATTCCAGGGGATGGGGAAGTGAGTGGTAGCGAGGGTGGGACTCCCATGCAAGTAGGCTCTTGGAAAGACTACTACATTCAAAGTCTACAATGGAGTGTGGCACAAAATGGATCTATAGAAGAGAGAAAGATAAGAGTCATACTCTTGAAATAACTGTCCCAGCAAAGGGGTCCCACGGTCCCTGAAATACTACAGGGCCCATCCAATAACAAGAGTCAAGGTGAAGGCCTTCTTCACATTGTGGCAGAAACTAACATCCTTTCAGGAAGATGGGCACTAGGGCAAAGGTGCAGCCCTCCCAAACCCCGGGCCCTGGTCTCCCAATCTCCAATATCTCCGCTTCTCAAGCCATATGTCTCTCTCCCACAAACAGAGACAGCCCCTTCCCTCCAGCATTCTCTACCAAGCCCTTCAAACCTTGTCAGCCTGTCTCATATGCTGGACTTCCCAGCTCCTACCCATCACAGAGTACAAACTGATCCAGCCGTTGAAGGAGGCAGCAGAGAACACTGAAGGGTCCCGAGGGCACCACTGCACATCAAAGCACCAGCTGCTCTGTGTTGGTAGCTTATATACCACTGCCTGATGTATAGTCTCATCTCCTTGCACCTGAGCTGTCTCTGGCGGGTTCTTCTGAAGCTCATCTTTACTGTATCCTAAAAGCTTTAGGAATTTCATTCTGGAGTCTTGCTCTAAGGTCACTGGCTGCAGAAGGCCTGTTGTCTGTCACTGTTGAGGTCATTTCCCTTGGGCTGAGGACTCTCACCTAGCCCCACGTCACTCTTCAACCATGTGGCCACTGGTGAGAAGGCTGGGATCCCAATCTGTAAGATGATGTCTCTTTAGAGTGGAGGGTAGCTCCCACAACAATCCGGGGGAAGGGGAAAGGGGGAGACTGTTGGCCCAAGACAGCAGAACCTTGAGCATGAAAAAGCCGATCTCTTAGCTGCTGAACTGGTGGTGCAGGCTGAGTTCTCCTGGAACTCCTGGGGGAGCATGACTCACACTGGAGACAGGGGGCTGTGAGGGAAGAATCCCTTGTAGCTCAGGGGTGAGGCTCATAACTGGAGCAGTAATTGGTGCTGGGGGCATAAATGTCTCTGGCAGAAATCGAAGCAGCTTTATTGCACCATTAAGTACATCACTGCATCAAAGACAGTGCCACAAATGCAAATCCAATCGGAGAAGGTAGCCCTGAGACATGTGGTGGCTGCGAGGGAGAAGGACCCCCAACCCTTGAGGAGCAGCGCTGGAAGAGAATCATTCCTTAATATGGCTCCAATTCCAGAACTGGGCTTTATCATCACAGAAGGAATGGCCTTGGGCTAAGGCTCCAACATAGGTGGAGTCAAGGGCAGTTCCCCATAGGCTGTGGTTCCCCTGCTCCTGTCTCACAGCCTAAGACAGCTTCCAGCAAAAGGCAGTTCATCCCTTTCACCTTCCATCCAACCTAGCCCACCCTTAATAATGCCGGCAGATGAGAAATTCCATTTTAACAGCGCCAAAGTTTCCTCTCTTGGTTCTGCTCAGCACCCATCCCTCACGTCCATGAGTTGTTCAAAGGGTGAACAGCAGTCAGCTCTACCCCAGACCCTGGGCTACAGAGAAATACGGACCTGGAAATACCAAGTCAGAGGCAGGGAAAAGGTAAGGGCAGGCTCATAAACCACAGAAGGGAGAAACAAAAGACCCACATGATGGGTCACAGCAGAGGTAGGCTTAAAAGTAACAATCCTGTT
CACCCTCTCAGAAGCCACTTAAATAGAAGATCCCTGGGGGAGAAGATATCCTGCCCCAGGTCCTTACAGAGTGTAGTATTAGGGAGAGTGAAGAACTGATTCTATGCCCTGCCTCCAGGCCTGAGAGTGTCTTGGACAGATCCTAGAAGGCCAGACATAAAGGAGTAAAAAGCAGGCACTCAGCTGGTTTGGAGCCAAGCCTACAGCATCACATACCTGGCAGCAAGGAAAAGAGTCGGGAAAAAGAAACAGAATCTGTTGCAGAAGTCCCCTCTTCTGCAGGGAGGAGTTATGTAACAGCAGAAGTGGCCTCCTAGCAAGAGAGGCTGCCTGGTTTAGACCAGCAGCTTATGAGCGATGATGAGGACAGCCTTCAGGATAGGCATGAAGCTGGACACCTCGCTGAAGCTGCTACAGCCCGCCACCTGGGCATGCACTGCAAGGCCCTGCTCAAAGCTTCCTGCATCCACACATCGGGCAACCTCATGGAGCCCAGCCACGACATGAGGTGAGAG
>exon9_ENST00000462434|exon11_ENST00000462434|exon12_ENST00000462434|exon13_ENST00000462434|exon19_ENST00000462434|exon22_ENST00000462434|exon24_ENST00000462434|
GCAAATGAAACACCTGTTGGTCTTCTAATCCATTTGGGGGGTTTTTTCAGGGGAGGTATCAGTGGTGCTTGTGCCACTTGCTCTGGCACCTGCAGTGGTGGGAGAGGCTGGCCTTTGCTGAAGGAAGAGGAGATCTGGGGGGAAAAGACACCTGCATCGCCATCCTAAAGTGGCAGTTTAGTCAGGAACTCCACCTACAAACTCCATTTTGGGAGGAATCCTTGAGACACCCAATTTGACCTAGAAAGGTCAGACTCCCATATTCCAGGGGATGGGGAAGTGAGTGGTAGCGAGGGTGGGACTCCCATGCAAGTAGGCTCTTGGAAAGACTACTACATTCAAAGTCTACAATGGAGTGTGGCACAAAATGGATCTATAGAAGAGAGAAAGATAAGAGTCATACTCTTGAAATAACTGTCCCAGCAAAGGGGTCCCACGGTCCCTGAAATACTACAGGGCCCATCCAATAACAAGAGTCAAGGTGAAGGCCTTCTTCACATTGTGGCAGAAACTAACATCCTTTCAGGAAGATGGGCACTAGGGCAAAGGTGCAGCCCTCCCAAACCCCGGGCCCTGGTCTCCCAATCTCCAATATCTCCGCTTCTCAAGCCATATGTCTCTCTCCCACAAACAGAGACAGCCCCTTCCCTCCAGCATTCTCTACCAAGCCCTTCAAACCTTGTCAGCCTGTCTCATATGCTGGACTTCCCAGCTCCTACCCATCACAGAGTACAAACTGATCCAGCCGTTGAAGGAGGCAGCAGAGAACACTGAAGGGTCCCGAGGGCACCACTGCACATCAAAGCACCAGCTGCTCTGTGTTGGTAGCTTATATACCACTGCCTGATGTATAGTCTCATCTCCTTGCACCTGAGCTGTCTCTGGCGGGTTCTTCTGAAGCTCATCTTTACTGTATCCTAAAAGCTTTAGGAATTTCATTCTGGAGTCTTGCTCTAAGGTCACTGGCTGCAGAAGGCCTGTTGTCTGTCACTGTTGAGGTCATTTCCCTTGGGCTGAGGACTCTCACCTAGCCCCACGTCACTCTTCAACCATGTGGCCACTGGTGAGAAGGCTGGGATCCCAATCTGTAAGATGATGTCTCTTTAGAGTGGAGGGTAGCTCCCACAACAATCCGGGGGAAGGGGAAAGGGGGAGACTGTTGGCCCAAGACAGCAGAACCTTGAGCATGAAAAAGCCGATCTCTTAGCTGCTGAACTGGTGGTGCAGGCTGAGTTCTCCTGGAACTCCTGGGGGAGCATGACTCACACTGGAGACAGGGGGCTGTGAGGGAAGAATCCCTTGTAGCTCAGGGGTGAGGCTCATAACTGGAGCAGTAATTGGTGCTGGGGGCATAAATGTCTCTGGCAGGTCCCCTCACAGAGCTTCTCATATAGATACTCCAGACGCTGGGCTGCCTCTTCCAGCTTCCTTTTTGTCTT
>exon9_ENST00000462434|exon11_ENST00000462434|exon12_ENST00000462434|exon13_ENST00000462434|exon19_ENST00000462434|exon22_ENST00000462434|
GCAAATGAAACACCTGTTGGTCTTCTAATCCATTTGGGGGGTTTTTTCAGGGGAGGTATCAGTGGTGCTTGTGCCACTTGCTCTGGCACCTGCAGTGGTGGGAGAGGCTGGCCTTTGCTGAAGGAAGAGGAGATCTGGGGGGAAAAGACACCTGCATCGCCATCCTAAAGTGGCAGTTTAGTCAGGAACTCCACCTACAAACTCCATTTTGGGAGGAATCCTTGAGACACCCAATTTGACCTAGAAAGGTCAGACTCCCATATTCCAGGGGATGGGGAAGTGAGTGGTAGCGAGGGTGGGACTCCCATGCAAGTAGGCTCTTGGAAAGACTACTACATTCAAAGTCTACAATGGAGTGTGGCACAAAATGGATCTATAGAAGAGAGAAAGATAAGAGTCATACTCTTGAAATAACTGTCCCAGCAAAGGGGTCCCACGGTCCCTGAAATACTACAGGGCCCATCCAATAACAAGAGTCAAGGTGAAGGCCTTCTTCACATTGTGGCAGAAACTAACATCCTTTCAGGAAGATGGGCACTAGGGCAAAGGTGCAGCCCTCCCAAACCCCGGGCCCTGGTCTCCCAATCTCCAATATCTCCGCTTCTCAAGCCATATGTCTCTCTCCCACAAACAGAGACAGCCCCTTCCCTCCAGCATTCTCTACCAAGCCCTTCAAACCTTGTCAGCCTGTCTCATATGCTGGACTTCCCAGCTCCTACCCATCACAGAGTACAAACTGATCCAGCCGTTGAAGGAGGCAGCAGAGAACACTGAAGGGTCCCGAGGGCACCACTGCACATCAAAGCACCAGCTGCTCTGTGTTGGTAGCTTATATACCACTGCCTGATGTATAGTCTCATCTCCTTGCACCTGAGCTGTCTCTGGCGGGTTCTTCTGAAGCTCATCTTTACTGTATCCTAAAAGCTTTAGGAATTTCATTCTGGAGTCTTGCTCTAAGGTCACTGGCTGCAGAAGGCCTGTTGTCTGTCACTGTTGAGGTCATTTCCCTTGGGCTGAGGACTCTCACCTAGCCCCACGTCACTCTTCAACCATGTGGCCACTGGTGAGAAGGCTGGGATCCCAATCTGTAAGATGATGTCTCTTTAGAGTGGAGGGTAGCTCCCACAACAATCCGGGGGAAGGGGAAAGGGGGAGACTGTTGGCCCAAGACAGCAGAACCTTGAGCATGAAAAAGCCGATCTCTTAGCTGCTGAACTGGTGGTGCAGGCTGAGTTCTCCTGGAACTCCTGGGGGAGCATGACTCACACTGGAGACAGGGGGCTGTGAGGGAAGAATCCCTTGTAGCTCAGGGGTGAGGCTCATAACTGGAGCAGTAATTGGTGCTGGGGGCATAAATGTCTCTGGCAG

As shown above, I have a FASTA file with very long header names, and I want to rename each header so that it includes only the first and last names, like this:

>exon9_ENST00000462434:exon25_ENST00000462434
GCAAATGAAACACCTGTTGGTCTTCTAATCCATTTGGGGGGTTTTTTCAGGGGAGGTATCAGTGGTGCTTGTGCCACTTGCTCTGGCACCTGCAGTGGTGGGAGAGGCTGGCCTTTGCTGAAGGAAGAGGAGATCTGGGGGGAAAAGACACCTGCATCGCCATCCTAAAGTGGCAGTTTAGTCAGGAACTCCACCTACAAACTCCATTTTGGGAGGAATCCTTGAGACACCCAATTTGACCTAGAAAGGTCAGACTCCCATATTCCAGGGGATGGGGAAGTGAGTGGTAGCGAGGGTGGGACTCCCATGCAAGTAGGCTCTTGGAAAGACTACTACATTCAAAGTCTACAATGGAGTGTGGCACAAAATGGATCTATAGAAGAGAGAAAGATAAGAGTCATACTCTTGAAATAACTGTCCCAGCAAAGGGGTCCCACGGTCCCTGAAATACTACAGGGCCCATCCAATAACAAGAGTCAAGGTGAAGGCCTTCTTCACATTGTGGCAGAAACTAACATCCTTTCAGGAAGATGGGCACTAGGGCAAAGGTGCAGCCCTCCCAAACCCCGGGCCCTGGTCTCCCAATCTCCAATATCTCCGCTTCTCAAGCCATATGTCTCTCTCCCACAAACAGAGACAGCCCCTTCCCTCCAGCATTCTCTACCAAGCCCTTCAAACCTTGTCAGCCTGTCTCATATGCTGGACTTCCCAGCTCCTACCCATCACAGAGTACAAACTGATCCAGCCGTTGAAGGAGGCAGCAGAGAACACTGAAGGGTCCCGAGGGCACCACTGCACATCAAAGCACCAGCTGCTCTGTGTTGGTAGCTTATATACCACTGCCTGATGTATAGTCTCATCTCCTTGCACCTGAGCTGTCTCTGGCGGGTTCTTCTGAAGCTCATCTTTACTGTATCCTAAAAGCTTTAGGAATTTCATTCTGGAGTCTTGCTCTAAGGTCACTGGCTGCAGAAGGCCTGTTGTCTGTCACTGTTGAGGTCATTTCCCTTGGGCTGAGGACTCTCACCTAGCCCCACGTCACTCTTCAACCATGTGGCCACTGGTGAGAAGGCTGGGATCCCAATCTGTAAGATGATGTCTCTTTAGAGTGGAGGGTAGCTCCCACAACAATCCGGGGGAAGGGGAAAGGGGGAGACTGTTGGCCCAAGACAGCAGAACCTTGAGCATGAAAAAGCCGATCTCTTAGCTGCTGAACTGGTGGTGCAGGCTGAGTTCTCCTGGAACTCCTGGGGGAGCATGACTCACACTGGAGACAGGGGGCTGTGAGGGAAGAATCCCTTGTAGCTCAGGGGTGAGGCTCATAACTGGAGCAGTAATTGGTGCTGGGGGCATAAATGTCTCTGGCAGAAATCGAAGCAGCTTTATTGCACCATTAAGTACATCACTGCATCAAAGACAGTGCCACAAATGCAAATCCAATCGGAGAAGGTAGCCCTGAGACATGTGGTGGCTGCGAGGGAGAAGGACCCCCAACCCTTGAGGAGCAGCGCTGGAAGAGAATCATTCCTTAATATGGCTCCAATTCCAGAACTGGGCTTTATCATCACAGAAGGAATGGCCTTGGGCTAAGGCTCCAACATAGGTGGAGTCAAGGGCAGTTCCCCATAGGCTGTGGTTCCCCTGCTCCTGTCTCACAGCCTAAGACAGCTTCCAGCAAAAGGCAGTTCATCCCTTTCACCTTCCATCCAACCTAGCCCACCCTTAATAATGCCGGCAGATGAGAAATTCCATTTTAACAGCGCCAAAGTTTCCTCTCTTGGTTCTGCTCAGCACCCATCCCTCACGTCCATGAGTTGTTCAAAGGGTGAACAGCAGTCAGCTCTACCCCAGACCCTGGGCTACAGAGAAATACGGACCTGGAAATACCAAGTCAGAGGCAGGGAAAAGGTAAGGGCAGGCTCATAAACCACAGAAGGGAGAAACAAAAGACCCACATGATGGGTCACAGCAGAGGTAGGCTTAAAAGTAACAATCCTGTT
CACCCTCTCAGAAGCCACTTAAATAGAAGATCCCTGGGGGAGAAGATATCCTGCCCCAGGTCCTTACAGAGTGTAGTATTAGGGAGAGTGAAGAACTGATTCTATGCCCTGCCTCCAGGCCTGAGAGTGTCTTGGACAGATCCTAGAAGGCCAGACATAAAGGAGTAAAAAGCAGGCACTCAGCTGGTTTGGAGCCAAGCCTACAGCATCACATACCTGGCAGCAAGGAAAAGAGTCGGGAAAAAGAAACAGAATCTGTTGCAGAAGTCCCCTCTTCTGCAGGGAGGAGTTATGTAACAGCAGAAGTGGCCTCCTAGCAAGAGAGGCTGCCTGGTTTAGACCAGCAGCTTATGAGCGATGATGAGGACAGCCTTCAGGATAGGCATGAAGCTGGACACCTCGCTGAAGCTGCTACAGCCCGCCACCTGGGCATGCACTGCAAGGCCCTGCTCAAAGCTTCCTGCATCCACACATCGGGCAACCTCATGGAGCCCAGCCACGACATGAGGTGAGAG
>exon9_ENST00000462434:exon24_ENST00000462434
GCAAATGAAACACCTGTTGGTCTTCTAATCCATTTGGGGGGTTTTTTCAGGGGAGGTATCAGTGGTGCTTGTGCCACTTGCTCTGGCACCTGCAGTGGTGGGAGAGGCTGGCCTTTGCTGAAGGAAGAGGAGATCTGGGGGGAAAAGACACCTGCATCGCCATCCTAAAGTGGCAGTTTAGTCAGGAACTCCACCTACAAACTCCATTTTGGGAGGAATCCTTGAGACACCCAATTTGACCTAGAAAGGTCAGACTCCCATATTCCAGGGGATGGGGAAGTGAGTGGTAGCGAGGGTGGGACTCCCATGCAAGTAGGCTCTTGGAAAGACTACTACATTCAAAGTCTACAATGGAGTGTGGCACAAAATGGATCTATAGAAGAGAGAAAGATAAGAGTCATACTCTTGAAATAACTGTCCCAGCAAAGGGGTCCCACGGTCCCTGAAATACTACAGGGCCCATCCAATAACAAGAGTCAAGGTGAAGGCCTTCTTCACATTGTGGCAGAAACTAACATCCTTTCAGGAAGATGGGCACTAGGGCAAAGGTGCAGCCCTCCCAAACCCCGGGCCCTGGTCTCCCAATCTCCAATATCTCCGCTTCTCAAGCCATATGTCTCTCTCCCACAAACAGAGACAGCCCCTTCCCTCCAGCATTCTCTACCAAGCCCTTCAAACCTTGTCAGCCTGTCTCATATGCTGGACTTCCCAGCTCCTACCCATCACAGAGTACAAACTGATCCAGCCGTTGAAGGAGGCAGCAGAGAACACTGAAGGGTCCCGAGGGCACCACTGCACATCAAAGCACCAGCTGCTCTGTGTTGGTAGCTTATATACCACTGCCTGATGTATAGTCTCATCTCCTTGCACCTGAGCTGTCTCTGGCGGGTTCTTCTGAAGCTCATCTTTACTGTATCCTAAAAGCTTTAGGAATTTCATTCTGGAGTCTTGCTCTAAGGTCACTGGCTGCAGAAGGCCTGTTGTCTGTCACTGTTGAGGTCATTTCCCTTGGGCTGAGGACTCTCACCTAGCCCCACGTCACTCTTCAACCATGTGGCCACTGGTGAGAAGGCTGGGATCCCAATCTGTAAGATGATGTCTCTTTAGAGTGGAGGGTAGCTCCCACAACAATCCGGGGGAAGGGGAAAGGGGGAGACTGTTGGCCCAAGACAGCAGAACCTTGAGCATGAAAAAGCCGATCTCTTAGCTGCTGAACTGGTGGTGCAGGCTGAGTTCTCCTGGAACTCCTGGGGGAGCATGACTCACACTGGAGACAGGGGGCTGTGAGGGAAGAATCCCTTGTAGCTCAGGGGTGAGGCTCATAACTGGAGCAGTAATTGGTGCTGGGGGCATAAATGTCTCTGGCAGGTCCCCTCACAGAGCTTCTCATATAGATACTCCAGACGCTGGGCTGCCTCTTCCAGCTTCCTTTTTGTCTT
>exon9_ENST00000462434:exon22_ENST00000462434
GCAAATGAAACACCTGTTGGTCTTCTAATCCATTTGGGGGGTTTTTTCAGGGGAGGTATCAGTGGTGCTTGTGCCACTTGCTCTGGCACCTGCAGTGGTGGGAGAGGCTGGCCTTTGCTGAAGGAAGAGGAGATCTGGGGGGAAAAGACACCTGCATCGCCATCCTAAAGTGGCAGTTTAGTCAGGAACTCCACCTACAAACTCCATTTTGGGAGGAATCCTTGAGACACCCAATTTGACCTAGAAAGGTCAGACTCCCATATTCCAGGGGATGGGGAAGTGAGTGGTAGCGAGGGTGGGACTCCCATGCAAGTAGGCTCTTGGAAAGACTACTACATTCAAAGTCTACAATGGAGTGTGGCACAAAATGGATCTATAGAAGAGAGAAAGATAAGAGTCATACTCTTGAAATAACTGTCCCAGCAAAGGGGTCCCACGGTCCCTGAAATACTACAGGGCCCATCCAATAACAAGAGTCAAGGTGAAGGCCTTCTTCACATTGTGGCAGAAACTAACATCCTTTCAGGAAGATGGGCACTAGGGCAAAGGTGCAGCCCTCCCAAACCCCGGGCCCTGGTCTCCCAATCTCCAATATCTCCGCTTCTCAAGCCATATGTCTCTCTCCCACAAACAGAGACAGCCCCTTCCCTCCAGCATTCTCTACCAAGCCCTTCAAACCTTGTCAGCCTGTCTCATATGCTGGACTTCCCAGCTCCTACCCATCACAGAGTACAAACTGATCCAGCCGTTGAAGGAGGCAGCAGAGAACACTGAAGGGTCCCGAGGGCACCACTGCACATCAAAGCACCAGCTGCTCTGTGTTGGTAGCTTATATACCACTGCCTGATGTATAGTCTCATCTCCTTGCACCTGAGCTGTCTCTGGCGGGTTCTTCTGAAGCTCATCTTTACTGTATCCTAAAAGCTTTAGGAATTTCATTCTGGAGTCTTGCTCTAAGGTCACTGGCTGCAGAAGGCCTGTTGTCTGTCACTGTTGAGGTCATTTCCCTTGGGCTGAGGACTCTCACCTAGCCCCACGTCACTCTTCAACCATGTGGCCACTGGTGAGAAGGCTGGGATCCCAATCTGTAAGATGATGTCTCTTTAGAGTGGAGGGTAGCTCCCACAACAATCCGGGGGAAGGGGAAAGGGGGAGACTGTTGGCCCAAGACAGCAGAACCTTGAGCATGAAAAAGCCGATCTCTTAGCTGCTGAACTGGTGGTGCAGGCTGAGTTCTCCTGGAACTCCTGGGGGAGCATGACTCACACTGGAGACAGGGGGCTGTGAGGGAAGAATCCCTTGTAGCTCAGGGGTGAGGCTCATAACTGGAGCAGTAATTGGTGCTGGGGGCATAAATGTCTCTGGCAG

Can anyone tell me how to do this? Thanks in advance.

Tags: name, fasta · modified 12 days ago by manvikakkar5650

What have you tried?

(reply by Ram, 13 days ago)

This is a duplicate of "Fasta header trimming".

(reply by Pierre Lindenbaum, 12 days ago)

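I'd modify the awk solution there to use | as the field separator (FS), then take the first and last fields, $1 and $NF (awk fields are numbered from 1; $0 is the whole line). Because these headers end with a |, $NF is empty here, so the last name is actually $(NF-1). It's not an exact duplicate, though, because the position of the last element varies from header to header.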

(reply by Ram, 12 days ago)
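To make the indexing concrete, here is the same split sketched in Python (swapped in for awk purely for illustration). Python lists are 0-based, so fields[0] plays the role of awk's $1 and fields[-1] the role of $NF. The header below is the one from the question, shortened to three names for readability.

```python
# Header from the question, shortened to three names for readability.
header = ">exon9_ENST00000462434|exon11_ENST00000462434|exon25_ENST00000462434|"

fields = header.split("|")
first = fields[0]      # '>exon9_ENST00000462434' (still carries the '>')
trailing = fields[-1]  # '' because the header ends with '|' (like awk's empty $NF)
last = fields[-2]      # 'exon25_ENST00000462434', the real last name (like $(NF-1))

print(first + ":" + last)
```

This prints >exon9_ENST00000462434:exon25_ENST00000462434, which matches the desired header format.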

If you are comfortable with Python, you can parse the sequences with Biopython and edit the sequence names however you like using string manipulation methods.

(reply by the_dummy, 12 days ago)
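A minimal sketch of that suggestion, using only the Python standard library instead of Biopython (swapped in so it runs without extra packages). The function name rename_headers is hypothetical. It streams the file line by line, shortens each header to its first and last non-empty | fields joined by ":", and passes sequence lines through unchanged, so multi-line sequences like the first record in the question stay intact.

```python
import io
from typing import Iterable, Iterator


def rename_headers(lines: Iterable[str], sep: str = ":") -> Iterator[str]:
    """Yield FASTA lines, shortening each header to its first and last '|' fields.

    Sequence lines are passed through unchanged. Empty fields, such as the
    one produced by a trailing '|', are skipped.
    """
    for line in lines:
        if line.startswith(">"):
            # Drop '>', strip the line ending, split on '|', ignore empty fields.
            fields = [f for f in line[1:].rstrip("\r\n").split("|") if f]
            if fields:
                line = ">" + fields[0] + sep + fields[-1] + "\n"
        yield line


# Hypothetical example input; in practice pass an open file such as foo.fa.
example = io.StringIO(
    ">exon9_ENST00000462434|exon13_ENST00000462434|exon22_ENST00000462434|\n"
    "GCAAATGAAACACCTGTTGG\n"
)
for out_line in rename_headers(example):
    print(out_line, end="")  # shortened header, then the unchanged sequence line
```

To process a real file, something like `with open("foo.fa") as src, open("renamed.fa", "w") as dst: dst.writelines(rename_headers(src))` should work; the output file name is only an example.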

This is not a complete answer and belongs as a comment. Answers like "use tools A and B" are suggestions, not solutions.

(reply by Ram, 12 days ago)
seidel (United States) wrote (12 days ago):

You can use awk for this. Begin by setting the output field separator (OFS) to the empty string, so that print joins its arguments with nothing in between; this controls how the shortened defline is assembled. For each line ($0), if it begins with ">" it is a defline, so split it on the pipe character into an array called "a" and save the number of elements (which split returns) in a variable called "n". Then print the new defline: the first and penultimate elements of the array, joined by a colon as in your desired output. The penultimate element is used because your deflines end with a pipe, which makes the last element empty (this assumes every defline ends with a pipe, as in your example). If the line does not begin with ">", print it unchanged.

Assuming your FASTA records are in foo.fa, you would call it as follows:

awk 'BEGIN{OFS=""} {if($0 ~ /^>/){ n=split($0,a,"|"); print a[1],":",a[n-1] } else {print $0}}' foo.fa

Perl's command-line switches, documented in perlrun (for example -n, -a and -F), would be another easy way to process lines from the command line and split them into an array, if you know Perl.
