Change the header of a FASTA sequence
3.2 years ago
harry ▴ 30
>exon9_ENST00000462434|exon11_ENST00000462434|exon12_ENST00000462434|exon13_ENST00000462434|exon19_ENST00000462434|exon22_ENST00000462434|exon25_ENST00000462434|
GCAAATGAAACACCTGTTGGTCTTCTAATCCATTTGGGGGGTTTTTTCAGGGGAGGTATCAGTGGTGCTTGTGCCACTTGCTCTGGCACCTGCAGTGGTGGGAGAGGCTGGCCTTTGCTGAAGGAAGAGGAGATCTGGGGGGAAAAGACACCTGCATCGCCATCCTAAAGTGGCAGTTTAGTCAGGAACTCCACCTACAAACTCCATTTTGGGAGGAATCCTTGAGACACCCAATTTGACCTAGAAAGGTCAGACTCCCATATTCCAGGGGATGGGGAAGTGAGTGGTAGCGAGGGTGGGACTCCCATGCAAGTAGGCTCTTGGAAAGACTACTACATTCAAAGTCTACAATGGAGTGTGGCACAAAATGGATCTATAGAAGAGAGAAAGATAAGAGTCATACTCTTGAAATAACTGTCCCAGCAAAGGGGTCCCACGGTCCCTGAAATACTACAGGGCCCATCCAATAACAAGAGTCAAGGTGAAGGCCTTCTTCACATTGTGGCAGAAACTAACATCCTTTCAGGAAGATGGGCACTAGGGCAAAGGTGCAGCCCTCCCAAACCCCGGGCCCTGGTCTCCCAATCTCCAATATCTCCGCTTCTCAAGCCATATGTCTCTCTCCCACAAACAGAGACAGCCCCTTCCCTCCAGCATTCTCTACCAAGCCCTTCAAACCTTGTCAGCCTGTCTCATATGCTGGACTTCCCAGCTCCTACCCATCACAGAGTACAAACTGATCCAGCCGTTGAAGGAGGCAGCAGAGAACACTGAAGGGTCCCGAGGGCACCACTGCACATCAAAGCACCAGCTGCTCTGTGTTGGTAGCTTATATACCACTGCCTGATGTATAGTCTCATCTCCTTGCACCTGAGCTGTCTCTGGCGGGTTCTTCTGAAGCTCATCTTTACTGTATCCTAAAAGCTTTAGGAATTTCATTCTGGAGTCTTGCTCTAAGGTCACTGGCTGCAGAAGGCCTGTTGTCTGTCACTGTTGAGGTCATTTCCCTTGGGCTGAGGACTCTCACCTAGCCCCACGTCACTCTTCAACCATGTGGCCACTGGTGAGAAGGCTGGGATCCCAATCTGTAAGATGATGTCTCTTTAGAGTGGAGGGTAGCTCCCACAACAATCCGGGGGAAGGGGAAAGGGGGAGACTGTTGGCCCAAGACAGCAGAACCTTGAGCATGAAAAAGCCGATCTCTTAGCTGCTGAACTGGTGGTGCAGGCTGAGTTCTCCTGGAACTCCTGGGGGAGCATGACTCACACTGGAGACAGGGGGCTGTGAGGGAAGAATCCCTTGTAGCTCAGGGGTGAGGCTCATAACTGGAGCAGTAATTGGTGCTGGGGGCATAAATGTCTCTGGCAGAAATCGAAGCAGCTTTATTGCACCATTAAGTACATCACTGCATCAAAGACAGTGCCACAAATGCAAATCCAATCGGAGAAGGTAGCCCTGAGACATGTGGTGGCTGCGAGGGAGAAGGACCCCCAACCCTTGAGGAGCAGCGCTGGAAGAGAATCATTCCTTAATATGGCTCCAATTCCAGAACTGGGCTTTATCATCACAGAAGGAATGGCCTTGGGCTAAGGCTCCAACATAGGTGGAGTCAAGGGCAGTTCCCCATAGGCTGTGGTTCCCCTGCTCCTGTCTCACAGCCTAAGACAGCTTCCAGCAAAAGGCAGTTCATCCCTTTCACCTTCCATCCAACCTAGCCCACCCTTAATAATGCCGGCAGATGAGAAATTCCATTTTAACAGCGCCAAAGTTTCCTCTCTTGGTTCTGCTCAGCACCCATCCCTCACGTCCATGAGTTGTTCAAAGGGTGAACAGCAGTCAGCTCTACCCCAGACCCTGGGCTACAGAGAAATACGGACCTGGAAATACCAAGTCAGAGGCAGGGAAAAGGTAAGGGCAGGCTCATAAACCACAGAAGGGAGAAACAAAAGACCCACATGATGGGTCACAGCAGAGGTAGGCTTAAAAGTAACAATCCTGTT
CACCCTCTCAGAAGCCACTTAAATAGAAGATCCCTGGGGGAGAAGATATCCTGCCCCAGGTCCTTACAGAGTGTAGTATTAGGGAGAGTGAAGAACTGATTCTATGCCCTGCCTCCAGGCCTGAGAGTGTCTTGGACAGATCCTAGAAGGCCAGACATAAAGGAGTAAAAAGCAGGCACTCAGCTGGTTTGGAGCCAAGCCTACAGCATCACATACCTGGCAGCAAGGAAAAGAGTCGGGAAAAAGAAACAGAATCTGTTGCAGAAGTCCCCTCTTCTGCAGGGAGGAGTTATGTAACAGCAGAAGTGGCCTCCTAGCAAGAGAGGCTGCCTGGTTTAGACCAGCAGCTTATGAGCGATGATGAGGACAGCCTTCAGGATAGGCATGAAGCTGGACACCTCGCTGAAGCTGCTACAGCCCGCCACCTGGGCATGCACTGCAAGGCCCTGCTCAAAGCTTCCTGCATCCACACATCGGGCAACCTCATGGAGCCCAGCCACGACATGAGGTGAGAG
>exon9_ENST00000462434|exon11_ENST00000462434|exon12_ENST00000462434|exon13_ENST00000462434|exon19_ENST00000462434|exon22_ENST00000462434|exon24_ENST00000462434|
GCAAATGAAACACCTGTTGGTCTTCTAATCCATTTGGGGGGTTTTTTCAGGGGAGGTATCAGTGGTGCTTGTGCCACTTGCTCTGGCACCTGCAGTGGTGGGAGAGGCTGGCCTTTGCTGAAGGAAGAGGAGATCTGGGGGGAAAAGACACCTGCATCGCCATCCTAAAGTGGCAGTTTAGTCAGGAACTCCACCTACAAACTCCATTTTGGGAGGAATCCTTGAGACACCCAATTTGACCTAGAAAGGTCAGACTCCCATATTCCAGGGGATGGGGAAGTGAGTGGTAGCGAGGGTGGGACTCCCATGCAAGTAGGCTCTTGGAAAGACTACTACATTCAAAGTCTACAATGGAGTGTGGCACAAAATGGATCTATAGAAGAGAGAAAGATAAGAGTCATACTCTTGAAATAACTGTCCCAGCAAAGGGGTCCCACGGTCCCTGAAATACTACAGGGCCCATCCAATAACAAGAGTCAAGGTGAAGGCCTTCTTCACATTGTGGCAGAAACTAACATCCTTTCAGGAAGATGGGCACTAGGGCAAAGGTGCAGCCCTCCCAAACCCCGGGCCCTGGTCTCCCAATCTCCAATATCTCCGCTTCTCAAGCCATATGTCTCTCTCCCACAAACAGAGACAGCCCCTTCCCTCCAGCATTCTCTACCAAGCCCTTCAAACCTTGTCAGCCTGTCTCATATGCTGGACTTCCCAGCTCCTACCCATCACAGAGTACAAACTGATCCAGCCGTTGAAGGAGGCAGCAGAGAACACTGAAGGGTCCCGAGGGCACCACTGCACATCAAAGCACCAGCTGCTCTGTGTTGGTAGCTTATATACCACTGCCTGATGTATAGTCTCATCTCCTTGCACCTGAGCTGTCTCTGGCGGGTTCTTCTGAAGCTCATCTTTACTGTATCCTAAAAGCTTTAGGAATTTCATTCTGGAGTCTTGCTCTAAGGTCACTGGCTGCAGAAGGCCTGTTGTCTGTCACTGTTGAGGTCATTTCCCTTGGGCTGAGGACTCTCACCTAGCCCCACGTCACTCTTCAACCATGTGGCCACTGGTGAGAAGGCTGGGATCCCAATCTGTAAGATGATGTCTCTTTAGAGTGGAGGGTAGCTCCCACAACAATCCGGGGGAAGGGGAAAGGGGGAGACTGTTGGCCCAAGACAGCAGAACCTTGAGCATGAAAAAGCCGATCTCTTAGCTGCTGAACTGGTGGTGCAGGCTGAGTTCTCCTGGAACTCCTGGGGGAGCATGACTCACACTGGAGACAGGGGGCTGTGAGGGAAGAATCCCTTGTAGCTCAGGGGTGAGGCTCATAACTGGAGCAGTAATTGGTGCTGGGGGCATAAATGTCTCTGGCAGGTCCCCTCACAGAGCTTCTCATATAGATACTCCAGACGCTGGGCTGCCTCTTCCAGCTTCCTTTTTGTCTT
>exon9_ENST00000462434|exon11_ENST00000462434|exon12_ENST00000462434|exon13_ENST00000462434|exon19_ENST00000462434|exon22_ENST00000462434|
GCAAATGAAACACCTGTTGGTCTTCTAATCCATTTGGGGGGTTTTTTCAGGGGAGGTATCAGTGGTGCTTGTGCCACTTGCTCTGGCACCTGCAGTGGTGGGAGAGGCTGGCCTTTGCTGAAGGAAGAGGAGATCTGGGGGGAAAAGACACCTGCATCGCCATCCTAAAGTGGCAGTTTAGTCAGGAACTCCACCTACAAACTCCATTTTGGGAGGAATCCTTGAGACACCCAATTTGACCTAGAAAGGTCAGACTCCCATATTCCAGGGGATGGGGAAGTGAGTGGTAGCGAGGGTGGGACTCCCATGCAAGTAGGCTCTTGGAAAGACTACTACATTCAAAGTCTACAATGGAGTGTGGCACAAAATGGATCTATAGAAGAGAGAAAGATAAGAGTCATACTCTTGAAATAACTGTCCCAGCAAAGGGGTCCCACGGTCCCTGAAATACTACAGGGCCCATCCAATAACAAGAGTCAAGGTGAAGGCCTTCTTCACATTGTGGCAGAAACTAACATCCTTTCAGGAAGATGGGCACTAGGGCAAAGGTGCAGCCCTCCCAAACCCCGGGCCCTGGTCTCCCAATCTCCAATATCTCCGCTTCTCAAGCCATATGTCTCTCTCCCACAAACAGAGACAGCCCCTTCCCTCCAGCATTCTCTACCAAGCCCTTCAAACCTTGTCAGCCTGTCTCATATGCTGGACTTCCCAGCTCCTACCCATCACAGAGTACAAACTGATCCAGCCGTTGAAGGAGGCAGCAGAGAACACTGAAGGGTCCCGAGGGCACCACTGCACATCAAAGCACCAGCTGCTCTGTGTTGGTAGCTTATATACCACTGCCTGATGTATAGTCTCATCTCCTTGCACCTGAGCTGTCTCTGGCGGGTTCTTCTGAAGCTCATCTTTACTGTATCCTAAAAGCTTTAGGAATTTCATTCTGGAGTCTTGCTCTAAGGTCACTGGCTGCAGAAGGCCTGTTGTCTGTCACTGTTGAGGTCATTTCCCTTGGGCTGAGGACTCTCACCTAGCCCCACGTCACTCTTCAACCATGTGGCCACTGGTGAGAAGGCTGGGATCCCAATCTGTAAGATGATGTCTCTTTAGAGTGGAGGGTAGCTCCCACAACAATCCGGGGGAAGGGGAAAGGGGGAGACTGTTGGCCCAAGACAGCAGAACCTTGAGCATGAAAAAGCCGATCTCTTAGCTGCTGAACTGGTGGTGCAGGCTGAGTTCTCCTGGAACTCCTGGGGGAGCATGACTCACACTGGAGACAGGGGGCTGTGAGGGAAGAATCCCTTGTAGCTCAGGGGTGAGGCTCATAACTGGAGCAGTAATTGGTGCTGGGGGCATAAATGTCTCTGGCAG

As shown above, my FASTA file has long header names, and I want to rename each header so that it includes only the first and last names, like this:

>exon9_ENST00000462434:exon25_ENST00000462434
GCAAATGAAACACCTGTTGGTCTTCTAATCCATTTGGGGGGTTTTTTCAGGGGAGGTATCAGTGGTGCTTGTGCCACTTGCTCTGGCACCTGCAGTGGTGGGAGAGGCTGGCCTTTGCTGAAGGAAGAGGAGATCTGGGGGGAAAAGACACCTGCATCGCCATCCTAAAGTGGCAGTTTAGTCAGGAACTCCACCTACAAACTCCATTTTGGGAGGAATCCTTGAGACACCCAATTTGACCTAGAAAGGTCAGACTCCCATATTCCAGGGGATGGGGAAGTGAGTGGTAGCGAGGGTGGGACTCCCATGCAAGTAGGCTCTTGGAAAGACTACTACATTCAAAGTCTACAATGGAGTGTGGCACAAAATGGATCTATAGAAGAGAGAAAGATAAGAGTCATACTCTTGAAATAACTGTCCCAGCAAAGGGGTCCCACGGTCCCTGAAATACTACAGGGCCCATCCAATAACAAGAGTCAAGGTGAAGGCCTTCTTCACATTGTGGCAGAAACTAACATCCTTTCAGGAAGATGGGCACTAGGGCAAAGGTGCAGCCCTCCCAAACCCCGGGCCCTGGTCTCCCAATCTCCAATATCTCCGCTTCTCAAGCCATATGTCTCTCTCCCACAAACAGAGACAGCCCCTTCCCTCCAGCATTCTCTACCAAGCCCTTCAAACCTTGTCAGCCTGTCTCATATGCTGGACTTCCCAGCTCCTACCCATCACAGAGTACAAACTGATCCAGCCGTTGAAGGAGGCAGCAGAGAACACTGAAGGGTCCCGAGGGCACCACTGCACATCAAAGCACCAGCTGCTCTGTGTTGGTAGCTTATATACCACTGCCTGATGTATAGTCTCATCTCCTTGCACCTGAGCTGTCTCTGGCGGGTTCTTCTGAAGCTCATCTTTACTGTATCCTAAAAGCTTTAGGAATTTCATTCTGGAGTCTTGCTCTAAGGTCACTGGCTGCAGAAGGCCTGTTGTCTGTCACTGTTGAGGTCATTTCCCTTGGGCTGAGGACTCTCACCTAGCCCCACGTCACTCTTCAACCATGTGGCCACTGGTGAGAAGGCTGGGATCCCAATCTGTAAGATGATGTCTCTTTAGAGTGGAGGGTAGCTCCCACAACAATCCGGGGGAAGGGGAAAGGGGGAGACTGTTGGCCCAAGACAGCAGAACCTTGAGCATGAAAAAGCCGATCTCTTAGCTGCTGAACTGGTGGTGCAGGCTGAGTTCTCCTGGAACTCCTGGGGGAGCATGACTCACACTGGAGACAGGGGGCTGTGAGGGAAGAATCCCTTGTAGCTCAGGGGTGAGGCTCATAACTGGAGCAGTAATTGGTGCTGGGGGCATAAATGTCTCTGGCAGAAATCGAAGCAGCTTTATTGCACCATTAAGTACATCACTGCATCAAAGACAGTGCCACAAATGCAAATCCAATCGGAGAAGGTAGCCCTGAGACATGTGGTGGCTGCGAGGGAGAAGGACCCCCAACCCTTGAGGAGCAGCGCTGGAAGAGAATCATTCCTTAATATGGCTCCAATTCCAGAACTGGGCTTTATCATCACAGAAGGAATGGCCTTGGGCTAAGGCTCCAACATAGGTGGAGTCAAGGGCAGTTCCCCATAGGCTGTGGTTCCCCTGCTCCTGTCTCACAGCCTAAGACAGCTTCCAGCAAAAGGCAGTTCATCCCTTTCACCTTCCATCCAACCTAGCCCACCCTTAATAATGCCGGCAGATGAGAAATTCCATTTTAACAGCGCCAAAGTTTCCTCTCTTGGTTCTGCTCAGCACCCATCCCTCACGTCCATGAGTTGTTCAAAGGGTGAACAGCAGTCAGCTCTACCCCAGACCCTGGGCTACAGAGAAATACGGACCTGGAAATACCAAGTCAGAGGCAGGGAAAAGGTAAGGGCAGGCTCATAAACCACAGAAGGGAGAAACAAAAGACCCACATGATGGGTCACAGCAGAGGTAGGCTTAAAAGTAACAATCCTGTT
CACCCTCTCAGAAGCCACTTAAATAGAAGATCCCTGGGGGAGAAGATATCCTGCCCCAGGTCCTTACAGAGTGTAGTATTAGGGAGAGTGAAGAACTGATTCTATGCCCTGCCTCCAGGCCTGAGAGTGTCTTGGACAGATCCTAGAAGGCCAGACATAAAGGAGTAAAAAGCAGGCACTCAGCTGGTTTGGAGCCAAGCCTACAGCATCACATACCTGGCAGCAAGGAAAAGAGTCGGGAAAAAGAAACAGAATCTGTTGCAGAAGTCCCCTCTTCTGCAGGGAGGAGTTATGTAACAGCAGAAGTGGCCTCCTAGCAAGAGAGGCTGCCTGGTTTAGACCAGCAGCTTATGAGCGATGATGAGGACAGCCTTCAGGATAGGCATGAAGCTGGACACCTCGCTGAAGCTGCTACAGCCCGCCACCTGGGCATGCACTGCAAGGCCCTGCTCAAAGCTTCCTGCATCCACACATCGGGCAACCTCATGGAGCCCAGCCACGACATGAGGTGAGAG
>exon9_ENST00000462434:exon24_ENST00000462434
GCAAATGAAACACCTGTTGGTCTTCTAATCCATTTGGGGGGTTTTTTCAGGGGAGGTATCAGTGGTGCTTGTGCCACTTGCTCTGGCACCTGCAGTGGTGGGAGAGGCTGGCCTTTGCTGAAGGAAGAGGAGATCTGGGGGGAAAAGACACCTGCATCGCCATCCTAAAGTGGCAGTTTAGTCAGGAACTCCACCTACAAACTCCATTTTGGGAGGAATCCTTGAGACACCCAATTTGACCTAGAAAGGTCAGACTCCCATATTCCAGGGGATGGGGAAGTGAGTGGTAGCGAGGGTGGGACTCCCATGCAAGTAGGCTCTTGGAAAGACTACTACATTCAAAGTCTACAATGGAGTGTGGCACAAAATGGATCTATAGAAGAGAGAAAGATAAGAGTCATACTCTTGAAATAACTGTCCCAGCAAAGGGGTCCCACGGTCCCTGAAATACTACAGGGCCCATCCAATAACAAGAGTCAAGGTGAAGGCCTTCTTCACATTGTGGCAGAAACTAACATCCTTTCAGGAAGATGGGCACTAGGGCAAAGGTGCAGCCCTCCCAAACCCCGGGCCCTGGTCTCCCAATCTCCAATATCTCCGCTTCTCAAGCCATATGTCTCTCTCCCACAAACAGAGACAGCCCCTTCCCTCCAGCATTCTCTACCAAGCCCTTCAAACCTTGTCAGCCTGTCTCATATGCTGGACTTCCCAGCTCCTACCCATCACAGAGTACAAACTGATCCAGCCGTTGAAGGAGGCAGCAGAGAACACTGAAGGGTCCCGAGGGCACCACTGCACATCAAAGCACCAGCTGCTCTGTGTTGGTAGCTTATATACCACTGCCTGATGTATAGTCTCATCTCCTTGCACCTGAGCTGTCTCTGGCGGGTTCTTCTGAAGCTCATCTTTACTGTATCCTAAAAGCTTTAGGAATTTCATTCTGGAGTCTTGCTCTAAGGTCACTGGCTGCAGAAGGCCTGTTGTCTGTCACTGTTGAGGTCATTTCCCTTGGGCTGAGGACTCTCACCTAGCCCCACGTCACTCTTCAACCATGTGGCCACTGGTGAGAAGGCTGGGATCCCAATCTGTAAGATGATGTCTCTTTAGAGTGGAGGGTAGCTCCCACAACAATCCGGGGGAAGGGGAAAGGGGGAGACTGTTGGCCCAAGACAGCAGAACCTTGAGCATGAAAAAGCCGATCTCTTAGCTGCTGAACTGGTGGTGCAGGCTGAGTTCTCCTGGAACTCCTGGGGGAGCATGACTCACACTGGAGACAGGGGGCTGTGAGGGAAGAATCCCTTGTAGCTCAGGGGTGAGGCTCATAACTGGAGCAGTAATTGGTGCTGGGGGCATAAATGTCTCTGGCAGGTCCCCTCACAGAGCTTCTCATATAGATACTCCAGACGCTGGGCTGCCTCTTCCAGCTTCCTTTTTGTCTT
>exon9_ENST00000462434:exon22_ENST00000462434
GCAAATGAAACACCTGTTGGTCTTCTAATCCATTTGGGGGGTTTTTTCAGGGGAGGTATCAGTGGTGCTTGTGCCACTTGCTCTGGCACCTGCAGTGGTGGGAGAGGCTGGCCTTTGCTGAAGGAAGAGGAGATCTGGGGGGAAAAGACACCTGCATCGCCATCCTAAAGTGGCAGTTTAGTCAGGAACTCCACCTACAAACTCCATTTTGGGAGGAATCCTTGAGACACCCAATTTGACCTAGAAAGGTCAGACTCCCATATTCCAGGGGATGGGGAAGTGAGTGGTAGCGAGGGTGGGACTCCCATGCAAGTAGGCTCTTGGAAAGACTACTACATTCAAAGTCTACAATGGAGTGTGGCACAAAATGGATCTATAGAAGAGAGAAAGATAAGAGTCATACTCTTGAAATAACTGTCCCAGCAAAGGGGTCCCACGGTCCCTGAAATACTACAGGGCCCATCCAATAACAAGAGTCAAGGTGAAGGCCTTCTTCACATTGTGGCAGAAACTAACATCCTTTCAGGAAGATGGGCACTAGGGCAAAGGTGCAGCCCTCCCAAACCCCGGGCCCTGGTCTCCCAATCTCCAATATCTCCGCTTCTCAAGCCATATGTCTCTCTCCCACAAACAGAGACAGCCCCTTCCCTCCAGCATTCTCTACCAAGCCCTTCAAACCTTGTCAGCCTGTCTCATATGCTGGACTTCCCAGCTCCTACCCATCACAGAGTACAAACTGATCCAGCCGTTGAAGGAGGCAGCAGAGAACACTGAAGGGTCCCGAGGGCACCACTGCACATCAAAGCACCAGCTGCTCTGTGTTGGTAGCTTATATACCACTGCCTGATGTATAGTCTCATCTCCTTGCACCTGAGCTGTCTCTGGCGGGTTCTTCTGAAGCTCATCTTTACTGTATCCTAAAAGCTTTAGGAATTTCATTCTGGAGTCTTGCTCTAAGGTCACTGGCTGCAGAAGGCCTGTTGTCTGTCACTGTTGAGGTCATTTCCCTTGGGCTGAGGACTCTCACCTAGCCCCACGTCACTCTTCAACCATGTGGCCACTGGTGAGAAGGCTGGGATCCCAATCTGTAAGATGATGTCTCTTTAGAGTGGAGGGTAGCTCCCACAACAATCCGGGGGAAGGGGAAAGGGGGAGACTGTTGGCCCAAGACAGCAGAACCTTGAGCATGAAAAAGCCGATCTCTTAGCTGCTGAACTGGTGGTGCAGGCTGAGTTCTCCTGGAACTCCTGGGGGAGCATGACTCACACTGGAGACAGGGGGCTGTGAGGGAAGAATCCCTTGTAGCTCAGGGGTGAGGCTCATAACTGGAGCAGTAATTGGTGCTGGGGGCATAAATGTCTCTGGCAG

Can anyone please tell me how to do this? Thanks in advance.

What have you tried?

Duplicate of "Fasta header trimming".

I'd modify the awk solution there and use | as FS, then print $1 and $NF (awk fields are 1-indexed; since these headers end with a trailing |, the last non-empty field is $(NF-1)). It's not an exact duplicate though, as the "last element" part is dynamic.

If you are comfortable with Python, you can parse the sequences with Biopython and edit the sequence names as you wish with string manipulation methods.

This is not a complete answer, and belongs as a comment. "Use tools A and B" are suggestions, not solutions.

3.2 years ago
seidel 11k

You can use awk for this. Begin by setting the Output Field Separator (OFS) to the empty string; this affects how the print statement joins the pieces of your shortened defline. For each line ($0), if it begins with ">" it must be a defline, so split it on the pipe character into an array called "a", and save the number of elements (which split returns) in a variable called "n". Then print the new defline, consisting of the first and penultimate elements of the array joined by ":", as in your desired output. The penultimate element is used because every defline in your example ends with a pipe character, which leaves the last element empty. If the line does NOT begin with ">", just print it unchanged.

Assuming your fasta records are in foo.fa, you would call it as follows:

awk 'BEGIN{OFS=""} {if($0 ~ /^>/){ n=split($0,a,"|"); print a[1],":",a[n-1] } else {print $0}}' foo.fa

Perl's command-line switches (documented in perlrun) would be another easy way to process lines from the command line and split the bits and pieces into an array, if you know Perl.
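For anyone who would rather follow the Python suggestion from the comments, the same first-and-last logic can be sketched with the standard library alone. This is only an illustration, not Biopython: the function names shorten_header and rename_fasta are made up here, and the sample records are abbreviated stand-ins for the real file. It assumes "|" separates the names and ":" should join the two that are kept.

```python
import io


def shorten_header(line, sep="|", joiner=":"):
    """Reduce a FASTA header to its first and last non-empty fields."""
    # Drop the leading ">" and ignore empty fields left by a trailing separator.
    fields = [f for f in line[1:].strip().split(sep) if f]
    if len(fields) < 2:
        # Nothing to shorten: keep a single-name (or empty) header as it is.
        return ">" + "".join(fields)
    return ">" + fields[0] + joiner + fields[-1]


def rename_fasta(lines):
    """Yield headers shortened and sequence lines unchanged."""
    for line in lines:
        line = line.rstrip("\n")
        yield shorten_header(line) if line.startswith(">") else line


# Hypothetical miniature input; the sequences are truncated for illustration.
sample = io.StringIO(
    ">exon9_ENST00000462434|exon11_ENST00000462434|exon25_ENST00000462434|\n"
    "GCAAATGAAACACC\n"
    "CACCCTCTCAGAAG\n"
    ">exon9_ENST00000462434|exon22_ENST00000462434|\n"
    "GCAAATGAAACACC\n"
)
result = list(rename_fasta(sample))
print("\n".join(result))
```

To run it on a real file, pass an open file handle (for example one opened on foo.fa) to rename_fasta instead of the StringIO sample. Filtering out empty fields means it should behave the same whether or not a header ends with a trailing pipe.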
