change headers from fasta files
5.7 years ago
erickfqqa ▴ 20

Hello, I'm trying to trim the following FASTA headers:

>ID:CHARACTERS | [Genus specie] | strain | gene_name | length | NCBI_ID | other | other | other|
>ID:CHARACTERS | [Genus specie | strain | gene_name | length | NCBI_ID | other | other | other|

I'd like to trim them as in the following example:

>ID:CHARACTERS | [Genus specie] | strain | gene_name |

The ID:CHARACTERS, [Genus specie], strain and gene_name fields vary across the headers, but they are always separated by "space|space".

sequence • 1.4k views

Your FASTA headers are identical, which will cause problems for most downstream tools. You can do what you want with sed:

sed "s/ length | NCBI_ID | other | other | other|//" file.fasta

I'm trying to trim the next fasta headers

And can you elaborate on what you tried and how that didn't deliver the result you had in mind?

5.7 years ago
Joe 21k

This works if you aren't bothered about keeping the last pipe symbol...

$ cut -d '|' -f -4 myseqs.fa
>ID:CHARACTERS | [Genus specie] | strain | gene_name
TCCACGATCGAATAAATGTGCGATTAGCACCTGTAGAACCATACAAGCTAAGCCGCCTCACGGGCATGTTAACGAGATTA
>ID:CHARACTERS | [Genus specie | strain | gene_name
TATTAATTTTAGTAAAACATAGCGTCGGAGACGGGCCGACACCAACGATCCGTTACCCCACATGCACCGGAGTAAGCGAC

I'll let you exert some effort to keep the last delimiter if you want it ...
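
For anyone who does want to keep the trailing pipe, one possible approach is sketched below. It is not part of the answer above, just an illustration: it uses sed with extended regular expressions (-E) to keep the first four pipe-terminated fields of each header line and drop the rest. The function name trim_headers is made up for this example, and the field count of 4 is an assumption taken from the desired output in the question.

```shell
# Hypothetical helper (name chosen for illustration only).
# On header lines (starting with ">"), keep the first four fields together
# with the pipe that ends each of them, and delete everything after that.
# Sequence lines, and headers with fewer than four pipes, pass through unchanged.
trim_headers() {
    sed -E '/^>/ s/^(([^|]*\|){4}).*$/\1/' "$@"
}

# Usage: trim_headers myseqs.fa > trimmed.fa
```

Because the pattern counts pipe characters rather than matching literal field names, it should work for any header values as long as the fields themselves never contain a pipe.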


I'll add that you really should use the search function as this type of question is Biostar's 'public enemy number one'.

You're lucky it's easy and I'm sat in front of a terminal though ;)

5.7 years ago
$ cat test.fa 
>ID:CHARACTERS | [Genus specie | strain | gene_name | length | NCBI_ID | other | other | other|
atgc

$ sed '/^>/ s/|\s\w\+\s*//3g' test.fa 
>ID:CHARACTERS | [Genus specie | strain | gene_name |
atgc
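
A note on portability: \s, \w, \+ and the combined 3g flag in the sed command above are GNU sed extensions, so it may not behave the same with other sed implementations. The pattern also only counts fields that start with a word character, which is why "| [Genus specie" is skipped and the third match is "| length". A rough awk alternative that counts fields by position instead is sketched here. The function name trim_headers_awk and the keep variable are made up for this example, and keep=4 is an assumption based on the desired output.

```shell
# Hypothetical helper (name chosen for illustration only).
# Split header lines on "|" and rebuild them from the first `keep` fields,
# re-adding the pipe after each one. Headers that do not have more than
# `keep` fields, and all sequence lines, are printed unchanged.
trim_headers_awk() {
    awk -F'|' -v keep=4 '
        /^>/ && NF > keep {
            out = ""
            for (i = 1; i <= keep; i++) out = out $i "|"
            print out
            next
        }
        { print }
    ' "$@"
}

# Usage: trim_headers_awk test.fa
```

Changing keep to another number keeps a different number of leading fields without touching the pattern.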