Question: change headers from fasta files
21 months ago by erickfqqa (20)
erickfqqa wrote:

Hello, I'm trying to trim the following FASTA headers:

>ID:CHARACTERS | [Genus specie] | strain | gene_name | length | NCBI_ID | other | other | other|
>ID:CHARACTERS | [Genus specie | strain | gene_name | length | NCBI_ID | other | other | other|

I'd like to trim them as in the following example:

>ID:CHARACTERS | [Genus specie] | strain | gene_name |

The ID:CHARACTERS, [Genus specie], strain and gene_name fields vary across the headers, but they are always separated by " | " (space, pipe, space).


Your FASTA headers are identical, which will cause problems for most downstream tools. You can do what you want with sed:

sed "s/ length | NCBI_ID | other | other | other|//" file.fasta
modified 21 months ago • written 21 months ago by h.mon (29k)
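
The sed command above matches the placeholder text literally, so it only helps if every header ends in exactly that string. A minimal sketch of a more general form is given here, assuming every header contains at least four "|" characters and that the installed sed accepts -E (GNU and BSD sed both do). It keeps everything up to and including the fourth pipe. The function name trim_headers and the sample field values are made up for illustration.

```shell
# Hypothetical helper: keep the first four "|"-terminated fields of each header.
# Headers with fewer than four "|" are left unchanged because the pattern
# does not match them; sequence lines are skipped by the /^>/ address.
trim_headers() {
  sed -E '/^>/ s/^(([^|]*[|]){4}).*/\1/' "$@"
}

# Example on an inline record with invented field values.
printf '%s\n' '>id1 | [Genus specie] | strainA | geneX | 1200 | NC_000 | other|' 'ATGC' \
  | trim_headers
# >id1 | [Genus specie] | strainA | geneX |
# ATGC
```

On a real file this would be called as: trim_headers file.fasta > trimmed.fasta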

"I'm trying to trim the next fasta headers"

Can you elaborate on what you tried and how it failed to deliver the result you had in mind?

written 21 months ago by WouterDeCoster (43k)
21 months ago by Joe (16k), United Kingdom
Joe wrote:

This works if you aren't bothered about keeping the last pipe symbol...

$ cut -d '|' -f -4 myseqs.fa
>ID:CHARACTERS | [Genus specie] | strain | gene_name
TCCACGATCGAATAAATGTGCGATTAGCACCTGTAGAACCATACAAGCTAAGCCGCCTCACGGGCATGTTAACGAGATTA
>ID:CHARACTERS | [Genus specie | strain | gene_name
TATTAATTTTAGTAAAACATAGCGTCGGAGACGGGCCGACACCAACGATCCGTTACCCCACATGCACCGGAGTAAGCGAC

I'll let you exert some effort to keep the last delimiter if you want it ...
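
For anyone who does want the final delimiter, one possible way is sketched here, assuming (as in the question) that fields are separated by " | ". The cut step leaves a trailing space after the fourth field, so a sed step restricted to header lines can turn that trailing whitespace back into " |". The function name trim_keep_pipe is made up for illustration.

```shell
# Hypothetical helper: cut keeps fields 1-4 of every line (sequence lines
# contain no "|" and pass through whole), then sed replaces the trailing
# space(s) on header lines with " |" to restore the last delimiter.
trim_keep_pipe() {
  cut -d '|' -f 1-4 "$@" | sed '/^>/ s/ *$/ |/'
}

# Example on the header from the question.
printf '%s\n' \
  '>ID:CHARACTERS | [Genus specie] | strain | gene_name | length | NCBI_ID | other | other | other|' \
  'ATGC' | trim_keep_pipe
# >ID:CHARACTERS | [Genus specie] | strain | gene_name |
# ATGC
```

On a file this would be: trim_keep_pipe myseqs.fa > trimmed.fa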


I'll add that you really should use the search function as this type of question is Biostar's 'public enemy number one'.

You're lucky it's easy and I'm sat in front of a terminal though ;)

written 21 months ago by Joe (16k)
21 months ago by cpad0112 (13k), India
cpad0112 wrote:
$ cat test.fa 
>ID:CHARACTERS | [Genus specie | strain | gene_name | length | NCBI_ID | other | other | other|
atgc

$ sed '/^>/ s/|\s\w\+\s*//3g' test.fa 
>ID:CHARACTERS | [Genus specie | strain | gene_name |
atgc
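
A note on the sed expression above: \s, \w, \+ and the combined 3g flag are GNU sed extensions, and the pattern only matches fields made of word characters, so values such as "K-12" or "1234 bp" would not be removed as intended. A field-based alternative is sketched here with awk, assuming the separator is always " | " and every header has at least four fields. The function name trim_headers_awk and the sample values are made up for illustration.

```shell
# Hypothetical helper: split headers on " | " and print the first four fields.
# FS is written as " [|] " because a multi-character FS is a regex, and a bare
# " | " would mean "space OR space" rather than a literal pipe.
trim_headers_awk() {
  awk 'BEGIN { FS = " [|] "; OFS = " | " }
       /^>/  { print $1, $2, $3, $4 " |"; next }  # header: fields 1-4 plus final pipe
             { print }                            # sequence line: unchanged
      ' "$@"
}

# Example with invented values that contain non-word characters.
printf '%s\n' '>id2 | [Genus specie] | K-12 | geneX | 1234 bp | NC_000 | other|' 'atgc' \
  | trim_headers_awk
# >id2 | [Genus specie] | K-12 | geneX |
# atgc
```

On a file this would be: trim_headers_awk test.fa > trimmed.fa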