Question: change headers from fasta files
erickfqqa wrote, 10 months ago:

Hello, I'm trying to trim the following FASTA headers:

  >ID:CHARACTERS | [Genus specie] | strain | gene_name | length | NCBI_ID | other | other | other|
>ID:CHARACTERS | [Genus specie | strain | gene_name | length | NCBI_ID | other | other | other|

I'd like to trim them as in the following example:

>ID:CHARACTERS | [Genus specie] | strain | gene_name |

The ID:CHARACTERS, [Genus specie], strain and gene_name fields vary across the headers, but they are always separated by "space|space".


Your FASTA headers are identical; this will create problems for most downstream tools. You can do what you want with sed:

sed "s/ length | NCBI_ID | other | other | other|//" file.fasta
written 10 months ago by h.mon
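
The sed command above matches the literal placeholder text from the question, so it would need adapting for real headers where the length, ID and other fields differ from record to record. A minimal sketch of a more general form, assuming every header has at least four "|"-terminated fields and that `sed -E` is available (the file names and sample records here are made up for illustration):

```shell
# Demo input (hypothetical file name and contents, for illustration only)
printf '%s\n' \
  '>id1 | [Genus specie] | strainA | geneX | 80 | NCBI_1 | other | other | other|' \
  'ATGC' \
  '>id2 | [Genus specie] | strainB | geneY | 95 | NCBI_2 | other | other | other|' \
  'GGCC' > demo.fasta

# On header lines only, keep the first four pipe-terminated fields and drop the rest
sed -E '/^>/ s/^(([^|]*\|){4}).*/\1/' demo.fasta > demo_trimmed.fasta
cat demo_trimmed.fasta

# List any headers that became identical after trimming (no output means none)
grep '^>' demo_trimmed.fasta | sort | uniq -d
```

Headers with fewer than four pipes are left unchanged by this substitution, and sequence lines are never touched because of the `/^>/` address.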

I'm trying to trim the next fasta headers

And can you elaborate on what you tried and how that didn't deliver the result you had in mind?

written 10 months ago by WouterDeCoster
jrj.healey wrote, 10 months ago:

This works if you aren't bothered about keeping the last pipe symbol...

cut -d '|' -f -4 myseqs.fa
>ID:CHARACTERS | [Genus specie] | strain | gene_name
TCCACGATCGAATAAATGTGCGATTAGCACCTGTAGAACCATACAAGCTAAGCCGCCTCACGGGCATGTTAACGAGATTA
>ID:CHARACTERS | [Genus specie | strain | gene_name
TATTAATTTTAGTAAAACATAGCGTCGGAGACGGGCCGACACCAACGATCCGTTACCCCACATGCACCGGAGTAAGCGAC

I'll let you exert some effort to keep the last delimiter if you want it ...
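
For readers who do want the trailing delimiter, one possible way is to pipe the cut output through sed and re-append " |" to header lines only. This is a sketch rather than the answerer's own solution, and the demo file names are hypothetical:

```shell
# Demo input (hypothetical file name and contents, for illustration only)
printf '%s\n' \
  '>ID:CHARACTERS | [Genus specie] | strain | gene_name | length | NCBI_ID | other | other | other|' \
  'TCCACGATCG' > demo_seqs.fa

# cut keeps fields 1-4 (leaving a trailing space after gene_name);
# sed then swaps any trailing spaces on header lines for " |"
cut -d '|' -f -4 demo_seqs.fa | sed '/^>/ s/ *$/ |/' > demo_seqs_trimmed.fa
cat demo_seqs_trimmed.fa
```

Sequence lines contain no "|", so cut passes them through unchanged, and the `/^>/` address keeps sed away from them.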


I'll add that you really should use the search function as this type of question is Biostar's 'public enemy number one'.

You're lucky it's easy and I'm sat in front of a terminal though ;)

written 10 months ago by jrj.healey
cpad0112 wrote, 10 months ago:
$ cat test.fa 
>ID:CHARACTERS | [Genus specie | strain | gene_name | length | NCBI_ID | other | other | other|
atgc

$ sed '/^>/ s/|\s\w\+\s*//3g' test.fa 
>ID:CHARACTERS | [Genus specie | strain | gene_name |
atgc
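
Note that `\s`, `\w` and `\+` are GNU sed extensions, and combining a number with `g` in the `s` flags is GNU behaviour as well; the pattern also appears to rely on each dropped field being a single run of word characters. Where that cannot be assumed, splitting on the " | " separator with awk may be more robust. A sketch, with hypothetical demo file names:

```shell
# Demo input (hypothetical file name; same header as in the answer above)
printf '%s\n' \
  '>ID:CHARACTERS | [Genus specie | strain | gene_name | length | NCBI_ID | other | other | other|' \
  'atgc' > demo_test.fa

# Split header lines on " | "; if there are more than four fields,
# print the first four and re-append the trailing " |"
awk -F' [|] ' '/^>/ && NF > 4 { print $1 " | " $2 " | " $3 " | " $4 " |"; next } { print }' \
  demo_test.fa > demo_test_trimmed.fa
cat demo_test_trimmed.fa
```

Because the fields are split on the separator itself, their contents (spaces, brackets, hyphens) do not affect which part of the header is kept.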