Remove part of headers in FASTA file
14 months ago
fatemeh • 0

I want to delete the part of each header that starts at "CDS" and ends at "p". The characters between CDS and p vary.
My headers look like this:

>Eindica.01T000001.1 CDS=1-201_p
ATGACTAGGGAACGTGGACGACCAGCAAGGGCTTGGAAGCCAAGGCAGGGATTAG
>Eindica.01T000002.1 CDS=1-3027_p
ATGACCTTCTATGGATACACACAACACTCCTTAA
>Eindica.01T000005.1 CDS=577-4218_p
CCAAGGCCAAAAAATACATTTCTAAGGCCAAAGCTTGGCTTGAATGAATCTTGA

and I want the headers to look like this:

>Eindica.01T000001.1
ATGACTAGGGAACGTGGACGACCAGCAAGGGCTTGGAAGCCAAGGCAGGGATTAG
>Eindica.01T000002.1
ATGACCTTCTATGGATACACACAACACTCCTTAA
>Eindica.01T000005.1
CCAAGGCCAAAAAATACATTTCTAAGGCCAAAGCTTGGCTTGAATGAATCTTGA

Can someone help me with a solution? Would be great. Thank you.


Use cut -d ' ' -f 1 to keep only the text before the first space on each line.
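
To spell the cut suggestion out: cut needs -f to say which field to keep. This is a minimal sketch, assuming the ID and the CDS part are separated by a single space and the sequence lines contain no spaces. The function name strip_desc_cut and the file names are placeholders, not anything from the original post.

```shell
# Hypothetical helper: keep only the first space-delimited field of every line.
# Lines without a space (the sequence lines) are printed unchanged by cut.
strip_desc_cut() {
    cut -d ' ' -f 1 "$@"
}

# Usage (file names are placeholders):
# strip_desc_cut test.fasta > cleaned.fasta
```

Note that this trims every line at its first space, not only header lines, so it is only safe when the sequence lines are free of spaces.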

14 months ago
Mark ★ 1.5k

Use seqkit

seqkit replace -p " .+$" -r "" test.fasta

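Here -p is the search pattern (a regular expression) and -r is the replacement string. The replacement is empty in this case, so everything from the first space to the end of each header is removed. The result is written to standard output.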

https://bioinf.shenwei.me/seqkit/
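
If seqkit is not available, or if only the CDS=..._p part should go while any other header text stays, plain sed may be enough. This is a sketch under the assumption that the CDS field sits at the end of the header and always ends in "_p", as in the example above. The function name strip_cds_tag is made up for illustration.

```shell
# Hypothetical helper: on header lines only (those starting with '>'),
# remove a trailing " CDS=<anything without spaces>_p" field.
# Sequence lines and headers without that field are left untouched.
strip_cds_tag() {
    sed -E '/^>/ s/[[:space:]]+CDS=[^[:space:]]*_p[[:space:]]*$//' "$@"
}

# Usage (file names are placeholders):
# strip_cds_tag test.fasta > cleaned.fasta
```

Restricting the substitution to lines matching /^>/ means the sequence data is never modified, even if it happened to contain spaces.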
