Extract header from fasta file
7 weeks ago
Princy ▴ 40

Hello, how can I extract the ID from an ORFfinder FASTA result file?

>lcl|ORF2_TRINITY_DN74698_c0_g1_i1:302:0 unnamed protein product, partial
MRIRSVVFTLRPRAKWMAPSSGMRLLLMFRSSSVLLCLSASSRATAPSLPRPLYDKSRVRKKIFPSRPLAAMAPFPRMQFQARLSDFIPAFSAIAAPRIS

I need to extract the ID so that the header looks like this. Kindly let me know.

>TRINITY_DN74698_c0_g1_i1
header fasta • 248 views
7 weeks ago

A seqkit answer. You may need to tweak the regex depending on the variability in the naming scheme.

seqkit replace -p ".*ORF\d+_([^:]+).*" -r "\$1" test.fasta

>TRINITY_DN74698_c0_g1_i1
MRIRSVVFTLRPRAKWMAPSSGMRLLLMFRSSSVLLCLSASSRATAPSLPRPLYDKSRVR
KKIFPSRPLAAMAPFPRMQFQARLSDFIPAFSAIAAPRIS
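If the regex needs tweaking, it can help to preview what it captures before rewriting the whole file. The following is only a sketch using grep and sed rather than seqkit: `check.fa` and `check.txt` are hypothetical scratch file names, and `[0-9]` replaces `\d` because `sed -E` does not support `\d`.

```shell
# Hypothetical scratch file holding the record from the question.
cat > check.fa <<'EOF'
>lcl|ORF2_TRINITY_DN74698_c0_g1_i1:302:0 unnamed protein product, partial
MRIRSVVFTLRPRAKWMAPSSGMRLLLMFRSSSVLLCLSASSRATAPSLPRPLYDKSRVRKKIFPSRPLAAMAPFPRMQFQARLSDFIPAFSAIAAPRIS
EOF

# For every header line, print "captured ID <- original header".
# Group 2 is the same capture as ([^:]+) in the seqkit pattern.
grep '^>' check.fa \
  | sed -E 's/^>(.*ORF[0-9]+_([^:]+).*)$/\2 <- \1/' > check.txt
cat check.txt
```

Any header that prints without the `<-` arrow did not match the pattern and would be left unchanged.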
7 weeks ago
$ awk -F "_|:" -v OFS="_" '/^>/{print ">"$2,$3,$4,$5,$6};!/>/' test.fa

>TRINITY_DN74698_c0_g1_i1
MRIRSVVFTLRPRAKWMAPSSGMRLLLMFRSSSVLLCLSASSRATAPSLPRPLYDKSRVRKKIFPSRPLAAMAPFPRMQFQARLSDFIPAFSAIAAPRIS


$ sed -E '/^>/ { s/.*ORF[0-9]+_/>/; s/:.*//; }' test.fa

>TRINITY_DN74698_c0_g1_i1
MRIRSVVFTLRPRAKWMAPSSGMRLLLMFRSSSVLLCLSASSRATAPSLPRPLYDKSRVRKKIFPSRPLAAMAPFPRMQFQARLSDFIPAFSAIAAP
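The awk one-liner above relies on the ID having exactly five underscore-separated parts. If that is not guaranteed, a variant that strips by pattern instead of by field position might be safer. This is a sketch under assumptions: `demo.fa` and `demo.renamed.fa` are hypothetical file names, and the second record is made up purely to show a different ORF number.

```shell
# Hypothetical test file: the record from the question plus one invented record.
cat > demo.fa <<'EOF'
>lcl|ORF2_TRINITY_DN74698_c0_g1_i1:302:0 unnamed protein product, partial
MRIRSVVFTLRPRAKWMAPSSGMRLLLMFRSSSVLLCLSASSRATAPSLPRPLYDKSRVRKKIFPSRPLAAMAPFPRMQFQARLSDFIPAFSAIAAPRIS
>lcl|ORF11_TRINITY_DN5_c0_g1_i2:10:250 unnamed protein product
MKKLLA
EOF

# On header lines only: drop everything up to "ORF<number>_",
# then drop everything from the first ":" onward.
# Sequence lines are printed unchanged.
awk '/^>/ { sub(/^>.*ORF[0-9]+_/, ">"); sub(/:.*/, "") } { print }' demo.fa > demo.renamed.fa
cat demo.renamed.fa
```

Because the substitutions run only on lines starting with `>`, the sequence lines keep their original length and wrapping.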