How to trim a FASTA header and keep only the desired part on Ubuntu?
11 months ago
Jelo • 0

Hi all,

I have a FASTA file with headers like this:

>10005_M12.fastq    Otu0001|242290|M1.fastq-M12.fastq-M5.fastq-URTM6.fastq-M7.fastq-M9.fastq

I want to remove everything from the header except the OTU identifier (with its number). I used the command "sed 's/>M.Otu/>Otu/g' rep.fasta |sed -e 's/|.//g'> rep.otu.fasta", but it removed only the part after the OTU, as follows:

>10005_M12.fastq    Otu0001

I want the header to look like this: >Otu0001

Any advice would be appreciated. Thank you.

microbiome bioinformatics fasta NGS • 1.1k views

Thank you all for the help.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.

11 months ago
 sed '/^>/s/.*[ \t]*\(Otu[0-9]*\).*/>\1/' in.fa
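
For anyone who wants to try the idea on a small input before touching real data, here is a minimal sketch. The function name otu_headers and the two sample records are made up for illustration; it uses sed -E (extended regex) as an equivalent spelling of the same substitution and assumes every header contains exactly one "Otu" followed by digits. Headers without such a token are left as they are.

```shell
# Hypothetical helper: rewrite FASTA headers to just ">OtuNNNN".
# Assumes each header contains exactly one "Otu" followed by digits.
otu_headers() {
  # Only lines starting with ">" are touched; sequence lines pass through.
  sed -E '/^>/s/.*(Otu[0-9]+).*/>\1/' "$@"
}

# Made-up two-record example in the header format from the question.
printf '%s\n' \
  '>10005_M12.fastq    Otu0001|242290|M1.fastq-M12.fastq' \
  'ACGTACGT' \
  '>20017_M5.fastq    Otu0002|1187|M5.fastq' \
  'TTGACCGA' | otu_headers
```

On a real file this would be called as: otu_headers rep.fasta > rep.otu.fasta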
11 months ago

A seqkit answer, also for posterity (note that this only strips everything from the first | onward):

seqkit replace -p "\|.*" in.fa

seqkit replace -p "^.+\s|\|.*" foo.fasta

or

seqkit replace -p ".+\s(\w+)\|.+" -r "\$1" foo.fasta

or just

seqkit seq -i --id-regexp "\s(\w+)\|" foo.fasta
11 months ago

If the sequence lines contain no | characters, try this:

$ awk -F "|" '{print $1}' test.fa 

If you are not sure, you can use this, which only modifies header lines:

$ awk -F "|" '/^>/ {print $1}; !/^>/' test.fa

Or this:

$ awk -F "|" '{print ($0 ~ /^>/)?$1:$0}' test.fa
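
Splitting on | keeps everything before the first |, so the leading read name (10005_M12.fastq in the question) stays in the header. A possible awk sketch that keeps only the OTU token is below; the function name otu_headers_awk is hypothetical, and it assumes the token always has the form "Otu" followed by digits. Headers without such a token and all sequence lines are printed unchanged.

```shell
# Hypothetical helper: on header lines, print ">" plus the first
# "Otu<digits>" token; everything else passes through unchanged.
otu_headers_awk() {
  awk '
    # match() sets RSTART and RLENGTH to the position and length of the hit.
    /^>/ && match($0, /Otu[0-9]+/) {
      print ">" substr($0, RSTART, RLENGTH)
      next
    }
    { print }
  ' "$@"
}

# Made-up example record in the header format from the question.
printf '%s\n' \
  '>10005_M12.fastq    Otu0001|242290|M1.fastq-M12.fastq' \
  'ACGTACGT' | otu_headers_awk
```

On a real file this would be called as: otu_headers_awk rep.fasta > rep.otu.fasta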