Need to remove part of the header in a FASTA file
2.2 years ago
Princy ▴ 60

Hello everyone, I need to remove part of each header in a FASTA file. How can I do this? Please let me know.

>QBIY01000001.1 L r b J isolate D scaffold_1, whole genome shotgun sequence
TCGCTTCCAGTTCCGGGTCTCTCTGTTCACTTCCCccttggcggccatttcagcgtgcctcgccggctcgctcgtcgcgaagttttgtcggctatgtccccaactctgagcgttttccTATCGGACTGCTttactgttgccaaccggactgtcttTATCG

I need my header to look like this:

>QBIY01000001.1 
TCGCTTCCAGTTCCGGGTCTCTCTGTTCACTTCCCccttggcggccatttcagcgtgcctcgccggctcgctcgtcgcgaagttttgtcggctatgtccccaactctgagcgttttccTATCGGACTGCTttactgttgccaaccggactgtcttTATCG
Fastq Linux Ngs • 1.0k views
2.2 years ago
$ awk '{print $1}' input.fa
$ sed -r '/^>/ s/\s.*//' test.fa
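To see what the two commands above do, here is a minimal sketch on a throwaway file. The file name demo.fa and its contents are made up for illustration, and the sed line uses the POSIX class [[:space:]] in place of the GNU-only \s so that it should also run on non-GNU sed.

```shell
# demo.fa is a hypothetical two-record file created only for this illustration
printf '>seq1 some description, more text\nACGTacgt\n>seq2 another description\nTTGGccaa\n' > demo.fa

# Keep only the first whitespace-delimited field of every line.
# Sequence lines have a single field, so they pass through unchanged.
awk '{print $1}' demo.fa > demo.awk.fa

# On header lines only, delete everything from the first whitespace onward.
sed '/^>/ s/[[:space:]].*//' demo.fa > demo.sed.fa

cat demo.sed.fa
```

Both output files should contain the same four lines, with each header cut down to its identifier.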
2.2 years ago
seidel 11k

Since you want just the first field of every line, you could use cut with a space as the delimiter (cut splits on tabs by default):

cut -d ' ' -f1 input.fasta > output.fasta

Hello, thanks for the reply. I have edited my question. It's not every line; I only want to change the headers of the FASTA file. Please let me know how I can do it.


You can use the solution exactly the way I suggested it. The header has multiple fields and you want just the first one. All the other lines are themselves single fields, so they will be unaffected (that is, they will be included in the output). This solution will only fail if you have sequence lines separated by white space, which would be very unusual. It's basically the same solution @cpad0112 went on to suggest (using awk instead).
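The one failure case mentioned here, whitespace inside sequence lines, can be checked before trimming. This is only a sketch: check.fa is a hypothetical file, written with one deliberately broken sequence line so that the count is non-zero.

```shell
# check.fa is a made-up example; its first sequence line contains a space
printf '>seq1 desc\nACGT ACGT\n>seq2 desc\nTTGG\n' > check.fa

# Count non-header lines that contain a space or tab.
# "|| true" keeps the script going when grep finds nothing (exit status 1).
bad=$(grep -v '^>' check.fa | grep -c '[[:blank:]]' || true)

echo "sequence lines containing whitespace: $bad"
```

A count of 0 on your own file would suggest that taking the first field of every line is safe.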

2.2 years ago
lethalfang ▴ 140

Try this:

awk '{if ($0 ~ /^>/) $0=$1} 1' input.fasta > output.fasta

If a line starts with >, it is replaced by its first field. The trailing 1 is an always-true pattern, so every line is printed.
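If you want some reassurance that only the headers were touched, one possible sanity check is to compare the record count and the sequence lines before and after. The file names in.fa, out.fa, in.seq and out.seq below are placeholders invented for this sketch.

```shell
# in.fa is a hypothetical input created only for this illustration
printf '>seq1 some description\nACGT\n>seq2 other text\nTTGG\n' > in.fa

# Trim headers to their first field; print every line
awk '{if ($0 ~ /^>/) $0=$1} 1' in.fa > out.fa

# The number of records should be the same before and after
before=$(grep -c '^>' in.fa)
after=$(grep -c '^>' out.fa)
echo "records: $before -> $after"

# Sequence lines should be byte-for-byte identical
grep -v '^>' in.fa > in.seq
grep -v '^>' out.fa > out.seq
cmp in.seq out.seq && echo "sequence lines unchanged"
```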
