How to remove the header line with ">" in a fasta file
7.9 years ago
bright602 ▴ 50

Hi,

I am a beginner in bioinformatics. I have a FASTA file like the one below (please ignore the "|"):

> Chr3:183153228-183153246
TGGAAAGGACGAAACACCGCG
> Chr3:183286843-183286861
CTAGAAATAGCAAGTTAAA

How do I remove the header lines so that I can extract only the sequences, like this?

TGGAAAGGACGAAACACCGCG
CTAGAAATAGCAAGTTAAA

Thank you for your help.

sequencing genome • 13k views
7.9 years ago
Asaf 10k

On Linux: grep -v ">" filename


Just as an alternative

sed '/^>/d' foo.fa > out.fa
7.9 years ago
ablanchetcohen ★ 1.2k

If you just want to remove all lines starting with ">", you can use grep, among other options. Anchoring the pattern with "^" restricts the match to lines that begin with ">":

grep -v "^>" file.fasta > file_without_header.txt
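
As a small worked example of the commands suggested in this thread, the sketch below rebuilds the two records from the question in a file and strips the header lines. The file names example.fa and sequences.txt are made up for illustration, and the sed line is only there to show that both approaches give the same result on this input.

```shell
# Recreate the two records from the question (example.fa is a hypothetical name).
printf '%s\n' \
  '> Chr3:183153228-183153246' \
  'TGGAAAGGACGAAACACCGCG' \
  '> Chr3:183286843-183286861' \
  'CTAGAAATAGCAAGTTAAA' > example.fa

# Keep every line that does NOT start with ">".
grep -v '^>' example.fa > sequences.txt

# Same idea with sed: delete the lines that start with ">".
sed '/^>/d' example.fa > sequences_sed.txt

cat sequences.txt
```

On this input, sequences.txt should contain only the two sequence lines, in their original order.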
7.9 years ago

I am wondering why you want to do that. If you remove those lines, your sequences will be mixed together with nothing to distinguish one from another, unless each sequence occupies a single line. In addition, most programs can handle the header line and use it to keep track of each sequence's identity.
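
To make the concern about multi-line records concrete, here is a minimal sketch. The file names wrapped.fa and joined.txt and the record names seq1 and seq2 are invented for illustration. It uses awk to join the wrapped lines of each record before the headers are dropped, so that every output line is still one complete sequence.

```shell
# Hypothetical input in which the first record is wrapped over two lines.
printf '%s\n' \
  '>seq1' \
  'TGGAAAGGAC' \
  'GAAACACCGCG' \
  '>seq2' \
  'CTAGAAATAGCAAGTTAAA' > wrapped.fa

# On a header line: print the sequence collected so far (if any) and reset it.
# On any other line: append the line to the current sequence.
# At end of input: print the last collected sequence.
awk '/^>/ { if (seq != "") print seq; seq = ""; next }
     { seq = seq $0 }
     END { if (seq != "") print seq }' wrapped.fa > joined.txt

cat joined.txt
```

Running a plain grep -v '^>' on the same file would instead give three lines, with the first sequence split in two.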


This was recently discussed in a similar thread on Biostars. I posted my reason there and will reproduce it here.

The reason I sometimes do this is to cluster (sort | uniq) and/or count the number of unique sequences.


It makes sense that way.


If you want to do that, you can pipe everything through

grep -v '>' file.fasta | sort | uniq | wc -l

to get the number of unique sequences, or

grep -v '>' file.fasta | sort | uniq -c

to get the number of times each sequence appears. However, clustering is different from counting. The examples here give you counts; if you want to cluster (and you will want to change the headers), then you will need a script to do so. I recommend Biopython for parsing your sequences, for ease of use. Again, this all assumes the OP is on Linux and has no coding experience. Alternatively, you can use fastx_collapser from the FASTX-Toolkit.
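
The two counting pipelines can be tried on a tiny made-up file. This is only a sketch: dup.fa, counts.txt, and the record names are hypothetical, and it assumes each sequence sits on a single line. The extra tr call strips the padding that some wc implementations add, and the final sort is an optional addition that puts the most frequent sequence first.

```shell
# Hypothetical input: three records, two of which share the same sequence.
printf '%s\n' \
  '>a' 'ACGT' \
  '>b' 'TTGA' \
  '>c' 'ACGT' > dup.fa

# Number of distinct sequences (tr removes any padding spaces from wc).
n_unique=$(grep -v '^>' dup.fa | sort | uniq | wc -l | tr -d ' ')

# Occurrences of each distinct sequence, most frequent first.
grep -v '^>' dup.fa | sort | uniq -c | sort -k1,1nr > counts.txt

echo "$n_unique"
cat counts.txt
```

Here there are two distinct sequences, and ACGT is counted twice.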


Can you clarify what clustering means in this thread?


I agree, what's your reasoning for doing this?


That was in fact my point: why would one want to do that? What is the motivation behind it?

