Question: How can I remove the ">" sign from a whole genome sequence?
4 weeks ago, Ahmad_Bio wrote:

I have a Word file of a whole genome sequence, around 1709 pages long, in which each gene is separated by ">". I need to BLAST the whole genome sequence against a protein sequence from another organism to look for homology. Is there any way to remove information lines such as ">gm_orf648 67_127_d_D 579383 580123 + 741_nt 246_aa" all at once, instead of deleting them manually one by one?


Do not do not do not do not do not keep your sequences in Office formats. Ever.

I’m actually amazed it’s even opened that many pages without crashing.

written 4 weeks ago by jrj.healey

You don't have to delete these.

written 4 weeks ago by WouterDeCoster

Thank you for the helpful replies. I managed to copy the file into Oligo 7, so there is no need to remove the > lines.

written 4 weeks ago by Ahmad_Bio
4 weeks ago, h.mon (Brazil) wrote:

The correct answer to your problem is to create a BLAST database from your file and BLAST the protein against that database. BLAST correctly parses the lines starting with >.
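A minimal sketch of that workflow, assuming NCBI BLAST+ is installed and on your PATH. The file names (genes.fasta, query_protein.fasta, genes_db, hits.tsv) and the E-value cutoff are placeholders, not anything from the question. The db_type setting is also an assumption: the header mentions both "741_nt" and "246_aa", so check whether the file holds nucleotides or amino acids before running it.

```shell
#!/usr/bin/env bash
# Sketch: search a protein query against a FASTA file of genes with BLAST+.
# All file names below are placeholders.

genes="genes.fasta"            # the multi-FASTA file, ">" headers left intact
query="query_protein.fasta"    # the protein from the other organism
db_type="nucl"                 # assumption; use "prot" if genes.fasta holds amino acids

# A protein query needs tblastn against nucleotides and blastp against proteins.
blast_program_for() {
    case "$1" in
        nucl) echo "tblastn" ;;
        prot) echo "blastp" ;;
        *)    echo "unknown db type: $1" >&2; return 1 ;;
    esac
}

program="$(blast_program_for "$db_type")"

if command -v makeblastdb >/dev/null 2>&1 && [ -f "$genes" ] && [ -f "$query" ]; then
    # Build the database; the ">" lines become the sequence identifiers.
    makeblastdb -in "$genes" -dbtype "$db_type" -out genes_db
    # Tabular output (-outfmt 6): one line per hit, naming the matching gene.
    "$program" -query "$query" -db genes_db -outfmt 6 -evalue 1e-5 -out hits.tsv
else
    echo "BLAST+ or input files not found; nothing was run." >&2
fi
```

Because the headers stay in place, each hit in hits.tsv is reported under its gene identifier (for example gm_orf648), which is exactly the information that deleting the > lines would throw away.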

Your file is in FASTA format (see also "Is There A Precise Specification For Fasta Files?"). The line you want to remove is part of the format specification:

The first line in a FASTA file started either with a ">" (greater-than) symbol

A FASTA file is just a text file. I guess Word is configured to open text files on your computer, but I doubt it is really a Word document.

Virtually all bioinformatics software can correctly parse fasta format, and there is no need to remove these lines.

For the sake of completeness (even if I shouldn't), here is the answer to your original question:

sed -i.bak '/^>/d' file
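
To see what that command does before running it on the real file, here is a small sketch on a made-up two-record file. The name example.fasta and the sequences are invented for illustration.

```shell
# Build a two-record example file (made-up sequences, for illustration only).
printf '%s\n' \
    '>gene_1 example header' \
    'ATGGCTAGC' \
    '>gene_2 example header' \
    'ATGTTTGGA' > example.fasta

# Delete every line that starts with ">".
# -i.bak edits in place and keeps the untouched original as example.fasta.bak.
sed -i.bak '/^>/d' example.fasta

# Only the sequence lines remain.
cat example.fasta
```

Note that once the headers are gone, nothing marks where one gene ends and the next begins, so keep the .bak copy.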