Question: How can I remove the ">" sign from a whole genome sequence?
Ahmad_Bio wrote (7 months ago):

I have a Word file of a whole genome sequence, around 1709 pages long, in which each gene is preceded by a line starting with ">". I need to BLAST the whole genome sequence against a protein sequence from another organism to look for homology. Is there any way to remove information lines such as ">gm_orf648 67_127_d_D 579383 580123 + 741_nt 246_aa" all at once, instead of deleting them manually one by one?


Do not do not do not do not do not keep your sequences in Office formats. Ever.

I’m actually amazed it’s even opened that many pages without crashing.

written 7 months ago by jrj.healey

You don't have to delete these.

written 7 months ago by WouterDeCoster

Thank you for the helpful answers. I managed to copy it into Oligo 7, so there is no need to remove the ">" lines.

written 7 months ago by Ahmad_Bio

h.mon (Brazil) wrote (7 months ago):

The correct answer to your problem is to create a BLAST database from your file and BLAST the protein against this database; BLAST correctly parses the lines starting with ">".
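As a rough sketch of that workflow with the NCBI BLAST+ command-line tools (not part of the original answer): the file names and the database name below are placeholders, and the toy sequences are made up so the example is self-contained. It assumes the gene file holds nucleotide sequences, hence `-dbtype nucl` and `tblastn`; if the file holds protein sequences instead, `-dbtype prot` and `blastp` would be the counterparts.

```shell
# Toy inputs with made-up sequences, only so the sketch is self-contained.
# In practice, point GENES and QUERY at your own files (names are placeholders).
GENES=example_genes.fasta
QUERY=example_protein.fasta

cat > "$GENES" <<'EOF'
>toy_gene1 made-up nucleotide record
ATGAAAACCGCTTATATTGCTAAACAGCGTTAA
>toy_gene2 made-up nucleotide record
ATGGGTCTGAGCGATGGTGAATGGCAGCTGTAA
EOF

cat > "$QUERY" <<'EOF'
>toy_query made-up protein record
MKTAYIAKQR
EOF

if command -v makeblastdb >/dev/null 2>&1 && command -v tblastn >/dev/null 2>&1; then
    # Build a nucleotide database; the ">" header lines supply the subject IDs.
    makeblastdb -in "$GENES" -dbtype nucl -out example_genes_db
    # Protein query against the translated nucleotide database, tabular output (format 6).
    tblastn -query "$QUERY" -db example_genes_db -outfmt 6 -out example_hits.tsv
else
    echo "BLAST+ (makeblastdb, tblastn) not found on PATH; skipping search" >&2
fi
```

In the tabular output, the second column is the subject ID, which is taken from the ">" line of the matching gene. That is how a hit can be traced back to a specific gene, and it only works if the header lines are left in place.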

Your file is in FASTA format (also see "Is There A Precise Specification For Fasta Files?"). The line you want to remove is part of the format specification:

"The first line in a FASTA file started either with a ">" (greater-than) symbol [...]"

A FASTA file is just a plain text file. I guess Word is configured to open text files on your computer, but I doubt it is really a Word document.

Virtually all bioinformatics software can correctly parse FASTA format, so there is no need to remove these lines.
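To illustrate how tools rely on the header lines, here is a small sketch using only `grep` and `awk`. The file `toy.fasta` and its records are made up for the example. It counts the records and reports each sequence's length, using the ">" lines as record boundaries.

```shell
# Toy FASTA file with made-up records (hypothetical data).
cat > toy.fasta <<'EOF'
>gene1 first toy record
ATGGCT
AAATAA
>gene2 second toy record
ATGCCCGGGTAA
EOF

# Count records: there is one ">" header line per sequence.
grep -c '^>' toy.fasta > toy_count.txt

# Print "<id><TAB><length>" per record; a header line closes the previous
# record and opens a new one, and sequence lines may span several lines.
awk '
    /^>/ { if (id != "") print id "\t" len; id = substr($1, 2); len = 0; next }
         { len += length($0) }
    END  { if (id != "") print id "\t" len }
' toy.fasta > toy_lengths.tsv

cat toy_count.txt toy_lengths.tsv
```

On the toy file this prints 2, followed by gene1 and gene2 with a length of 12 each. Without the ">" lines there would be no way to tell where one gene ends and the next begins.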

For the sake of completeness (even if I shouldn't), here is the answer to your original question:

sed -i.bak '/^>/d' file
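For illustration, a sketch of what that command does on a toy file (`toy_seq.fasta`, made-up data): `/^>/d` deletes every line that begins with ">", and `-i.bak` edits the file in place while keeping the original as a `.bak` copy. The `grep -v` line is an alternative added here, not part of the original answer, for cases where you would rather write the result to a new file.

```shell
# Toy FASTA file with made-up records (hypothetical data).
cat > toy_seq.fasta <<'EOF'
>gene1 toy record
ATGGCT
>gene2 toy record
ATGCCC
EOF

# Alternative: leave the input untouched and write the non-header lines elsewhere.
grep -v '^>' toy_seq.fasta > toy_seq.noheader.txt

# In-place edit: drop header lines, keep the original as toy_seq.fasta.bak.
sed -i.bak '/^>/d' toy_seq.fasta

cat toy_seq.fasta
```

After this, `toy_seq.fasta` contains only the two sequence lines and `toy_seq.fasta.bak` still has the headers. Note that the result is no longer FASTA: the genes run together with nothing marking their boundaries.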