Question: Curating a Viridiplantae database
4.8 years ago by
Biogeek400 wrote:

Dear all,

I have downloaded the latest nr.gz file from NCBI and unzipped it. Now I want to extract only the Viridiplantae sequences from this nr FASTA file. I tried downloading all the GI numbers for the plant protein sequences and running grep as follows:

grep -wFf GIsequences-list NR > viridiplantae.fasta

However, I don't get any protein sequences in the output file, just GI numbers and annotations.

Is there a script that can do this better, or a command I can use to get the Viridiplantae nr database I want? I am using RAPSearch for speed rather than BLASTX, so I can't use the blastx command to restrict the search to taxon-specific annotations.


blast sequences annotation • 1.6k views
4.8 years ago by
untitpoi10 wrote:

Hi, I think grep's -A option could help you. It prints not only each line that matches your pattern but also a given number of lines after it. Though it is not the best solution if your FASTA file is not single-line (one sequence line per record), and multi-line FASTA is often the case.
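For multi-line FASTA, one alternative is to switch a "keep" flag on or off at each header line and print every line while the flag is on. Below is a minimal Python sketch of that idea, not a tested pipeline. It assumes the nr headers carry identifiers in the `gi|<number>|` form and that GIsequences-list holds one bare GI number per line; the names `GI_PATTERN`, `load_ids` and `filter_fasta` are made up for this illustration.

```python
import re
import sys

# Assumption: headers look like ">gi|12345|ref|XP_000001.1| description".
# Merged nr headers (several entries on one line) are covered because
# findall() collects every GI on the header line.
GI_PATTERN = re.compile(r"gi\|(\d+)")


def load_ids(path):
    """Read one GI number per line into a set for fast lookups."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def filter_fasta(lines, wanted):
    """Yield the header and all sequence lines of records whose GI is wanted."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # Decide once per record, at the header line.
            keep = any(gi in wanted for gi in GI_PATTERN.findall(line))
        if keep:
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    wanted_ids = load_ids(sys.argv[1])
    with open(sys.argv[2]) as fasta:
        sys.stdout.writelines(filter_fasta(fasta, wanted_ids))
```

If this were saved as, say, filter_nr.py (a hypothetical name), it could be run as `python filter_nr.py GIsequences-list NR > viridiplantae.fasta`. The file is streamed line by line, so only the GI set has to fit in memory, and the number of sequence lines per record does not matter.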
