Question: Curating a Viridiplantae database
4.2 years ago, Biogeek wrote:

Dear all,

I have downloaded the latest nr.gz file from NCBI and unzipped it. I now want to extract only the Viridiplantae sequences from this nr FASTA file. I tried downloading all the GI numbers for the plant protein sequences and running grep as follows:

grep -wFf GIsequences-list NR > viridiplantae.fasta

However, the output file contains no protein sequences, only the header lines with GI numbers and annotations.

Is there a script that does this better, or a command I can use to build the Viridiplantae nr database I am after? I am using RAPSearch for speed rather than BLASTX, so I can't rely on the blastx command to search for taxon-specific annotations.

Thanks.

blast sequences annotation • 1.4k views
4.2 years ago, untitpoi (France/Montpellier) wrote:

Hi, I think grep's -A option could help you. It returns not only each line that matches your pattern but also a given number of lines after it. However, it is not the best solution if your FASTA file is not single-line (one line per sequence), and multi-line FASTA files are common.
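
For multi-line FASTA, one alternative to grep -A is a small script that decides at each header line whether to keep the record and then copies every line up to the next header. The following is only a minimal Python sketch, not a tested pipeline. It assumes the legacy nr header style in which each entry starts with gi|NUMBER| and merged entries are separated by a Ctrl-A character, and that the ID list holds one bare GI number per line. The function names are made up for illustration.

```python
import re
from typing import Iterable, Iterator, Set

# Assumption: legacy NCBI deflines such as
#   >gi|111|ref|NP_1| protein A<Ctrl-A>gi|222|gb|AAA| protein B
# where merged entries are separated by Ctrl-A (\x01).
GI_PATTERN = re.compile(r"(?:^|[>\x01|])gi\|(\d+)")


def filter_fasta_by_gi(lines: Iterable[str], wanted: Set[str]) -> Iterator[str]:
    """Yield header and sequence lines of records whose header has a wanted GI."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # A new record starts here: keep it if any GI in its header is wanted.
            keep = any(gi in wanted for gi in GI_PATTERN.findall(line))
        if keep:
            # Sequence lines inherit the decision made at their header.
            yield line


def load_ids(path: str) -> Set[str]:
    """Read one GI number per line, ignoring blank lines."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def filter_file(ids_path: str, fasta_path: str, out_path: str) -> None:
    """Stream the FASTA file once, writing only the wanted records."""
    wanted = load_ids(ids_path)
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        out.writelines(filter_fasta_by_gi(fasta, wanted))
```

With the file names from the question, the call would be filter_file("GIsequences-list", "NR", "viridiplantae.fasta"). The GI list is held in a set so each header lookup is fast, and the FASTA file is streamed line by line rather than loaded into memory, which matters for a file the size of nr.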
