curating a viridiplantae database
5.1 years ago
Biogeek ▴ 400

Dear all,

I have downloaded the latest Nr.gz file from NCBI and unzipped it. Now I want to extract only the Viridiplantae sequences from this Nr FASTA file. I tried downloading all the GI numbers for the plant protein sequences and running grep as follows:

grep -wFf GIsequences-list NR > viridiplantae.fasta

However, the output file contains no protein sequences, just the header lines with GI numbers and annotations.

Is there a script that can do this better, or a command I can use to get the Viridiplantae Nr database I want? I am using RAPSEARCH for speed rather than BlastX, so I can't rely on a blastx option to restrict the search to a specific taxonomic group.


blast sequences annotation • 1.6k views
5.1 years ago
untitpoi ▴ 10

Hi, I think grep's -A option could help you. It prints not only each line that matches your pattern but also a given number of lines after it. However, it is not the best solution if your FASTA is not single-line (the sequence of a record wrapped over several lines), which is often the case.
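
If the sequences are wrapped over several lines, a small script that keeps whole records may be more reliable than grep -A. Below is a minimal Python sketch, not a tested pipeline. It assumes the legacy header layout `>gi|NUMBER|db|accession| description`, that merged entries in one header are separated by Ctrl-A, and that the GI list file holds one bare number per line. The function names are made up for illustration.

```python
def header_gis(header):
    """Return the set of GI numbers found in one FASTA header line."""
    gis = set()
    # Assumption: merged nr entries in a header are separated by Ctrl-A (\x01).
    for entry in header.lstrip(">").split("\x01"):
        fields = entry.split("|")
        # Assumption: each entry starts with "gi|<number>|..."
        if len(fields) >= 2 and fields[0] == "gi":
            gis.add(fields[1])
    return gis


def filter_fasta(lines, wanted):
    """Yield every line of the records whose header carries a wanted GI."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # Decide once per record; sequence lines inherit the decision.
            keep = not wanted.isdisjoint(header_gis(line.rstrip("\n")))
        if keep:
            yield line


def filter_fasta_file(fasta_path, gi_list_path, out_path):
    """Stream the FASTA file so the whole database is never held in memory."""
    with open(gi_list_path) as fh:
        wanted = {line.strip() for line in fh if line.strip()}
    with open(fasta_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_fasta(src, wanted))
```

With the file names from the question, the call would look like `filter_fasta_file("NR", "GIsequences-list", "viridiplantae.fasta")`. Keeping the GI numbers in a set makes each header check cheap, which matters for a file the size of Nr.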

