Question: Help required to reformat the FASTA headers of nr database sequences
bilal.sarwar (Pakistan/Lahore/CEMB-PU) wrote, 2.1 years ago:

Hello all

I am a beginner in bioinformatics. I have downloaded the complete nr database from NCBI; it contains 106,785,170 non-redundant protein sequences altogether. The FASTA header of every sequence starts with the FASTA symbol > followed by the accession number and other information. Here are some examples:

>WP_003131952.1 30S ribosomal protein S18 [Lactococcus lactis]NP_26834..........................

>XP_642131.1 hypothetical protein DDB_G0277827 [Dictyostelium discoideu.............................

>XP_642837.1 hypothetical protein DDB_G0276911 [Dictyostelium d......................................
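As a quick sanity check, the number of sequences in a FASTA file equals the number of header lines that begin with >, so a total like the one quoted above can be reproduced with grep. A minimal sketch; the helper name count_fasta_records and the demo file example.fa are hypothetical, for illustration only:

```shell
#!/bin/sh
# Count FASTA records by counting header lines that start with '>'.
# count_fasta_records is a hypothetical helper name.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Usage example on a tiny, made-up file (sequences are not real).
printf '>WP_0001.1 protein A\nMKV\n>XP_0002.1 protein B\nMKL\nMKQ\n' > example.fa
count_fasta_records example.fa
```

On the real download the same call takes the path to the nr file; for the gzipped version, zcat nr.gz | grep -c '^>' should give the same count without unpacking it first.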

I want to put the accession number in every header between pipe characters ("|"), like this:

>|WP_003131952.1| 30S ribosomal protein S18 [Lactococcus lactis]NP_26834..........................

>|XP_642131.1| hypothetical protein DDB_G0277827 [Dictyostelium discoideu.............................

>|XP_642837.1| hypothetical protein DDB_G0276911 [Dictyostelium d......................................

Kindly help me solve this issue.

Regards,
Bilal

Tags: blast, rna-seq, header, ncbi, fasta • 775 views • written 2.1 years ago by bilal.sarwar, modified 2.1 years ago by Pierre Lindenbaum
Answer by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 2.1 years ago:
 sed '/^>/s/>\([^ ]*\)/>|\1|/' input.fa > out.fa
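To see what the sed command does, here is a small demonstration on two headers taken from this thread (the sequence lines are made up, and the input.fa/out.fa names follow the answer). The address /^>/ restricts the substitution to header lines, and \([^ ]*\) captures the accession, i.e. everything after > up to the first space, which \1 then writes back between pipes:

```shell
#!/bin/sh
# Small test file built from headers quoted in this thread;
# the sequence lines are invented for illustration.
cat > input.fa <<'EOF'
>WP_003131952.1 30S ribosomal protein S18 [Lactococcus lactis]
MARYLT
>CAD71090.1 conserved hypothetical protein [Neurospora crassa]
MSEQKL
EOF

# Header lines only: wrap the text between '>' and the first space in pipes.
# Sequence lines do not match /^>/ and are copied unchanged.
sed '/^>/s/>\([^ ]*\)/>|\1|/' input.fa > out.fa

cat out.fa
```

Only the first identifier on each header line is wrapped; any further text after the first space is left as it was.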

Thanks for the help! :)

(reply by bilal.sarwar, 2.1 years ago)
Please click the green check mark on the left to flag this question as answered.

(reply by Pierre Lindenbaum, 2.1 years ago)

I got an error while building the BLAST database from the reformatted FASTA file with the -parse_seqids option; without -parse_seqids everything works fine. I am using the Blast2GO software for mapping and annotation, and the page "How to create a Fasta file database for local Blast and to import XML results successfully into Blast2GO" gives instructions about the required header format.

Here is the error:

 volume: /data/storage_green/mbil/compressed_file/nr_database/nrdb2016/nr_modi
 file: /data/storage_green/mbil/compressed_file/nr_database/nrdb2016/nr_modi.pin
 file: /data/storage_green/mbil/compressed_file/nr_database/nrdb2016/nr_modi.phr
 file: /data/storage_green/mbil/compressed_file/nr_database/nrdb2016/nr_modi.psq
 file: /data/storage_green/mbil/compressed_file/nr_database/nrdb2016/nr_modi.psi
 file: /data/storage_green/mbil/compressed_file/nr_database/nrdb2016/nr_modi.psd
 file: /data/storage_green/mbil/compressed_file/nr_database/nrdb2016/nr_modi.pog

 BLAST Database creation error: Defline lacks a proper ID around line 380

This is line 380:

>|CAD71090.1| conserved hypothetical protein [Neurospora crassa]

(reply by bilal.sarwar, 2.1 years ago; modified 2.1 years ago)
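A minimal sketch for inspecting the spot makeblastdb reports, assuming the line number in the message refers to the input FASTA file. The helper names show_context and count_piped_headers and the file names demo.fa and nr_modi.fa are hypothetical. The second helper counts headers of the >|ID| form, which is the shape of the line quoted above; it does not by itself establish why makeblastdb rejects it:

```shell
#!/bin/sh
# Print the lines of file $1 within 5 lines of line $2, prefixed with
# their line numbers. awk stops reading once past the window.
show_context() {
    file=$1
    line=$2
    start=$((line - 5))
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    end=$((line + 5))
    awk -v s="$start" -v e="$end" 'NR > e { exit } NR >= s { print NR ": " $0 }' "$file"
}

# Count header lines whose first field is wrapped in pipes ('>|ID| ...').
count_piped_headers() {
    grep -c '^>|' "$1"
}

# Usage on a small made-up file; for the real case something like
#   show_context nr_modi.fa 380   (file name is hypothetical)
printf '>|CAD71090.1| conserved hypothetical protein [Neurospora crassa]\nMKV\n' > demo.fa
show_context demo.fa 1
count_piped_headers demo.fa
```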