Question: NCBI web Blastn csv headers
16 months ago by
United Kingdom
bioinfoSeeker20 wrote:

Hi, I am looking for information about the column headers of the blastn report. I can only find header information for the CSV report produced by NCBI blast+ run from the command line; I cannot find it for the CSV report from the web server blastn. The link on the NCBI web BLAST page called "Blast report description" doesn't seem to be working. [https://docs.google.com/viewer?url=ftp%3A%2F%2Fftp.ncbi.nlm.nih.gov%2Fpub%2Ffactsheets%2FHowTo_NewBLAST.pdf]

The CSV report from blast+ looks different from the one from the web server blastn. With the outfmt 6 parameter, the blast+ command line produces 21 columns, while the web server download has 14. These 14 columns don't match those from blast+, and the only header information I have is for blast+. I need this because I'm writing a parser that will extract specific columns from the web server output, which contains over 12000 rows, and summarize it. I am hesitant to just guess what the headers might be, as I am still new to parsing BLAST output.

I googled, and I couldn't find any information about the headers for webserver blast report. I wonder if someone could help me with the headers or point me in the right direction. Many thanks.

16 months ago by
genomax62k
United States
genomax62k wrote:

Those columns should be the same as the ones in HitTable(Text) format. The headers for those are below.

# Fields: query id, subject ids, query acc.ver, subject acc.ver, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
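A minimal sketch of how those 14 names could be attached to the header-less HitTable(csv) download in a parsing script. It assumes the downloaded file is comma-separated; the function name `read_hit_table` and the comma-joined sample row are illustrative only, not anything NCBI provides.

```python
import csv
import io

# Field names copied from the "# Fields:" line of HitTable(Text).
FIELDS = [
    "query id", "subject ids", "query acc.ver", "subject acc.ver",
    "% identity", "alignment length", "mismatches", "gap opens",
    "q. start", "q. end", "s. start", "s. end", "evalue", "bit score",
]

def read_hit_table(handle, delimiter=","):
    """Yield one dict per hit row, keyed by FIELDS (hypothetical helper).

    Assumes a header-less, comma-separated file; pass delimiter="\t"
    if your download turns out to be tab-separated.
    """
    for row in csv.reader(handle, delimiter=delimiter):
        if not row or all(not cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != len(FIELDS):
            raise ValueError(
                f"expected {len(FIELDS)} columns, got {len(row)}: {row}"
            )
        yield dict(zip(FIELDS, row))

# Illustrative row: values from a web server hit, joined with commas
# on the assumption that this is how the CSV download is delimited.
sample = (
    "gi|9626372|ref|NC_001422.1|,gi|1227886308|gb|MF169985.1|,"
    "gi|9626372|ref|NC_001422.1|,MF169985.1,100,2660,0,0,1,2660,"
    "2968,309,0,4913\n"
)
hits = list(read_hit_table(io.StringIO(sample)))
```

With a real file you would pass `open("your_download.csv", newline="")` instead of the `StringIO` object, then pick out only the keys you need, for example `hit["subject acc.ver"]` and `float(hit["% identity"])`. The column-count check is there so that a mismatch between the assumed headers and the actual file fails loudly rather than silently shifting columns.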

Thanks for your reply. I already have the header names for the blast+ report. It's the web server report headers that I can't find. I had initially thought the two matched, but when I look at the content of the reports, they look different. See the sample rows below from the two reports (one from web server blastn, the other from the blast+ command line). I am struggling to match the columns like for like, mainly the ones that contain only numbers.

Example row from webserver CSV:

000001452|size:6251 gi|765560879|gb|KP101908.1| 000001452|size:6251 KP101908.1 100 253 0 0 1 253 153 405 1E-27 468

Example row from command-line blast+ CSV:

NODE_1 9774 gi|507382352|ref|NC_021277.1| NC_021277 14194 547 441 80.622 8 10 96 2346 2887 13168 13709 ######## 414 CACTACTGTTATTTATTAGA…. TTCAGT 7 6

modified 16 months ago • written 16 months ago by bioinfoSeeker

The header line in my answer above is from the NCBI web server BLAST report, not from command-line blast+. Under the "Download" menu, the HitTable(Text) option is listed just before the HitTable(csv) option (among many others), which is the one you are referring to.

Here is an example fresh from NCBI Webserver. Use the RID below to pull up this search or do one yourself.

HitTable(Text)

# blastn
# Iteration: 0
# Query: gi|9626372|ref|NC_001422.1| Coliphage phiX174, complete genome
# RID: WUA086KG014
# Database: nr    
# Fields: query id, subject ids, query acc.ver, subject acc.ver, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 100 hits found
gi|9626372|ref|NC_001422.1| gi|1227886308|gb|MF169985.1|    gi|9626372|ref|NC_001422.1| MF169985.1  100.000 2660    0   0   1   2660    2968    309 0.0 4913
gi|9626372|ref|NC_001422.1| gi|1124775477|emb|LK995457.1|   gi|9626372|ref|NC_001422.1| LK995457.1  100.000 2660    0   0   1   2660    3872    1213    0.0 4913

HitTable(csv)

gi|9626372|ref|NC_001422.1| gi|1227886308|gb|MF169985.1|    gi|9626372|ref|NC_001422.1| MF169985.1  100 2660    0   0   1   2660    2968    309 0   4913
gi|9626372|ref|NC_001422.1| gi|1124775477|emb|LK995457.1|   gi|9626372|ref|NC_001422.1| LK995457.1  100 2660    0   0   1   2660    3872    1213    0   4913
modified 16 months ago • written 16 months ago by genomax

Thank you. When I download HitTable(csv) or HitTable(Text), it doesn't have a header. Not sure if I am missing some option that I have to click. Anyway, I see 14 column headers in your example, a subset of the 21 headers from the blast+ command-line report. I shall go ahead and use those to match my NCBI web report in my parsing script. Thank you so much for your help.

written 16 months ago by bioinfoSeeker

The CSV file lacks the header, but the text file has it. That is why I asked you to look at both: to confirm that you can use the headers from the text file for the CSV. NCBI does many things on the web interface that are not exactly reproducible on the command line (the formatting, not the results).
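If you would rather not hard-code the names, here is a rough sketch of pulling them out of the `# Fields:` comment line of a HitTable(Text) download so they can be reused for the CSV. The function name `fields_from_text_table` is made up for illustration, and the sketch assumes the field names themselves never contain commas (true for the 14 names shown in this thread).

```python
import io

def fields_from_text_table(handle):
    """Return column names from the first '# Fields:' comment line.

    Hypothetical helper: assumes names are comma-separated and that
    no individual name contains a comma.
    """
    prefix = "# Fields:"
    for line in handle:
        line = line.strip()
        if line.startswith(prefix):
            return [name.strip() for name in line[len(prefix):].split(",")]
    raise ValueError("no '# Fields:' line found")

# Comment lines as they appear in the HitTable(Text) example in this thread.
text_table = """# blastn
# Iteration: 0
# Query: gi|9626372|ref|NC_001422.1| Coliphage phiX174, complete genome
# RID: WUA086KG014
# Database: nr
# Fields: query id, subject ids, query acc.ver, subject acc.ver, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 100 hits found
"""
fields = fields_from_text_table(io.StringIO(text_table))
```

Comparing `len(fields)` with the number of columns in a row of your CSV is a cheap way to confirm that the two downloads really line up before you rely on the names.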

written 16 months ago by genomax