Question: Blastx With Xml Format
7.0 years ago by jinzhangut2010 wrote:

Hi all, I'm currently using standalone BLAST, version ncbi-blast-2.2.25+. I tried to generate XML-format BLAST results for my downstream analysis, but I ran into problems.

The command I use is:

"blastx -query shorttest -db ~/blastdb/nr -evalue 1e-3 -max_target_seqs 20 -num_threads 24 -outfmt 5 > t.xml"

The command keeps running but nothing is written to t.xml. If I use another output format, such as the default or -outfmt 6, it prints the correct result.

Thanks very much!

EDIT:

Yes, it's transcriptome data. I'm trying to annotate the whole transcriptome with Blast2GO, and the first step is to run blastx against the nr database.

I understand that blastx may not stream its output continuously, but I can wait 24 hours and get nothing, whereas with another option such as "-outfmt 6" I get results within minutes, so output streaming is unlikely to be the issue.

I also tried a small query against the nr database, and the same query against a small local database; both give me results. So I'm not sure whether the problem comes from the combination of the large query, the large database, and the XML format. Thanks very much!

EDIT2:

Thanks. My suspicion about the XML format is that it waits until everything has finished before printing the results, since the output has opening and closing tags. My files are so big that the search can't finish in one day, which is the time quota for a single job on our server, and as a result nothing is printed.

Any suggestions will be appreciated.

EDIT3:

Thanks all for the suggestions. I finally decided to first use "-outfmt 11" to generate a BLAST archive (ASN.1) file, and then use blast_formatter, which is provided by NCBI, to convert the result into XML format. I tested this on a small sample and it works fine; I hope it works for my whole data set.
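
For readers who want to script the two-step approach from EDIT3, here is a minimal Python sketch. It only builds the two command lines (blastx writing an archive with -outfmt 11, then blast_formatter converting that archive to -outfmt 5) and prints them. The function name and the file names results.asn and results.xml are hypothetical; the other parameters mirror the command in the question.

```python
import os
import shlex


def archive_then_format_commands(query, db, archive="results.asn",
                                 xml="results.xml", evalue="1e-3",
                                 max_target_seqs=20, num_threads=24):
    """Build two argv lists: blastx -> archive, then blast_formatter -> XML.

    The helper name and default file names are illustrative assumptions.
    """
    search = [
        "blastx",
        "-query", query,
        "-db", os.path.expanduser(db),  # expand "~" since no shell is involved
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_target_seqs),
        "-num_threads", str(num_threads),
        "-outfmt", "11",                # BLAST archive (ASN.1)
        "-out", archive,
    ]
    convert = [
        "blast_formatter",
        "-archive", archive,
        "-outfmt", "5",                 # XML
        "-out", xml,
    ]
    return search, convert


if __name__ == "__main__":
    search, convert = archive_then_format_commands("shorttest", "~/blastdb/nr")
    for argv in (search, convert):
        # Print the commands; to execute them, pass each list to
        # subprocess.run(argv, check=True) instead.
        print(shlex.join(argv))
```

Keeping the search and the conversion as separate steps means the archive can be reformatted into other output formats later without repeating the search.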

blast • 3.4k views

Have you tried running it with "-out t.xml" instead of "> t.xml"?

written 7.0 years ago by Philipp Bayer

Philipp is right, you should use -out instead of redirection. If you need details on the command options, run "blastx -help".

written 7.0 years ago by GAO Yang

First things first: is your data made up of NGS reads? If so, see my previous comment.

If not, I don't think your BLAST search will finish in 24 hours, so regardless of the output format, you should talk to your system administrator about this job time limit.

Finally, concerning XML and the other output formats: I don't know how XML output is written to disk, but I am 100% positive that the other outputs are kept in RAM and then written in batches. BLAST doesn't constantly stream output, nor does it write all the output in one go. This means you should double-check that your query is complete in both XML and the other outputs, since having some information in the output file does not mean the query has completed.
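
One way to do that double-check for XML output is sketched below in Python. It assumes the XML report contains one `<Iteration>` element per query sequence, which is worth verifying for your BLAST version; the function names are hypothetical. A truncated or empty file fails to parse and is reported as incomplete.

```python
import xml.etree.ElementTree as ET


def count_fasta_records(path):
    """Count query sequences by counting FASTA header lines."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def count_xml_iterations(path):
    """Count <Iteration> elements; raises ET.ParseError on truncated XML."""
    count = 0
    # iterparse streams the file, so large reports need not fit in memory.
    for _event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == "Iteration":
            count += 1
            elem.clear()  # free the hits already counted
    return count


def xml_is_complete(fasta_path, xml_path):
    """True if the XML parses and has one Iteration per query (assumption)."""
    try:
        n_iterations = count_xml_iterations(xml_path)
    except ET.ParseError:
        return False  # empty file, or closing tags never written
    return n_iterations == count_fasta_records(fasta_path)
```

For example, `xml_is_complete("shorttest", "t.xml")` would compare the query file and the report from the original command.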

written 7.0 years ago by Leonor Palmeira

7.0 years ago by jinzhangut2010 wrote:

Thanks, I tried -out. I actually found that it works either way, with ">" or "-out". The problem is that it only works for small queries: I can generate XML output for 10 or 50 sequences, but when I try it on 100,000 sequences, nothing is printed to the file.


I am pretty sure BLAST doesn't constantly stream out the results. I've run BLAST on large files before and have noticed it can take a few hours before I actually see anything in the output.

written 7.0 years ago by Damian Kao

Are you sure you waited until BLAST finished with the whole query? I have already run very large queries without any problem; it just takes a lot of time, possibly days, depending on the size of your query sequences.

Are these 100,000 sequences reads from NGS sequencing? If so, there are much more appropriate tools for mapping these sequences, such as bwa or bowtie. If you could tell us more about what you are trying to achieve, we might be able to help you find an appropriate tool.

modified 7.0 years ago • written 7.0 years ago by Leonor Palmeira