Question: How To Automatically Screen Thousands Of Sequences Using Vecscreen
Asked 5.7 years ago by jsporter:

I looked at the NCBI VecScreen website (http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html), and you can submit multiple sequences in FASTA format at once. It allows you to download results, but the download contains several BLAST "hits" per sequence and is not in the same format as the website display, which marks the sections of each sequence that have strong, medium, etc. similarity to a vector. Is there a way to download the results in that form, particularly the sections of the sequences that are possibly contaminated?

I'm really looking for a way to automate screening hundreds of thousands of sequences for vector contamination (and then cutting the sequences to remove the contamination.)

Any help is appreciated.

ncbi fasta sequence blast • 3.5k views
Answer, written 5.7 years ago by jsporter:

Okay, I got vecscreen to work. The problem was that the app wasn't included in the FTP files that I downloaded from NCBI. I used Subversion to get all the code and was able to find and build vecscreen. The text output can be used to clean the sequences. This could be done with Python and Biopython.
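As a rough idea of what that cleaning step could look like, here is a minimal sketch in plain Python (standard library only, so Biopython is not required to try it). It assumes the text report has one ">Vector <id> ..." header per query, followed by category lines ("Strong match", "Moderate match", "Weak match", "Suspect origin") and tab-separated 1-based start/end pairs; check this against your own vecscreen output before relying on it. The function names and the "keep the longest clean stretch" trimming policy are illustrative choices, not something vecscreen prescribes.

```python
import re

# Category labels assumed to appear on their own line in the text report.
CATEGORIES = ("Strong match", "Moderate match", "Weak match", "Suspect origin")


def parse_vecscreen_text(lines):
    """Return {sequence_id: [(start, end, category), ...]}, 1-based inclusive."""
    hits = {}
    seq_id = None
    category = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">Vector"):
            # Assumed header layout: ">Vector <id> Screen: ... Database: ..."
            parts = line.split()
            seq_id = parts[1] if len(parts) > 1 else None
            if seq_id is not None:
                hits[seq_id] = []
            category = None
        elif line in CATEGORIES:
            category = line
        elif seq_id is not None and category is not None:
            match = re.fullmatch(r"(\d+)\s+(\d+)", line)
            if match:
                hits[seq_id].append(
                    (int(match.group(1)), int(match.group(2)), category)
                )
    return hits


def longest_clean_segment(sequence, intervals):
    """Return the longest stretch of `sequence` not covered by any interval."""
    flagged = [False] * len(sequence)
    for start, end, _category in intervals:
        for i in range(max(start, 1) - 1, min(end, len(sequence))):
            flagged[i] = True
    best = (0, 0)
    run_start = None
    # The trailing True is a sentinel that closes a clean run at the end.
    for i, is_flagged in enumerate(flagged + [True]):
        if not is_flagged and run_start is None:
            run_start = i
        elif is_flagged and run_start is not None:
            if i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = None
    return sequence[best[0]:best[1]]


# Made-up report and sequence, purely for illustration.
example_report = """\
>Vector seq1 Screen: VecScreen Database: UniVec_Core
Strong match
1\t4
Suspect origin
11\t12
>Vector seq2 Screen: VecScreen Database: UniVec_Core
No hits found
"""
hits = parse_vecscreen_text(example_report.splitlines())
print(longest_clean_segment("AAAACCCCCCGG", hits["seq1"]))
```

Keeping only the longest clean stretch is the simplest policy. If a hit falls in the middle of a sequence you may prefer to split the sequence into several pieces or to discard it.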

Here is what I finally did to get vecscreen to compile. As I mentioned, for some reason it wasn't in the tarball from the FTP site, so I had to check it out with Subversion (svn). The NCBI toolbox user's manual covers building: http://www.ncbi.nlm.nih.gov/books/NBK7167/

With Linux ...

1. Make sure G++ is installed (could be different for different platforms), etc.
2. Use the following command to get the source: svn co http://anonsvn.ncbi.nlm.nih.gov/repos/v1/trunk/c++
3. From the compilers directory, do ./GCC.sh (this is different for different platforms). This step could be unnecessary.
4. From the top-level directory of the checked out files, do ./configure --with-flat-makefile
5. cd GCC444-Debug/build
6. make -f Makefile.flat $PROJECT_NAME (i.e. app/vecscreen/). Also need app/blast/ and app/blastdb/
7. Downloaded UniVec_Core fasta file from ftp://ftp.ncbi.nih.gov/pub/UniVec/ (This has only non-mammalian vectors)
8. Make local copy of UniVec_Core database in the GCC444-Debug/bin directory with the command: ./makeblastdb -in UniVec_Core -dbtype nucl -out UniVec_Core.db
9. Use the vecscreen command (found in GCC444-Debug/bin/): ./vecscreen -db UniVec_Core.db -query $fasta_file -out $vecscreen_outfile -outfmt 0 -text_output
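To run the last two commands over many FASTA files, a small wrapper along these lines might help. It is only a sketch: the helper names (makeblastdb_cmd, vecscreen_cmd, screen_all) and the ".vecscreen.txt" output suffix are made up for illustration, and the flags are exactly the ones used above. It assumes the binaries sit in one directory such as GCC444-Debug/bin and that the database name resolves from the directory the script is run in.

```python
import subprocess
from pathlib import Path


def makeblastdb_cmd(bin_dir, univec_fasta="UniVec_Core", db_name="UniVec_Core.db"):
    """Build the makeblastdb argument list (same flags as the command above)."""
    return [
        str(Path(bin_dir) / "makeblastdb"),
        "-in", str(univec_fasta),
        "-dbtype", "nucl",
        "-out", str(db_name),
    ]


def vecscreen_cmd(bin_dir, fasta_file, out_file, db_name="UniVec_Core.db"):
    """Build the vecscreen argument list (same flags as the command above)."""
    return [
        str(Path(bin_dir) / "vecscreen"),
        "-db", str(db_name),
        "-query", str(fasta_file),
        "-out", str(out_file),
        "-outfmt", "0",
        "-text_output",
    ]


def screen_all(bin_dir, fasta_files, out_dir, db_name="UniVec_Core.db"):
    """Run vecscreen once per FASTA file; return the report paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    reports = []
    for fasta in map(Path, fasta_files):
        # Hypothetical naming scheme: <input name>.vecscreen.txt
        report = out_dir / (fasta.name + ".vecscreen.txt")
        # check=True raises CalledProcessError if vecscreen exits non-zero.
        subprocess.run(vecscreen_cmd(bin_dir, fasta, report, db_name), check=True)
        reports.append(report)
    return reports
```

For hundreds of thousands of sequences it is probably easier to split the input into a handful of multi-sequence FASTA files and call screen_all on those, rather than one file per sequence.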
