How to run blastn on input data against a manually curated database?
3.1 years ago
mthm ▴ 50

I have ncbi-blast-2.11.0+ installed on Ubuntu. I have an input TE fasta file and a manually curated TE database in fasta format, and I want to blast my input data against that database with the 80:80 rule (at least 80% identity over at least 80% of the sequence length). How should I do that?

3.1 years ago
AfinaM ▴ 30

You will need to set up your manually curated database using makeblastdb and then use the parameters available on the blastn command line. Check out the BLAST+ manual if you haven't.
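
A minimal sketch of that workflow, assuming a curated library in `te_library.fa` and queries in `input_TEs.fa` (the file names and function names are placeholders, not from this thread). `-perc_identity` and `-qcov_hsp_perc` are blastn options that apply the two 80% thresholds per HSP. The awk step re-checks the rule on the tabular output and is only approximate, since the alignment length column includes gaps.

```shell
#!/bin/sh
# Sketch only: all file names used in the example calls are placeholders.

# Build a nucleotide BLAST database from a curated fasta library.
# $1 = library fasta, $2 = database name
build_db() {
    makeblastdb -in "$1" -dbtype nucl -out "$2"
}

# Search the queries against the database, keeping HSPs with
# >= 80% identity that cover >= 80% of the query.
# $1 = query fasta, $2 = database name, $3 = output table
search_80_80() {
    blastn -query "$1" -db "$2" \
        -perc_identity 80 -qcov_hsp_perc 80 \
        -outfmt "6 qseqid sseqid pident length qlen slen evalue bitscore" \
        -out "$3"
}

# Re-apply the rule on the table written by search_80_80:
# column 3 = percent identity, column 4 = alignment length,
# column 5 = query length.
# $1 = tabular BLAST output
filter_80_80() {
    awk -F'\t' '$3 >= 80 && $4 >= 0.8 * $5' "$1"
}

# Example calls (assumed file names):
# build_db te_library.fa te_db
# search_80_80 input_TEs.fa te_db hits.tsv
# filter_80_80 hits.tsv
```

If the 80% length should be measured against the database sequence instead of the query, the same awk filter can compare column 4 with column 6 (`slen`).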


Based on what I understood from the documentation, I tried:

./bin/makeblastdb -in Drosophila.fa -dbtype nucl -parse_seqids  -out Drosophila
.
.   
.
FASTA-Reader: Ignoring invalid residues at position(s): On line 1858553: 2-6, 8, 11, 13-15
FASTA-Reader: Ignoring invalid residues at position(s): On line 1858554: 3, 6, 10-12, 15, 18, 21, 24, 27
FASTA-Reader: Ignoring invalid residues at position(s): On line 1858555: 3, 6, 9-12, 14, 20, 23, 26, 29-30
FASTA-Reader: Ignoring invalid residues at position(s): On line 1858556: 3, 9-11, 13-17, 21-23
FASTA-Reader: Ignoring invalid residues at position(s): On line 1858557: 3, 6-8, 12, 14-16, 18, 20-21


BLAST Database creation error: Defline lacks a proper ID around line 1858558

In my case I merged these two fasta files plus the RepBase curated TE library to build one database. However, the latter contains TE sequences with their headers, while the first two are totally different! I don't know whether it can work like this, or whether I have to run two separate blast searches against the two types of database.


How did you merge those two files? Did you open them in Windows/DOS? If so, try running dos2unix on the merged file before making the BLAST database.
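
A quick way to test for Windows line endings before reaching for dos2unix, as a rough sketch (the function names are made up for illustration, and `tr` is used here as a stand-in that only removes carriage returns):

```shell
# Count lines that end in a carriage return (Windows/DOS line endings).
# $1 = file to inspect
count_crlf_lines() {
    cr=$(printf '\r')
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    grep -c "${cr}\$" "$1" || true
}

# Write a copy of the file with all carriage returns removed.
# $1 = input file, $2 = output file
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

A count of 0 means the line endings are not the problem and the cause has to be looked for elsewhere in the file.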


I am using the command line in Ubuntu. I simply used cat, but that is probably not the correct way, given that the two databases differ in content.
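
For reference, a cat-based merge can be sketched like this, with two safeguards that plain `cat a.fa b.fa` does not give: a newline is added when a file does not end with one, and repeated sequence IDs are listed afterwards, since duplicates can make `makeblastdb -parse_seqids` fail. The function names are hypothetical.

```shell
# Merge fasta files into one, making sure each input ends with a newline so
# the last sequence line of one file is not glued to the next header.
# $1 = output file, remaining arguments = input fasta files
merge_fasta() {
    out=$1
    shift
    : > "$out"
    for f in "$@"; do
        cat "$f" >> "$out"
        # tail -c 1 yields an empty string if the last byte is a newline.
        if [ -n "$(tail -c 1 "$f")" ]; then
            printf '\n' >> "$out"
        fi
    done
}

# List sequence IDs (first word of each header, without ">") that occur
# more than once in a fasta file.
# $1 = fasta file
duplicate_ids() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort | uniq -d
}
```

Neither step checks that the inputs are really fasta, so a file with different content will still be merged as-is.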


cat should be OK for merging the two fasta files.

Can you check that both files are in correct fasta format (header lines starting with >) and do not contain weird characters? Or post a small extract of those two files so we can have a look?

They are both nucleotide files, right?
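
One possible way to run that check from the command line, as a sketch: the function below (its name is made up) reports the first problem it finds, accepting IUPAC nucleotide codes, `-` and `*` on sequence lines. Any other character, including spaces or a trailing carriage return, is reported.

```shell
# Report whether a file looks like nucleotide fasta.
# Prints "ok", or a short description of the first problem found.
# $1 = file to check
check_nucl_fasta() {
    awk '
        # Skip blank lines.
        /^[[:space:]]*$/ { next }
        # The first non-blank line must be a header.
        !seen {
            seen = 1
            if ($0 !~ /^>/) {
                print "first line is not a header: line " NR
                bad = 1
                exit
            }
        }
        # Header lines may contain any text.
        /^>/ { next }
        # Sequence lines: IUPAC nucleotide codes, gap and stop symbols only.
        /[^ACGTUNRYSWKMBDHVacgtunryswkmbdhv*-]/ {
            print "non-nucleotide characters on line " NR
            bad = 1
            exit
        }
        END {
            if (!seen) print "no sequence data"
            else if (!bad) print "ok"
        }
    ' "$1"
}
```

Running it on each input file separately, before merging, shows which of the files is the one that is not fasta.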


No. In the first database, the beginning of the file is not sequences but a script to fetch some data from the links. Then, after a long script, the sequences look like this:


<html lang="en">
  <head>
    <meta charset="utf-8">
  <link rel="dns-prefetch" href="https://github.githubassets.com">
  <link rel="dns-prefetch" href="https://avatars.githubusercontent.com">
  <link rel="dns-prefetch" href="https://github-cloud.s3.amazonaws.com">
  <link rel="dns-prefetch" href="https://user-images.githubusercontent.com/">
.
.
.
 
        ACAACGGAGCAGTATCAAGAGACGCTTGATAGCCTAGAGACACCTCTGCAAATGTCACTA
      
      
        
        CCCATTAAGCCCATCAGGGTTGAGGAAATTGTCGAAGCTATCAAATCTCTTCCGTTAAAG
      
      
.
.
.

The second dataset, however, looks like this:

>5S:ClassI:SINE:5S
GTCTACTGCCATACCACCCTGAACACGCCCGATCTCATCTGATCTTGGAAGCTAAGCAGG
GTCAGGCCTGGTTGGTACCTGATGGGAGAGAGCCTGGGAACACCGGGTTCTGTAGGGTTG
>5S-Sauria:ClassI:SINE:5S
GCCTACGGCCATACCACCCTGAACACGCCCGATCTCGTCTGATCTCGGAAGCTAAGCAGG
GTCGGGCCTGGTTAGTACTTGGATGGGAGACCGCCTGGGAATACCGGGTGCTGTAGGCTT
TAGCCCCAGCTTCTGCCAACCTAGCAGTTCGAAAACATGCAAATGTGAGTAGATCAATAG
GTACCGCTCCGGCGGGAAGGTAACGGCGCTCCATGCAGTCATGCCGGCCACATGACCTTG
GAGGTGTCTACGGACAACGCCGGCTCTTCGGCTTAGAAATGGAGATGAGCACCAACCCCC
AGAGTCGGACATGACTGGACTTAATGTCAGGGGAAAACCTTTACCTTT
>5S_CPo:ClassI:SINE:5S
GTCTACGGCCATACCACCCTGAACGCGCCCGATCTCGTCTGATCTCGGAAGCTAAGCAGG
GTCGGGCCTGGTTAGTACTTGGATGGGAGACCGCCTGGGAATACCGGGTGCTGTAGGCTT
TAAAAAAAAAAAAAAAAA
>5S_DM:ClassI:SINE:5S
GCCAACGACCATACCACGCTGAATACATCGGTTCTCGTCCGATCACCGAAATTAAGCAGC
GTCGGGCGCGGTTAGTACTTAGATGGGGGACCGCTTGGGAACACCGCGTGTTGTTGGCCT
CGTCCACAACTTTTT

Well, the first one is clearly NOT in fasta format: it is an HTML page.

I noticed that you can get two fasta files from the link you posted. Formatting a BLAST database works with fasta file input (default options).

I would suggest getting those files in fasta format and repeating the process.


Yeah, right. I don't know how I managed to download a different file instead of the actual fasta!
