Question

blastx versus tblastx

3.3 years ago

aka ▴ 10

Hello everyone,

I have a question about BLAST, which I admit I do not fully understand.

I have been asked to run blastx with an .fsa file of Arabidopsis thaliana sequences against an oak gene model, in order to see whether there are any matching sequences between the two species.

My data is formatted like this:

Reference.fsa

>Qrob_P00010.2 69
ATGTCTGGCCCTGAAAA........

Arabidopsis FASTA file:

>AT3G25210.1 | Symbols:  | Tetratricopeptide repeat (TPR)-like superfamily protein | chr3:9180348-9181487 FORWARD LENGTH=1140
ATGTCGGCGACACTCCGACGCCTCATTCTTCTCACC..............

To run blastx, I first built a protein database from my reference and then ran the search, using these commands:

    makeblastdb -in ref.fsa -dbtype prot -blastdb_version 5 -parse_seqids
    blastx -query fasta_arabido.fsa -db ref.fsa -out ara.txt

However, when I run blastx, I get no hits:

    Query= AT3G25210.1 | Symbols:  | Tetratricopeptide repeat (TPR)-like
superfamily protein | chr3:9180348-9181487 FORWARD LENGTH=1140
Length=1140


***** No hits found *****

Lambda      K        H        a         alpha
   0.318    0.134    0.401    0.792     4.96

I then read that I could try tblastx instead, with my reference built as a nucleotide database. That gives the results below, which seem correct.

    makeblastdb -in ref.fsa -dbtype nucl -blastdb_version 5 -parse_seqids
    tblastx -query fasta_arabido.fsa -db ref.fsa -out ara.txt

ara.txt

    Query= AT3G25210.1 | Symbols:  | Tetratricopeptide repeat (TPR)-like
superfamily protein | chr3:9180348-9181487 FORWARD LENGTH=1140

Length=1140
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value   N

Qrob_P0702440.2 1323                                                  514     9e-146  1
>Qrob_P0702440.2 1323
Length=1323

 Score = 514 bits (1116),  Expect = 9e-146
 Identities = 204/312 (65%), Positives = 259/312 (83%), Gaps = 0/312 (0%)
 Frame = +1/+1

Query  157   RTRTPLETQFETWIQNLKPGFTNSDVVIALRAQSDPDLALDIFRWTAQQRGYKHNHEAYH  336
             R++T LETQFETW+QNLKPGFT SDV   L +QSDPDLALD+FRWT  QRGY H H  Y 
Sbjct  190   RSKTQLETQFETWVQNLKPGFTPSDVEHTLWSQSDPDLALDLFRWTTLQRGYTHTHATYF  369

Query  337   TMIKQAITGKRNNFVETLIEEVIAGACEMSVPLYNCIIRFCCGRKFLFNRAFDVYNKMLR  516
             T+IK  ++ KR    ETLIEEV++GAC++++PLYN II+FCC ++ LFNRAFDVY KM  
Sbjct  370   TIIKILVSNKRYGLAETLIEEVLSGACDINIPLYNYIIKFCCDKRSLFNRAFDVYKKMYN  549

Query  517   SDDSKPDLETYTlllssllKRFNKLNVCYVYLHAVRSLTKQMKSNGVIPDTFVLNMIIKA  696
             S++ KP+L+TY++L + LL+RFNKLNVCY+YL + +SL+KQMK+ GVIPDT+VLNMIIKA
Sbjct  550   SENCKPNLQTYSMLFNLLLRRFNKLNVCYMYLQSAKSLSKQMKAAGVIPDTYVLNMIIKA  729

Query  697   YAKCLEVDEAIRVFKEMALYGSEPNAYTYSYLVKGVCEKGRVGQGLGFYKEMQVKGMVPN  876
             Y+KCLEVDEAIRVF+EM LYG EPNAYTY Y+VKG+CEKGRVGQG GFY+EM+ KG+VP+
Sbjct  730   YSKCLEVDEAIRVFREMGLYGCEPNAYTYGYMVKGLCEKGRVGQGFGFYEEMKGKGLVPS  909

Query  877   GSCYMVLICSLSMERRLDEAVEVVYDMLANSLSPDMLTYNTVLTELCRGGRGSEALEMVE  1056
              S YM+LICSL++ERR ++A+ VV+DML N + PD+LTY T+L  LCR GRG+EA E+++
Sbjct  910   SSSYMILICSLALERRFEDAIGVVFDMLGNFMGPDLLTYKTLLEGLCREGRGNEAFELLD  1089

Query  1057  EWKKRDPVMGER  1092
             E +KRD  M E+
Sbjct  1090  ELRKRDRSMSEK  1125

I don't understand the real difference between blastx and tblastx. One searches a protein database and the other a nucleotide database, but is it because my reference file contains nucleotides that I could not make a protein database from it?

Did I do the right thing, in your opinion?

Thank you in advance for your answer

Have a nice day

Aka

score 2 · Accepted Answer · 2021-08-09

Both of your files contain nucleotide sequences, so the most natural thing would be to build the database with makeblastdb -dbtype nucl and compare them using blastn. However, you seem to want to compare them at the protein level. In that case, blastx with makeblastdb -dbtype prot will not work, because you are formatting the database incorrectly: you declare it as protein while the sequences are nucleotide. Even though blastx translates your nucleotide query into protein, there is no real protein database to compare it against, which is why the search returned no hits.
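A quick way to avoid this kind of mismatch is to check what a FASTA file actually contains before choosing -dbtype. The following is only a rough sketch: the function name guess_alphabet and the 0.9 threshold are my own assumptions, and the heuristic can misclassify very short or unusual sequences.

```python
def guess_alphabet(seq, threshold=0.9):
    """Guess whether a sequence is nucleotide or protein.

    Returns "nucl" if at least `threshold` of the letters are
    A, C, G, T, U or N, and "prot" otherwise. The threshold is an
    arbitrary choice for illustration, not a BLAST setting.
    """
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("empty sequence")
    nucl_fraction = sum(c in "ACGTUN" for c in letters) / len(letters)
    return "nucl" if nucl_fraction >= threshold else "prot"
```

The return values are chosen to match the two -dbtype values used in this thread, so a result of "nucl" for the first records of ref.fsa would suggest that -dbtype prot is the wrong choice for that file.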

You have two options. The first is to use makeblastdb -dbtype nucl and let the search translate both the database and the query sequences into protein, which is what tblastx does and what ultimately worked for you. The second is to translate ref.fsa into proteins yourself, save the result as ref.faa, and run makeblastdb -dbtype prot on that file; after that, blastx would work as well.

The difference is that blastx translates only the nucleotide query and expects a protein database, while tblastx expects both the query and the database to be nucleotide sequences (which is what you have) and translates both.