I have a question about BLAST. I admit that I do not understand everything about it.
I have been asked to run blastx with an .fsa file of Arabidopsis thaliana sequences against an oak gene model, in order to see whether there are any matching sequences between the two species.
My data is formatted like this.
Oak reference file:
>Qrob_P00010.2 69
ATGTCTGGCCCTGAAAA........
Arabidopsis FASTA file:
>AT3G25210.1 | Symbols: | Tetratricopeptide repeat (TPR)-like superfamily protein | chr3:9180348-9181487 FORWARD LENGTH=1140
ATGTCGGCGACACTCCGACGCCTCATTCTTCTCACC..............
To run blastx, I first made a protein database from my reference and then ran the search, with these commands:
makeblastdb -in ref.fsa -dbtype prot -blastdb_version 5 -parse_seqids
blastx -query fasta_arabido.fsa -db ref.fsa -out ara.txt
However, when I run blastx, I get no hits:
Query= AT3G25210.1 | Symbols: | Tetratricopeptide repeat (TPR)-like superfamily protein | chr3:9180348-9181487 FORWARD LENGTH=1140

Length=1140

***** No hits found *****

Lambda      K        H        a         alpha
   0.318    0.134    0.401    0.792     4.96
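One thing that may be worth checking at this point is what kind of sequences ref.fsa really contains, since blastx translates the nucleotide query and compares it against protein sequences. The following is only a rough sketch in Python, not part of BLAST+: the helper name `guess_dbtype` and the 90% threshold are assumptions made for illustration, and the input is just the visible start of the oak record shown earlier.

```python
def guess_dbtype(fasta_text, threshold=0.9):
    """Guess 'nucl' or 'prot' from the letters of FASTA-formatted text.

    Hypothetical helper: the 0.9 threshold is an arbitrary assumption.
    """
    nucleotide_letters = set("ACGTUN")
    total = 0
    nucleotide_like = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and header lines
        for letter in line.upper():
            if letter.isalpha():  # ignore dots, gaps, digits
                total += 1
                if letter in nucleotide_letters:
                    nucleotide_like += 1
    if total == 0:
        raise ValueError("no sequence letters found")
    return "nucl" if nucleotide_like / total >= threshold else "prot"


# Only the visible start of the oak record from the post.
oak_record = ">Qrob_P00010.2 69\nATGTCTGGCCCTGAAAA\n"
print(guess_dbtype(oak_record))  # nucl
```

If the reference turns out to be nucleotide, a database built from it with -dbtype prot would hold DNA letters read as amino acids, which would be consistent with blastx reporting no hits.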
I saw that I could try tblastx instead, after making my reference a nucleotide database, and I get the results below, which seem correct:
makeblastdb -in ref.fsa -dbtype nucl -blastdb_version 5 -parse_seqids
tblastx -query fasta_arabido.fsa -db ref.fsa -out ara.txt
Query= AT3G25210.1 | Symbols: | Tetratricopeptide repeat (TPR)-like superfamily protein | chr3:9180348-9181487 FORWARD LENGTH=1140

Length=1140
                                                   Score     E
Sequences producing significant alignments:       (Bits)   Value    N

Qrob_P0702440.2 1323                               514     9e-146   1

>Qrob_P0702440.2 1323
Length=1323

 Score = 514 bits (1116), Expect = 9e-146
 Identities = 204/312 (65%), Positives = 259/312 (83%), Gaps = 0/312 (0%)
 Frame = +1/+1

Query  157   RTRTPLETQFETWIQNLKPGFTNSDVVIALRAQSDPDLALDIFRWTAQQRGYKHNHEAYH  336
             R++T LETQFETW+QNLKPGFT SDV   L +QSDPDLALD+FRWT  QRGY H H  Y
Sbjct  190   RSKTQLETQFETWVQNLKPGFTPSDVEHTLWSQSDPDLALDLFRWTTLQRGYTHTHATYF  369

Query  337   TMIKQAITGKRNNFVETLIEEVIAGACEMSVPLYNCIIRFCCGRKFLFNRAFDVYNKMLR  516
             T+IK  ++ KR    ETLIEEV++GAC++++PLYN II+FCC ++ LFNRAFDVY KM
Sbjct  370   TIIKILVSNKRYGLAETLIEEVLSGACDINIPLYNYIIKFCCDKRSLFNRAFDVYKKMYN  549

Query  517   SDDSKPDLETYTlllssllKRFNKLNVCYVYLHAVRSLTKQMKSNGVIPDTFVLNMIIKA  696
             S++ KP+L+TY++L + LL+RFNKLNVCY+YL + +SL+KQMK+ GVIPDT+VLNMIIKA
Sbjct  550   SENCKPNLQTYSMLFNLLLRRFNKLNVCYMYLQSAKSLSKQMKAAGVIPDTYVLNMIIKA  729

Query  697   YAKCLEVDEAIRVFKEMALYGSEPNAYTYSYLVKGVCEKGRVGQGLGFYKEMQVKGMVPN  876
             Y+KCLEVDEAIRVF+EM LYG EPNAYTY Y+VKG+CEKGRVGQG GFY+EM+ KG+VP+
Sbjct  730   YSKCLEVDEAIRVFREMGLYGCEPNAYTYGYMVKGLCEKGRVGQGFGFYEEMKGKGLVPS  909

Query  877   GSCYMVLICSLSMERRLDEAVEVVYDMLANSLSPDMLTYNTVLTELCRGGRGSEALEMVE  1056
              S YM+LICSL++ERR ++A+ VV+DML N + PD+LTY T+L  LCR GRG+EA E+++
Sbjct  910   SSSYMILICSLALERRFEDAIGVVFDMLGNFMGPDLLTYKTLLEGLCREGRGNEAFELLD  1089

Query  1057  EWKKRDPVMGER  1092
             E +KRD  M E+
Sbjct  1090  ELRKRDRSMSEK  1125
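The "Frame = +1/+1" line indicates that both the query and the subject were translated in their first forward reading frame before being compared as proteins. As a rough illustration of that translation step (this is not BLAST's own code), the Python sketch below translates frame +1 with the standard genetic code. The name `translate_frame1` is a hypothetical helper, and the input is only the visible start of the Arabidopsis record from my FASTA file.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame1(dna):
    """Translate a nucleotide string in frame +1 (hypothetical helper).

    Unknown codons become 'X'; a trailing partial codon is dropped.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for start in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[start:start + 3], "X"))
    return "".join(protein)


# Visible start of the AT3G25210.1 sequence from the post.
print(translate_frame1("ATGTCGGCGACACTCCGACGCCTCATTCTTCTCACC"))  # MSATLRRLILLT
```

tblastx performs this kind of translation in all six frames for both the query and the database sequences, which is why it can work directly on a nucleotide database.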
I don't understand the real difference between blastx and tblastx. One searches a protein database and the other a nucleotide database. Is it because my reference file contains nucleotide sequences that I could not make a protein database from it?
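For comparison, the five classic BLAST+ search programs differ in the type of the query, the type of the database, and whether a nucleotide side is translated before comparison. The lookup below is a small Python sketch of that relationship; `BLAST_PROGRAMS` and `pick_program` are made-up names for illustration, not anything provided by BLAST+.

```python
# (query type, database type, translated search?) -> program name
BLAST_PROGRAMS = {
    ("nucl", "nucl", False): "blastn",   # nucleotide vs nucleotide
    ("prot", "prot", False): "blastp",   # protein vs protein
    ("nucl", "prot", True): "blastx",    # translated query vs protein db
    ("prot", "nucl", True): "tblastn",   # protein query vs translated db
    ("nucl", "nucl", True): "tblastx",   # translated query vs translated db
}


def pick_program(query_type, db_type, translated):
    """Return the program matching the inputs (hypothetical helper)."""
    key = (query_type, db_type, translated)
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"no BLAST program for combination {key}")
    return BLAST_PROGRAMS[key]


# Nucleotide query against a protein database.
print(pick_program("nucl", "prot", True))  # blastx
# Nucleotide query against a nucleotide database, compared as proteins.
print(pick_program("nucl", "nucl", True))  # tblastx
```

Read this way, blastx needs a database of real protein sequences, while tblastx takes nucleotide sequences on both sides.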
Did I do the right thing, in your opinion?
Thank you in advance for your answer.
Have a nice day
Thank you very much, it's really clear now!