Hi all,
I have downloaded uniprot_sprot.fasta, and now I want to BLAST my transcripts against its protein sequences.
Format of the uniprot_sprot.fasta file:
kurban@kurban-X550VC:~/Desktop/Uniprot$ more uniprot_sprot.fasta
>sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) GN=FV3-001R PE=4 SV=1
MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS
EKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLD
AKIKAYNLTVEGVEGFVRYSRVTKQHVAAFLKELRHSKQYENVNLIHYILTDKRVDIQHL
EKDLVKDFKALVESAHRMRQGHMINVKYILYQLLKKHGHGPDGPDILTVKTGSKGVLYDD
SFRKIYTDLGWKFTPL
Format of my query FASTA file:
kurban@kurban-X550VC:~/Desktop/Uniprot$ more truncated_cd-hit-est-Trinity_CD_and_CK.fasta
>TR1|c0_g1_i1
TAAGAGGTAAGAAAGCTAGAAAAGAGGAAATATTTTTAATAAAAATAATAAAACTTAATA
ATATAATAATAAGTATCTTTTTATAATATTATAATAAATAAAATAAGGTAGAAATTATAT
AAATTTATAAGAAAGTAATATTCTTATAATAAGAATTAACTTTTATTAATATTAAACTAG
CTAAAGTAAAAATATAAATTTAAAAAAAAGATAATAATAATAAAGATTTTAAAAAATA
I then ran blastx:
blastx -db uniprot_sprot.fasta -query truncated_cd-hit-est-Trinity_CD_and_CK.fasta -out uniprot_sprot_truncated_cd-hit-est-Trinity_CD_and_CK_blastx_tabular -evalue 1e-5 -num_threads 3 -num_alignments 1 -outfmt 6
The output file I got looks like this:
kurban@kurban-X550VC:~/Desktop/Uniprot$ more uniprot_sprot_truncated_cd-hit-est-Trinity_CD_and_CK_blastx_tabular
TR4|c0_g1_i1 sp|Q9WVJ0|KCNH3_MOUSE 76.54 81 19 0 243 1 2 82 8e-40 144
TR21|c0_g1_i1 sp|Q99315|YG31B_YEAST 34.09 88 58 0 1 264 708 795 2e-06 49.3
TR22|c0_g1_i1 sp|Q06559|RS3_DROME 62.67 75 28 0 2 226 146 220 3e-28 107
TR51|c0_g1_i1 sp|Q9M4T8|PSA5_SOYBN 50.00 78 38 1 239 6 40 116 1e-21 89.4
TR52|c0_g1_i1 sp|Q9UBS5|GABR1_HUMAN 50.00 102 36 4 3 299 377 466 8e-24 99.8
TR70|c0_g1_i1 sp|Q9H5L6|THAP9_HUMAN 31.36 169 108 5 499 2 322 485 5e-17 82.8
TR72|c0_g1_i1 sp|Q13200|PSMD2_HUMAN 51.95 77 37 0 1 231 666 742 5e-20 88.2
TR81|c0_g1_i1 sp|Q12296|MAM3_YEAST 32.00 125 82 2 3 374 204 326 3e-14 73.9
TR82|c0_g1_i1 sp|Q6BSS8|APTH1_DEBHA 50.68 73 34 2 20 235 161 232 4e-16 73.9
TR84|c0_g1_i1 sp|P20825|POL2_DROME 54.17 72 33 0 6 221 300 371 4e-20 88.2
TR97|c0_g1_i1 sp|Q921I9|EXOS4_MOUSE 36.67 90 55 2 280 14 101 189 4e-10 58.2
The second column of the output file contains only the subject ID, with no protein information. It would be great if I could get the full header of each hit, or at least the protein description, next to the second column. The BLAST output I am hoping for might look like this:
TR4|c0_g1_i1 sp|Q9WVJ0|KCNH3_MOUSE Uncharacterized protein 009R 76.54 81 19 0 243 1 2 82 8e-40 144
TR21|c0_g1_i1 sp|Q99315|YG31B_YEAST Uncharacterized protein 042L 34.09 88 58 0 1 264 708 795 2e-06 49.3
or something along those lines.
Could you give me some suggestions? How could I do that?
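To make the goal concrete, here is a minimal Python sketch of one possible post-processing route: read the descriptions from the FASTA headers, then insert the matching description as a third column of the tabular output. It is only an illustration, not a tested pipeline. The function names (`load_descriptions`, `annotate_hits`, `annotate_file`) are hypothetical, and it assumes that the `-outfmt 6` file is tab-separated, that the subject ID in column 2 equals the first token of the FASTA header, and that the protein name ends where ` OS=` begins.

```python
def load_descriptions(fasta_lines):
    """Map each FASTA sequence ID (first token after '>') to its protein name."""
    descriptions = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        seq_id, _, rest = line[1:].strip().partition(" ")
        # Assumption: UniProt-style header, where " OS=" ends the protein name.
        descriptions[seq_id] = rest.split(" OS=")[0]
    return descriptions


def annotate_hits(blast_lines, descriptions, missing="NA"):
    """Yield tab-separated BLAST rows with the description inserted as column 3."""
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        fields.insert(2, descriptions.get(fields[1], missing))
        yield "\t".join(fields)


def annotate_file(fasta_path, blast_path, out_path):
    """Read the FASTA database and BLAST table, write the annotated table."""
    with open(fasta_path) as fasta:
        descriptions = load_descriptions(fasta)
    with open(blast_path) as blast, open(out_path, "w") as out:
        for row in annotate_hits(blast, descriptions):
            out.write(row + "\n")
```

With the files from this post, the call would be something like `annotate_file("uniprot_sprot.fasta", "uniprot_sprot_truncated_cd-hit-est-Trinity_CD_and_CK_blastx_tabular", "annotated.tsv")`, where `annotated.tsv` is just a placeholder name. It may also be worth checking whether the installed BLAST+ version accepts additional column specifiers such as `stitle` in `-outfmt`, which could avoid the post-processing step altogether.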