Question: After running BLAST against uniprot.fasta, how can I get an output file that includes the full sequence header of every protein hit?
3.5 years ago by
Kurban170
china/Urumqi/xinjiang academy of animal scinces
Kurban170 wrote:

Hey guys,

I have downloaded uniprot.fasta, and now I want to BLAST my transcripts against its protein sequences.

The uniprot.fasta file format:

kurban@kurban-X550VC:~/Desktop/Uniprot$ more uniprot_sprot.fasta
>sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) GN=FV3-001R PE=4 SV=1
MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS
EKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLD
AKIKAYNLTVEGVEGFVRYSRVTKQHVAAFLKELRHSKQYENVNLIHYILTDKRVDIQHL
EKDLVKDFKALVESAHRMRQGHMINVKYILYQLLKKHGHGPDGPDILTVKTGSKGVLYDD
SFRKIYTDLGWKFTPL

My query FASTA file format:

kurban@kurban-X550VC:~/Desktop/Uniprot$ more truncated_cd-hit-est-Trinity_CD_and_CK.fasta
>TR1|c0_g1_i1
TAAGAGGTAAGAAAGCTAGAAAAGAGGAAATATTTTTAATAAAAATAATAAAACTTAATA
ATATAATAATAAGTATCTTTTTATAATATTATAATAAATAAAATAAGGTAGAAATTATAT
AAATTTATAAGAAAGTAATATTCTTATAATAAGAATTAACTTTTATTAATATTAAACTAG
CTAAAGTAAAAATATAAATTTAAAAAAAAGATAATAATAATAAAGATTTTAAAAAATA

And I ran BLAST:

blastx -db uniprot_sprot.fasta -query truncated_cd-hit-est-Trinity_CD_and_CK.fasta -out uniprot_sprot_truncated_cd-hit-est-Trinity_CD_and_CK_blastx_tabular -evalue 1e-5 -num_threads 3 -num_alignments 1 -outfmt 6

The output file I got:

kurban@kurban-X550VC:~/Desktop/Uniprot$ more uniprot_sprot_truncated_cd-hit-est-Trinity_CD_and_CK_blastx_tabular
TR4|c0_g1_i1    sp|Q9WVJ0|KCNH3_MOUSE    76.54    81    19    0    243    1    2    82    8e-40     144
TR21|c0_g1_i1    sp|Q99315|YG31B_YEAST    34.09    88    58    0    1    264    708    795    2e-06    49.3
TR22|c0_g1_i1    sp|Q06559|RS3_DROME    62.67    75    28    0    2    226    146    220    3e-28     107
TR51|c0_g1_i1    sp|Q9M4T8|PSA5_SOYBN    50.00    78    38    1    239    6    40    116    1e-21    89.4
TR52|c0_g1_i1    sp|Q9UBS5|GABR1_HUMAN    50.00    102    36    4    3    299    377    466    8e-24    99.8
TR70|c0_g1_i1    sp|Q9H5L6|THAP9_HUMAN    31.36    169    108    5    499    2    322    485    5e-17    82.8
TR72|c0_g1_i1    sp|Q13200|PSMD2_HUMAN    51.95    77    37    0    1    231    666    742    5e-20    88.2
TR81|c0_g1_i1    sp|Q12296|MAM3_YEAST    32.00    125    82    2    3    374    204    326    3e-14    73.9
TR82|c0_g1_i1    sp|Q6BSS8|APTH1_DEBHA    50.68    73    34    2    20    235    161    232    4e-16    73.9
TR84|c0_g1_i1    sp|P20825|POL2_DROME    54.17    72    33    0    6    221    300    371    4e-20    88.2
TR97|c0_g1_i1    sp|Q921I9|EXOS4_MOUSE    36.67    90    55    2    280    14    101    189    4e-10    58.2


There is no protein information in the second column of the output file. It would be awesome if I could get the full header of each hit sequence, or have the protein description included next to the second column. The BLAST output I want might look like this:

TR4|c0_g1_i1    sp|Q9WVJ0|KCNH3_MOUSE    Uncharacterized protein 009R 76.54    81    19    0    243    1    2    82    8e-40     144
TR21|c0_g1_i1    sp|Q99315|YG31B_YEAST    Uncharacterized protein 042L 34.09    88    58    0    1    264    708    795    2e-06    49.3

or something like that.

Could you give me some suggestions? How could I do that?

3.5 years ago by
dschika290
European Union
dschika290 wrote:

Have you had a look at the outfmt options? Check the formatting options with:

blastx -help

blastx ... -outfmt "6 qseqid sseqid sgi ..."
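
As one possible way to fill in that column list: BLAST+ lists `stitle` (subject title) among the specifiers accepted by `-outfmt 6`, so placing it right after `sseqid` should give roughly the layout asked for in the question. This is only a sketch. It assumes the database was built with makeblastdb from the UniProt FASTA so that the titles are stored, it reuses the file names and flags from the question, and the output file name is a made-up placeholder.

```shell
# Column list: the usual 12 tabular columns, with the subject title (stitle)
# placed directly after the subject ID.
outfmt="6 qseqid sseqid stitle pident length mismatch gapopen qstart qend sstart send evalue bitscore"

# File names follow the question; the output name is a hypothetical placeholder.
db=uniprot_sprot.fasta
query=truncated_cd-hit-est-Trinity_CD_and_CK.fasta
out=blastx_with_titles.tsv

# Only run when BLAST+ and the query file are actually present.
if command -v blastx >/dev/null 2>&1 && [ -f "$query" ]; then
  blastx -db "$db" -query "$query" -out "$out" \
    -evalue 1e-5 -num_threads 3 -num_alignments 1 -outfmt "$outfmt"
fi
```

Since the title contains spaces, it is safer to split the resulting file on tabs only when parsing it later.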

I'm not sure makeblastdb can parse the info correctly from uniprot.fasta. One option would be to create a map file with two columns: the UniProt ID (e.g. sp|Q9WVJ0|KCNH3_MOUSE) in the first column and the other info the OP wants in the second. The OP could then use join to combine the BLAST output file (keyed on column 2) with the map file (keyed on column 1) and print the result in the desired format.
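
A minimal sketch of the map-and-join idea, assuming the BLAST output is tab-separated (`-outfmt 6`) and that its second column equals the first word of each FASTA header, as it does in the output shown in the question. The function names `make_title_map` and `annotate_hits` are made up for illustration.

```shell
# Build a two-column, tab-separated map: sequence ID <TAB> rest of the header.
# $1 = FASTA file. Output is sorted on the ID so that join can use it.
make_title_map() {
  awk '/^>/ {
         id = substr($1, 2)            # first word without the leading ">"
         title = $0
         sub(/^[^ ]+ ?/, "", title)    # drop the ID, keep the description
         print id "\t" title
       }' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1
}

# Join BLAST hits (column 2) with the map (column 1) and print the title
# right after the subject ID.
# $1 = BLAST tabular output, $2 = map file produced by make_title_map.
annotate_hits() {
  tab=$(printf '\t')
  LC_ALL=C sort -t "$tab" -k2,2 "$1" |
    LC_ALL=C join -t "$tab" -1 2 -2 1 \
      -o 1.1,1.2,2.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10,1.11,1.12 - "$2"
}
```

With the files from the question, this would be called as `make_title_map uniprot_sprot.fasta > map.tsv` followed by `annotate_hits uniprot_sprot_truncated_cd-hit-est-Trinity_CD_and_CK_blastx_tabular map.tsv`, where `map.tsv` is a placeholder name. Note that join needs sorted input, so the hits come out ordered by subject ID rather than in the original order, and any hit whose ID is missing from the map is silently dropped.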

written 3.5 years ago by 5heikki