Question: How to convert XP_002531646.1 to sp|P05661|MYSA_DROME in Diamond blastX output
2.4 years ago by Farbod (3.2k), Toronto

Dear BioStars, Hi.

I have run blastx against NCBI nr using Diamond, with tabular output and the sensitive option.

The result is as below:

TRINITY_DN212758_c0_g1_i1.......XP_002531646.1.....81.3 107 20 0 3 323 199 305 2.9e-41 176.4
TRINITY_DN212728_c0_g1_i1.....XP_014502021.1....89.2 37 4 0 3 113 403 439 8.6e-10 71.2
TRINITY_DN212793_c0_g1_i1.....XP_015200040.1.....91.8 61 5 0 665 483 238 298 9.7e-23 115.9

But I need something like this in the second column:

sp|P05661|MYSA_DROME

sp|Q7KRI2|LOLAL_DROME

sp|A1ZAU8|SSP4_DROME

.

Q: (1) Is there a converter tool for this task, or if not, (2) which option must I add to my Diamond command?

NOTE1: maybe these are SwissProt IDs?

NOTE2: my Diamond script:

diamond blastx -d nr -q Trinity_FM.fasta -o blastX-all-sesnitive.outfmt6 -f 6 -p 22 --evalue 0.000001 -k 1 --sensitive

NOTE3: these are the Diamond software tabular options

Value 6 may be followed by a space-separated list of these keywords:

qseqid means Query Seq-id
qlen means Query sequence length
sseqid means Subject Seq-id
sallseqid means All subject Seq-id(s), separated by a ';'
slen means Subject sequence length
qstart means Start of alignment in query
qend means End of alignment in query
sstart means Start of alignment in subject
send means End of alignment in subject
qseq means Aligned part of query sequence
sseq means Aligned part of subject sequence
evalue means Expect value
bitscore means Bit score
score means Raw score
length means Alignment length
pident means Percentage of identical matches
nident means Number of identical matches
mismatch means Number of mismatches
positive means Number of positive-scoring matches
gapopen means Number of gap openings
gaps means Total number of gaps
ppos means Percentage of positive-scoring matches
qframe means Query frame
stitle means Subject Title
salltitles means All Subject Title(s), separated by a '<>'
qcovhsp means Query Coverage Per HSP
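
For reference, a minimal Python sketch of how one line of this tabular output could be split into named fields. It assumes the file is tab-separated and that no keywords were given after -f 6, so the columns follow the usual 12-column BLAST tabular order; the function name parse_hit is only illustrative.

```python
# Column order assumed for tabular output when "-f 6" is given without keywords.
DEFAULT_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_hit(line, fields=DEFAULT_FIELDS):
    """Split one tab-separated hit line into a dict keyed by field name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} columns, got {len(values)}")
    return dict(zip(fields, values))


# First hit from the question, rebuilt with real tab separators.
example = "\t".join([
    "TRINITY_DN212758_c0_g1_i1", "XP_002531646.1", "81.3", "107", "20", "0",
    "3", "323", "199", "305", "2.9e-41", "176.4",
])
hit = parse_hit(example)
print(hit["sseqid"], hit["evalue"])
```

If extra keywords such as stitle are added after -f 6, the fields list passed to parse_hit would need to match that keyword order.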

~ Best

blast • 1.0k views
ADD COMMENTlink modified 2.4 years ago • written 2.4 years ago by Farbod3.2k

What database did you use for diamond? And what was the source of sequences used to populate it?

ADD REPLYlink written 2.4 years ago by Tonor420

Dear Tonor, Hi

I have used NCBI nr and the query is my transcriptome assembly fasta file.

I have seen a pie chart in this Nature paper (please have a look at Figure 2: Species percentages in BLASTX hits) and I have tried to create a similar chart for my data.

The authors mention that they used the NCBI database: "homology search between our contigs and the NCBI database".

But I guess they used SwissProt!

What do you think?
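
A rough Python sketch of how such species percentages might be computed from nr hits, assuming the search is rerun with stitle among the -f 6 keywords and that each subject title ends with the organism name in square brackets. The titles below are made-up placeholders, not real records, and species_percentages is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumption: the organism is the last [bracketed] group at the end of the title.
SPECIES = re.compile(r"\[([^\[\]]+)\]\s*$")


def species_percentages(titles):
    """Return {species: percent of hits}; titles without a bracket count as 'unknown'."""
    counts = Counter()
    for title in titles:
        match = SPECIES.search(title)
        counts[match.group(1) if match else "unknown"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {species: 100.0 * n / total for species, n in counts.items()}


# Placeholder titles for illustration only.
titles = [
    "XP_0000001.1 hypothetical protein A [Species one]",
    "XP_0000002.1 hypothetical protein B [Species one]",
    "XP_0000003.1 hypothetical protein C [Species two]",
    "XP_0000004.1 hypothetical protein D",
]
percentages = species_percentages(titles)
for species, pct in sorted(percentages.items(), key=lambda kv: -kv[1]):
    print(f"{species}\t{pct:.1f}%")
```

The resulting percentages could then be passed to any plotting tool to draw the pie chart.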

ADD REPLYlink written 2.4 years ago by Farbod3.2k

Does XP_002531646.1 directly relate to sp|P05661|MYSA_DROME? Or is it just an example of the format?

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by Tonor420

Hi, sorry Tonor,

I should have mentioned that it is just an example.

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by Farbod3.2k

What is the full line currently for:

TRINITY_DN212758_c0_g1_i1.......XP_002531646.1.....81.3 107 20 0 3 323 199 305 2.9e-41 176.4

What are the ... replacing?

ADD REPLYlink written 2.4 years ago by Tonor420

Nothing! It is just that on the Biostars screen, typing more spaces does not necessarily produce more spacing,

so I have used dots instead of spaces.

In that line, the only part that matters is the second column, which I have shown in bold.

ADD REPLYlink written 2.4 years ago by Farbod3.2k

So at the moment you have:

Accession

But you want it:

sp|Accession|ProtName
ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by Tonor420

@Farbod wants to convert NCBI IDs to SwissProt IDs.

@Farbod: You could use the UniProt ID converter, or run the search again using a UniProt database. If you want to do in-place replacements, you will need to convert the IDs first and then replace them in the output.
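
As a minimal sketch of the in-place replacement step, assuming the conversion has already produced a lookup table from version-less NCBI accessions to the desired IDs (the function name replace_subject_ids and the tiny id_map below are illustrative only):

```python
def replace_subject_ids(lines, id_map):
    """Yield tab-separated lines with column 2 replaced when a mapping exists.

    id_map is keyed by accession without version (e.g. "XP_002531646").
    Lines whose subject ID has no mapping are passed through unchanged.
    """
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 2:
            accession = cols[1].split(".")[0]  # drop the ".1" version suffix
            cols[1] = id_map.get(accession, cols[1])
        yield "\t".join(cols)


# Hypothetical mapping table; in practice it would come from the converter's output.
id_map = {"XP_002531646": "B9T077"}
rows = [
    "TRINITY_DN212758_c0_g1_i1\tXP_002531646.1\t81.3",
    "TRINITY_DN212728_c0_g1_i1\tXP_014502021.1\t89.2",
]
converted = list(replace_subject_ids(rows, id_map))
print("\n".join(converted))
```

Unmapped IDs are deliberately left as they are, so it stays visible which hits could not be converted.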

ADD REPLYlink written 2.4 years ago by genomax65k

It seems that there is no NCBI nr entry under "2-Select Options". What should I select instead?

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by Farbod3.2k

Use EMBL/GenBank/DDBJ. You may also need to look in EMBL/GenBank/DDBJ CDS.

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by genomax65k

Hi,

I selected some of my NCBI nr IDs and used EMBL/GenBank/DDBJ (and also CDS), but I received "Sorry, no results were found"!

examples :

XP_015458527.1

XP_017541064.1

XP_012987512.1

XP_015682166.1

XP_017537366.1

XP_015457441.1

XP_015214365.1

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by Farbod3.2k

XP_* records are computational predictions/submissions and are not part of UniProt. You may need to query the UniParc database to get information on them. Remove the version number at the end of each accession (the trailing .1) when you query.
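
A small Python sketch of preparing such a query list, assuming the Diamond output is tab-separated with the subject ID in the second column; the helper names are hypothetical.

```python
import re

# A version suffix is a dot followed by digits at the end of the accession.
VERSION_SUFFIX = re.compile(r"\.\d+$")


def strip_version(accession):
    """Return the accession without its trailing version, e.g. '.1'."""
    return VERSION_SUFFIX.sub("", accession.strip())


def unique_accessions(lines, column=1):
    """Collect version-less accessions from one column, keeping first-seen order."""
    seen = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) > column:
            seen.setdefault(strip_version(cols[column]), None)
    return list(seen)


rows = [
    "q1\tXP_015458527.1\t90.0",
    "q2\tXP_017541064.1\t80.0",
    "q3\tXP_015458527.1\t70.0",
]
query_ids = unique_accessions(rows)
print("\n".join(query_ids))
```

The printed list can then be pasted into the query form, one accession per line.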

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by genomax65k

BTW: XP_002531646.1 seems to translate to B9T077 not P05661.

ADD REPLYlink written 2.4 years ago by genomax65k