Question: Blastp against (only) human Sars-cov1 : which taxid?
8 days ago by
guillaume.rbt790
France
guillaume.rbt790 wrote:

Hi all,

I'm trying to blast peptides against human Sars-cov1 with blastp.

I'm not quite sure which taxid I should use to keep only results from SARS-CoV-1. When I search for the term "sars" in the organism field, it suggests "HCoV-SARS (taxid:694009)", but this taxid includes many different coronavirus strains, including SARS-CoV-2 and bat strains, which I don't want to keep. Is there a reference strain with its own taxid for SARS-CoV-1 that I could use?

Any advice would be very useful!

Thanks


I think you have the correct taxid. You can double-check by entering this ID in the Taxonomy section of NCBI: https://www.ncbi.nlm.nih.gov/taxonomy/?term=694009. Have you tried the following? When you enter the organism taxid in the blastp settings, click the add button to add a field for another organism, enter the taxid of SARS-CoV-2 in the text area below, and tick the exclude option.
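The include/exclude idea can also be written as a single Entrez organism filter. The snippet below is only a sketch: the helper name `build_entrez_query` is made up for illustration, and the SARS-CoV-2 taxid (2697049) is not given in this thread, so check it in the NCBI Taxonomy browser before relying on it.

```python
def build_entrez_query(include_taxid, exclude_taxids=()):
    """Build an Entrez organism filter: keep one taxon, subtract others.

    Hypothetical helper for illustration; it only assembles a string.
    """
    query = f"txid{include_taxid}[Organism:exp]"
    for taxid in exclude_taxids:
        query += f" NOT txid{taxid}[Organism:exp]"
    return query


SARS_RELATED = 694009  # taxid discussed in this thread
SARS_COV_2 = 2697049   # assumption: NCBI taxid for SARS-CoV-2, not from the thread

query = build_entrez_query(SARS_RELATED, [SARS_COV_2])
print(query)
```

The resulting string should be usable in the "Entrez Query" field of the BLAST web form, or with `-entrez_query` in a `-remote` command-line search. Note that it only removes SARS-CoV-2; any bat strains under 694009 would still be searched.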

written 8 days ago by manaswwm70

Thanks for your help. I have indeed checked this taxid in the Taxonomy section of NCBI. It includes sequences from many strains, including bat coronavirus strains, and I would like to focus on human strains. Even if I exclude SARS-CoV-2, I will still get all the bat strains, if I understand correctly?

written 8 days ago by guillaume.rbt790
8 days ago by
genomax83k
United States
genomax83k wrote:

I would say the 333387 entry should be the correct one.

The RefSeq genome for the original SARS virus seems to cross-reference the top-level taxID that you have in your post, so you may just want to use this genome.

modified 8 days ago • written 8 days ago by genomax83k

Thanks for your input. Is 333387 a bat strain? I would like to blast against a human SARS-CoV-1 strain.

written 8 days ago by guillaume.rbt790

Then use the RefSeq reference I linked above. Unfortunately, it seems to be assigned taxID 694009, which is the generic ID for SARS-like coronaviruses. As for the bat/human issue, the strain originated in bats, correct? Human coronaviruses come in many categories.

modified 8 days ago • written 8 days ago by genomax83k

Yes, it seems to come from bats, but if I could, I would like to focus on human strains. I'm a bit confused by the fact that the taxid is the generic ID for SARS-like coronaviruses, which seems to include bat strains (https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=694009). But at the same time, it is described here as a human coronavirus. If I blast against this taxid, will it use the 50,013 protein sequences referenced? And will those sequences include those from bat strains?

modified 8 days ago • written 8 days ago by guillaume.rbt790

If you want to be strict then use these 14 proteins from the RefSeq genome linked above.

"If I blast against this taxid, will it use the 50,013 protein sequences referenced?"

It probably will, since you are limiting the search at the taxID level.
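One way to be strict, sketched below, is to search a small local database built only from the RefSeq proteins mentioned above. This assumes BLAST+ is installed and that the proteins have been saved to a FASTA file; the file names, the database name, and the helper `local_blast_commands` are placeholders for illustration. The code only assembles the commands and does not run them.

```python
import shlex


def local_blast_commands(proteins_fasta, query_fasta, db_name="sars_cov1_refseq"):
    """Return (makeblastdb, blastp) argument lists for a small local protein database."""
    # Build a protein database from the downloaded RefSeq proteins.
    makedb = ["makeblastdb", "-in", proteins_fasta, "-dbtype", "prot", "-out", db_name]
    # Search the peptides against that database only; tabular output (format 6).
    search = ["blastp", "-query", query_fasta, "-db", db_name, "-outfmt", "6"]
    return makedb, search


# Placeholder file names, not taken from the thread.
for cmd in local_blast_commands("sars_cov1_proteins.faa", "peptides.faa"):
    print(shlex.join(cmd))
```

With a database this small, every hit necessarily comes from the reference genome, so no taxid filter is needed.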

modified 8 days ago • written 8 days ago by genomax83k

OK, thanks! FYI, I tried to blast a sequence from a bat strain while specifying the 694009 taxid, and I didn't get a perfect match; the best match was a sequence from the human reference. This leads me to think that sequences from bat strains are not included in the blast search.

written 8 days ago by guillaume.rbt790

That is curious, since the top few entries for that taxID seem to be from bats.

Severe acute respiratory syndrome-related coronavirus

    Bat coronavirus Cp/Yunnan2011   
    Bat coronavirus RaTG13   
    Bat coronavirus Rp/Shaanxi2011   
    Bat SARS coronavirus HKU3   
        Bat SARS coronavirus HKU3-1   
        Bat SARS coronavirus HKU3-10
modified 8 days ago • written 8 days ago by genomax83k

I think I understand: I was using refseq rather than nr as the database, and refseq didn't include the bat strain sequences.
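If nr is searched instead, one possible workaround is to post-filter the hits by their strain-level taxid. The sketch below assumes tabular output requested with `staxids` as the last column (for example `-outfmt "6 qseqid sseqid pident evalue staxids"`) and a user-supplied list of taxids for the human strains of interest. The function name and the toy rows are invented for illustration and are not real BLAST results.

```python
def filter_hits_by_taxid(tabular_lines, allowed_taxids, taxid_column=-1):
    """Keep tab-separated BLAST hits whose subject taxid is in allowed_taxids.

    The staxids field can hold several taxids separated by ';' when one
    sequence is shared by several taxa; a hit is kept if any of them is allowed.
    """
    allowed = {str(t) for t in allowed_taxids}
    kept = []
    for line in tabular_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hit_taxids = set(fields[taxid_column].split(";"))
        if hit_taxids & allowed:
            kept.append(fields)
    return kept


# Toy rows with placeholder IDs and taxids (111, 222, 333 are not real assignments).
toy_hits = [
    "pep1\tprotA\t100.0\t1e-10\t111",
    "pep1\tprotB\t95.0\t1e-08\t222",
    "pep2\tprotC\t90.0\t1e-05\t222;333",
]
human_only = filter_hits_by_taxid(toy_hits, allowed_taxids=[111, 333])
print([hit[1] for hit in human_only])
```

The allow-list has to be compiled by hand from the Taxonomy browser, since the thread does not identify a single taxid that covers only human SARS-CoV-1 strains.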

modified 8 days ago • written 8 days ago by guillaume.rbt790