Question

Downloading PDB files from rcsb.org based on a search

0

12 days ago

iamsmor • 0

Hello everyone

I am working on molecular docking and want to download structures in PDB format from rcsb.org based on a search (protein name, organism name). Is there a way to do this, and if so, how?

Thanks for any help

rcsb pdb • 1.0k views

ADD COMMENT • link updated 12 days ago by Ram 43k • written 12 days ago by iamsmor • 0

2

Download services for PDB are described on this page: https://www.rcsb.org/docs/programmatic-access/file-download-services

It could be as simple as grabbing a file with curl/wget, e.g. https://files.rcsb.org/view/4hhb.pdb for the PDB accession 4hhb.
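As a rough sketch of that curl approach: the two helper functions below are hypothetical names (not part of any RCSB tool), and the URL pattern is simply copied from the 4hhb example above. The accession is lowercased only to match that example.

```shell
# Build the file-server URL for one PDB accession,
# following the pattern of the 4hhb example URL.
pdb_url() {
    local id
    id=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    printf 'https://files.rcsb.org/view/%s.pdb\n' "$id"
}

# Download one entry into the current directory as <id>.pdb.
# -f makes curl return a non-zero exit status on HTTP errors,
# -sS hides the progress meter but still prints error messages.
fetch_pdb() {
    local id
    id=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    curl -fsS -o "${id}.pdb" "$(pdb_url "$id")"
}
```

After pasting the functions into a shell (or sourcing a file that contains them), `fetch_pdb 4hhb` should leave a `4hhb.pdb` file in the working directory.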

ADD REPLY • link 12 days ago by GenoMax 141k

0

Thank you very much. I did look there, but what I want is to download files based on a search, e.g. QUERY: Gene Name = "AHR" AND Scientific Name of the Source Organism = "Homo sapiens", using something like Biopython or a script that automates the download process.

ADD REPLY • link updated 12 days ago by Ram 43k • written 12 days ago by iamsmor • 0

1

PDB has a search API: https://search.rcsb.org/#search-example-1

Here's the JSON from your search query:

{
    "query": {
        "type": "group",
        "nodes": [
            {
                "type": "group",
                "nodes": [
                    {
                        "type": "terminal",
                        "service": "text",
                        "parameters": {
                            "attribute": "rcsb_entity_source_organism.rcsb_gene_name.value",
                            "negation": false,
                            "operator": "exact_match",
                            "value": "AHR"
                        }
                    },
                    {
                        "type": "group",
                        "nodes": [
                            {
                                "type": "group",
                                "nodes": [
                                    {
                                        "type": "terminal",
                                        "service": "text",
                                        "parameters": {
                                            "attribute": "rcsb_entity_source_organism.ncbi_scientific_name",
                                            "value": "Homo sapiens",
                                            "operator": "exact_match"
                                        }
                                    }
                                ],
                                "logical_operator": "or",
                                "label": "rcsb_entity_source_organism.ncbi_scientific_name"
                            }
                        ],
                        "logical_operator": "and"
                    }
                ],
                "logical_operator": "and",
                "label": "text"
            }
        ],
        "logical_operator": "and"
    },
    "return_type": "entry",
    "request_options": {
        "paginate": {
            "start": 0,
            "rows": 25
        },
        "results_content_type": [
            "experimental"
        ],
        "sort": [
            {
                "sort_by": "score",
                "direction": "desc"
            }
        ],
        "scoring_strategy": "combined"
    },
    "request_info": {
        "query_id": "80f5cb00127713554e0dd5ce36ae71bd"
    }
}

Compare the example JSON on that page with the JSON from your query to construct a custom request, then use the API with that JSON.
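One possible way to do that from a script is sketched below. It is a simplified request, not the exact JSON above: the nested groups are flattened into a single "and" group with two terminal nodes, and the sort and scoring options are left out, on the assumption that the service fills in defaults. `build_query` and `search_pdb` are hypothetical helper names. The sketch also assumes the `/rcsbsearch/v2/query` endpoint accepts the JSON as a POST body. In a POST body the organism is written with a literal space, since nothing is URL-encoded there.

```shell
# Print a minimal search request: gene name AND source organism,
# both as exact matches. Arguments: gene, organism, [rows].
# Note: values are inserted as-is, so they must not contain
# double quotes or backslashes.
build_query() {
    local gene="$1" organism="$2" rows="${3:-25}"
    cat <<EOF
{
  "query": {
    "type": "group",
    "logical_operator": "and",
    "nodes": [
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "attribute": "rcsb_entity_source_organism.rcsb_gene_name.value",
          "operator": "exact_match",
          "value": "$gene"
        }
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "attribute": "rcsb_entity_source_organism.ncbi_scientific_name",
          "operator": "exact_match",
          "value": "$organism"
        }
      }
    ]
  },
  "return_type": "entry",
  "request_options": {
    "paginate": { "start": 0, "rows": $rows }
  }
}
EOF
}

# Send the request body to the search endpoint (assumes POST is accepted).
search_pdb() {
    build_query "$@" |
        curl -fsS -X POST -H 'Content-Type: application/json' \
             --data-binary @- https://search.rcsb.org/rcsbsearch/v2/query
}
```

Calling `build_query AHR "Homo sapiens"` only prints the JSON, which makes it easy to inspect before sending; `search_pdb AHR "Homo sapiens"` would then submit it.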

ADD REPLY • link 12 days ago by Ram 43k

1

You can use the "Advanced query" builder (https://www.rcsb.org/search/advanced ) to create a query like:

https://www.rcsb.org/search?request=%7B%22query%22%3A%7B%22type%22%3A%22group%22%2C%22logical_operator%22%3A%22and%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22group%22%2C%22logical_operator%22%3A%22and%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22group%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22rcsb_entity_source_organism.taxonomy_lineage.name%22%2C%22operator%22%3A%22exact_match%22%2C%22negation%22%3Afalse%2C%22value%22%3A%22Homo%20sapiens%22%7D%7D%2C%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22rcsb_entity_source_organism.rcsb_gene_name.value%22%2C%22operator%22%3A%22exact_match%22%2C%22negation%22%3Afalse%2C%22value%22%3A%22AHR%22%7D%7D%5D%2C%22logical_operator%22%3A%22and%22%7D%5D%2C%22label%22%3A%22text%22%7D%5D%7D%2C%22return_type%22%3A%22entry%22%2C%22request_options%22%3A%7B%22paginate%22%3A%7B%22start%22%3A0%2C%22rows%22%3A25%7D%2C%22results_content_type%22%3A%5B%22experimental%22%5D%2C%22sort%22%3A%5B%7B%22sort_by%22%3A%22score%22%2C%22direction%22%3A%22desc%22%7D%5D%2C%22scoring_strategy%22%3A%22combined%22%7D%2C%22request_info%22%3A%7B%22query_id%22%3A%2296ab84f1e1ba146fc2d50034b746143e%22%7D%7D

ADD REPLY • link 12 days ago by GenoMax 141k

1

That's how they seem to have written their query - automating that is a bit of a pain though as it takes a crazy JSON as input.

ADD REPLY • link 12 days ago by Ram 43k

1

For a non-programmer, using the search builder link included above may be the best option, though even that is not very user-friendly.

ADD REPLY • link 12 days ago by GenoMax 141k

score 2 · Answer 1 · 2024-04-18

I'm going to build off of OP's query and give them a simple script:

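# Usage: bash get_my_data.bash "<organism>" <gene>
# Encode spaces in the organism name so it can be embedded in the query URL.
organism=$(printf '%s' "$1" | sed 's/ /%20/g')
gene=$2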

curl -s https://search.rcsb.org/rcsbsearch/v2/query\?json\=%7B%22query%22%3A%7B%22type%22%3A%22group%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22group%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22rcsb_entity_source_organism.rcsb_gene_name.value%22%2C%22negation%22%3Afalse%2C%22operator%22%3A%22exact_match%22%2C%22value%22%3A%22$gene%22%7D%7D%2C%7B%22type%22%3A%22group%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22group%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22rcsb_entity_source_organism.ncbi_scientific_name%22%2C%22value%22%3A%22$organism%22%2C%22operator%22%3A%22exact_match%22%7D%7D%5D%2C%22logical_operator%22%3A%22or%22%2C%22label%22%3A%22rcsb_entity_source_organism.ncbi_scientific_name%22%7D%5D%2C%22logical_operator%22%3A%22and%22%7D%5D%2C%22logical_operator%22%3A%22and%22%2C%22label%22%3A%22text%22%7D%5D%2C%22logical_operator%22%3A%22and%22%7D%2C%22return_type%22%3A%22entry%22%2C%22request_options%22%3A%7B%22paginate%22%3A%7B%22start%22%3A0%2C%22rows%22%3A250%7D%2C%22results_content_type%22%3A%5B%22experimental%22%5D%2C%22sort%22%3A%5B%7B%22sort_by%22%3A%22score%22%2C%22direction%22%3A%22desc%22%7D%5D%2C%22scoring_strategy%22%3A%22combined%22%7D%2C%22request_info%22%3A%7B%22query_id%22%3A%2280f5cb00127713554e0dd5ce36ae71bd%22%7D%7D | grep identifier | cut -d: -f2 | tr -d ' ",'

Save it as get_my_data.bash and then run it as

bash get_my_data.bash "Homo sapiens" AHR

Remember to provide the species in double quotes as it is a multi-word argument.

Sample runs:

$ bash get_my_data.bash "Homo sapiens" AHR
5NJ8
5V0L
7ZUB
8QMO

$ bash get_my_data.bash "Homo sapiens" TP53
1DT7
1JSP
1KZY
1MA3
1XQH
1YC5
1YCQ
1YCR
2B3G
2FEJ
2FOJ
2FOO
2GS0
2H2D
2H2F
2H4F
2H4H
2H4J
2H59
2K8F
2LY4
2MEJ
2MZD
2PCX
2RUK
..
..
..
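The script above only prints the matching IDs. To actually fetch the files, something like the following could be chained onto it. This is a sketch with hypothetical function names (`ids_to_urls`, `download_ids`), and it assumes each ID can be dropped into the same `files.rcsb.org/view/<id>.pdb` pattern mentioned earlier in the thread. Entries for which the server has no `.pdb` file are reported as failures rather than stopping the loop.

```shell
# Read one PDB ID per line on stdin and print the matching file URL.
ids_to_urls() {
    local id
    while IFS= read -r id; do
        [ -n "$id" ] || continue    # skip blank lines
        printf 'https://files.rcsb.org/view/%s.pdb\n' "$id"
    done
}

# Download every ID read from stdin into a directory (default: pdb_files).
download_ids() {
    local outdir="${1:-pdb_files}" url
    mkdir -p "$outdir"
    ids_to_urls | while IFS= read -r url; do
        # -f turns HTTP errors into a non-zero exit status
        curl -fsS -o "$outdir/$(basename "$url")" "$url" ||
            echo "failed: $url" >&2
    done
}
```

With the functions loaded in the shell, usage would look like `bash get_my_data.bash "Homo sapiens" AHR | download_ids ahr_structures`.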