How to retrieve sample information for given IDs from the Sequence Read Archive?
7 months ago
DareDevil ★ 4.3k

I have a list of SRA IDs (around 1000) from the NCBI SRA database.

"SRX1067067", "SRX022566", "SRX11222414", "SRX11222415", "SRX11222416", "SRX11222417", "SRX11222418", "SRX11222419", "SRX176057", "SRX176058"

I want to extract the information for all of these IDs, as follows:

output

eutils SRA • 402 views
7 months ago
svp ▴ 680
import time
import xml.etree.ElementTree as ET

import pandas as pd
import requests

# List of SRX accessions
srx_accessions = ["SRX1067067", "SRX022566", "SRX11222414", "SRX11222415"]

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Collect one dict per accession and build the DataFrame once at the end,
# which avoids calling pd.concat on every iteration
rows = []

# Loop through each SRX accession
for srx_accession in srx_accessions:
    params = {"db": "sra", "id": srx_accession, "retmode": "xml"}
    response = requests.get(EFETCH_URL, params=params, timeout=30)

    # Check if the request was successful
    if response.status_code == 200:
        root = ET.fromstring(response.text)

        # findtext() returns None instead of raising when an element is missing.
        # ".//TITLE" picks the first TITLE element in document order.
        study_title = root.findtext(".//STUDY_TITLE")
        title = root.findtext(".//TITLE")

        rows.append({"ID": srx_accession, "Study Title": study_title, "Experiment Title": title})
    else:
        print(f"Failed to retrieve data for {srx_accession}. Status code: {response.status_code}")

    # Stay under NCBI's limit of 3 requests per second without an API key
    time.sleep(0.34)

# Build the DataFrame and write it to a local CSV file
df = pd.DataFrame(rows, columns=["ID", "Study Title", "Experiment Title"])
df.to_csv("srx_info.csv", index=False)

# Display the DataFrame
print(df)
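
If you also want the sample accession and sample title, the parsing step can be pulled out into a small function that walks each record separately. This is only a sketch: it assumes the efetch XML wraps every record in an EXPERIMENT_PACKAGE element with EXPERIMENT, STUDY and SAMPLE children, so check that against a real response before relying on it. The function name parse_packages and the XML snippet are made up for illustration; the snippet is not real SRA data.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily trimmed XML imitating the assumed efetch layout (not real SRA data)
SAMPLE_XML = """<EXPERIMENT_PACKAGE_SET>
  <EXPERIMENT_PACKAGE>
    <EXPERIMENT accession="SRX0000001"><TITLE>Example experiment A</TITLE></EXPERIMENT>
    <STUDY accession="SRP0000001">
      <DESCRIPTOR><STUDY_TITLE>Example study</STUDY_TITLE></DESCRIPTOR>
    </STUDY>
    <SAMPLE accession="SRS0000001"><TITLE>Example sample A</TITLE></SAMPLE>
  </EXPERIMENT_PACKAGE>
  <EXPERIMENT_PACKAGE>
    <EXPERIMENT accession="SRX0000002"><TITLE>Example experiment B</TITLE></EXPERIMENT>
    <SAMPLE accession="SRS0000002"/>
  </EXPERIMENT_PACKAGE>
</EXPERIMENT_PACKAGE_SET>"""


def parse_packages(xml_text):
    """Return one dict per EXPERIMENT_PACKAGE; missing fields become None."""
    root = ET.fromstring(xml_text)
    rows = []
    for package in root.iter("EXPERIMENT_PACKAGE"):
        experiment = package.find("EXPERIMENT")
        sample = package.find("SAMPLE")
        rows.append({
            "ID": experiment.get("accession") if experiment is not None else None,
            "Study Title": package.findtext("STUDY/DESCRIPTOR/STUDY_TITLE"),
            "Experiment Title": package.findtext("EXPERIMENT/TITLE"),
            "Sample": sample.get("accession") if sample is not None else None,
            "Sample Title": package.findtext("SAMPLE/TITLE"),
        })
    return rows


for row in parse_packages(SAMPLE_XML):
    print(row)
```

The returned list of dicts can be passed straight to pd.DataFrame(), and a record that lacks a study or sample simply gets None in that column instead of stopping the whole run.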