Bioinformatics
2.8 years ago
m90 ▴ 30

I am trying to find a suitable paper with a publicly available RNAseq dataset. My requirements are:

  1. Human or mouse data
  2. Cancer phenotype
  3. SE or PE RNAseq with read length 100 or 150 bp
  4. 5 replicates per group
  5. 25-50 million reads per sample

How can I find out the number of replicates in a dataset?
tag1 tag2 • 1.8k views

I'm curious also. I never figured out how to select studies with a specific number of replicates on the SRA.

OP, to start with, a query like this (for the SRA) might help you: (((((((Homo sapiens[Organism]) AND "transcriptomic"[Source]) AND "rna seq"[Strategy]) AND "paired"[Layout]) AND "polya"[Selection]) AND "00000000150"[ReadLength]) OR "00000000100"[ReadLength]) AND cancer[Text Word].
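One thing to watch in the query above: as parenthesised, the "00000000100"[ReadLength] term is OR-ed against the whole chain of filters before it, so a 100 bp record only has to match the cancer term. If the intent is "either read length, with every other filter still applied", a rough sketch of a grouped version is below. The shell variable names are my own, the field tags are the ones already used in the query, and I have not checked what this returns.

```shell
#!/bin/sh
# Build the SRA query with both read lengths grouped in their own parentheses,
# so that the OR cannot bypass the organism/source/strategy/layout/selection filters.
organism='Homo sapiens[Organism]'
lib_source='"transcriptomic"[Source]'
strategy='"rna seq"[Strategy]'
layout='"paired"[Layout]'
selection='"polya"[Selection]'
read_length='("00000000100"[ReadLength] OR "00000000150"[ReadLength])'
topic='cancer[Text Word]'

query="$organism AND $lib_source AND $strategy AND $layout AND $selection AND $read_length AND $topic"
printf '%s\n' "$query"

# To run it (needs EntrezDirect and network access, not executed here):
# esearch -db sra -query "$query" | efetch -format runinfo > results.out
```

The printed string can also be pasted straight into the SRA search box.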


You must be a SQL expert :-) Unfortunately that query leads one to this not very helpful result.

Mariam, you are going to need to dig through papers or datasets on SRA to find ones that fit your needs. There are no simple answers to a question like this.


GenoMax I don't quite follow. Are you referring to the fact that the query yields zero links to studies?


Actually the opposite. As of today your query leads to a large number of studies with no easy way for a new user to pick something they can use.

                    Public access   Controlled access   All
SRA Experiments     60416           7404                67820
SRA Studies         1279            54                  1332

Taking inspiration from Dunois' query, you can extract information about some of these projects using EntrezDirect:

esearch -db bioproject -query "(Homo sapiens [ORGN] AND transcriptomic [Source]) AND cancer" | elink -target sra | efetch -format runinfo > results.out

results.out will contain information that looks like this:

Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,Experiment,LibraryName,LibraryStrategy,LibrarySelection,LibrarySource,LibraryLayout,InsertSize,InsertDev,Platform,Model,SRAStudy,BioProject,Study_Pubmed_id,ProjectID,Sample,BioSample,SampleType,TaxID,ScientificName,SampleName,g1k_pop_code,source,g1k_analysis_group,Subject_ID,Sex,Disease,Tumor,Affection_Status,Analyte_Type,Histological_Type,Body_Site,CenterName,Submission,dbgap_study_accession,Consent,RunHash,ReadHash
SRR14730314,2021-06-06 18:20:20,2021-06-04 11:39:43,5605744,337859106,0,60,77,,https://sra-download.ncbi.nlm.nih.gov/traces/sra27/SRR/014385/SRR14730314,SRX11067323,,RNA-Seq,cDNA,TRANSCRIPTOMIC,SINGLE,0,0,ILLUMINA,NextSeq 500,SRP322711,PRJNA735135,,735135,SRS9133846,SAMN19570095,simple,9606,Homo sapiens,GSM5357026,,,,,,,no,,,,,GEO,SRA1240710,,public,56E44264CBB516C204D2581A5CFB5B52,E27CCAC989CF960F546E35F1508EE337
SRR14730315,2021-06-06 18:20:20,2021-06-04 11:39:56,6008406,363408627,0,60,86,,https://sra-download.ncbi.nlm.nih.gov/traces/sra36/SRR/014385/SRR14730315,SRX11067324,,RNA-Seq,cDNA,TRANSCRIPTOMIC,SINGLE,0,0,ILLUMINA,NextSeq 500,SRP322711,PRJNA735135,,735135,SRS9133847,SAMN19570094,simple,9606,Homo sapiens,GSM5357027,,,,,,,no,,,,,GEO,SRA1240710,,public,E562A592B32C3CABD477D6D35BAF8187,2AFB460AAAECECD7263C688B77501F65

You can sort or group this output (for example by the SRAStudy or BioProject column) to find studies with multiple samples. SRA does not track replicate information, so you will still need to look into the details of each study.
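As a rough sketch of that sorting step, something like the following counts runs per BioProject and reports mean spots and mean avgLength, largest projects first. The function name summarize_runinfo is made up for illustration, columns are looked up by header name, and it assumes no field contains a comma, which may not hold for every runinfo file.

```shell
#!/bin/sh
# Summarise a runinfo CSV per BioProject:
# project, number of runs, mean spots per run, mean avgLength, layout of the last run seen.
# Assumption: no field contains a comma (a plain comma split is used).
summarize_runinfo() {
  tab=$(printf '\t')
  awk -F',' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map header names to positions
    $1 == "" || $1 == "Run" { next }                          # skip blank lines and repeated headers
    {
      p = $(col["BioProject"])
      runs[p]++
      spots[p] += $(col["spots"])
      len[p]   += $(col["avgLength"])
      layout[p] = $(col["LibraryLayout"])
    }
    END {
      for (p in runs)
        printf "%s\t%d\t%.0f\t%.0f\t%s\n", p, runs[p], spots[p] / runs[p], len[p] / runs[p], layout[p]
    }
  ' "$1" | sort -t "$tab" -k2,2nr
}

# Demo on the two runs shown above, reduced to the columns the function reads.
demo=$(mktemp)
cat > "$demo" <<'EOF'
Run,spots,avgLength,LibraryLayout,BioProject
SRR14730314,5605744,60,SINGLE,PRJNA735135
SRR14730315,6008406,60,SINGLE,PRJNA735135
EOF
summarize_runinfo "$demo"
rm -f "$demo"
```

On the real file you would call summarize_runinfo results.out. In the two rows above avgLength looks like bases divided by spots, so for paired-end runs it may cover both mates combined; that is my reading of the numbers, so check it against a study you know before filtering on read length. A high run count only tells you a project is worth opening, not how many replicates each group has.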


You don't need to delete the post. Just edit it and write a more informative title. Your tags need work too.


This is my first time posting here, so I don't know much about how to write an ideal post.


But surely you read a bunch of posts before posting here, and observed what kind of posts generated good answers?


Now you've been provided with the necessary grounding, please edit the title and tags.

2.8 years ago

I would start with MetaSRA (https://metasra.biostat.wisc.edu/) and maybe narrow things down to a specific type of cancer and cell type.

Newer papers are more likely to have the read length and depth you need. I just noticed there are no dates in MetaSRA, which is annoying, but the API will reveal more SRA metadata.


thanks a lot


np. now change your title please.
