Question: NCBI GEO/GSM/SRA conventions: Is raw data a part of the SRA or a supplementary file?
3 months ago by
abe0

I downloaded about 30 RNA-seq fastq files via SRA from multiple studies found on NCBI GEO. FastQC identified adapters for about half of the samples, but no adapters were identified for the other half. I checked the over-represented sequences section of FastQC for the missing adapters, but in many cases there are no over-represented sequences, i.e. the reads seem to be trimmed.
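As a rough cross-check on the FastQC result, the adapter search can be reproduced by hand. The sketch below is only illustrative: it assumes plain four-line FASTQ records and assumes the libraries used the common Illumina TruSeq adapter, whose prefix is `AGATCGGAAGAGC` (other kits use other sequences). The names `adapter_fraction` and `open_fastq` are hypothetical helpers, not part of any tool.

```python
import gzip

# Assumption: common Illumina TruSeq adapter prefix; swap in the
# sequence for the kit actually used if it is known.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def adapter_fraction(handle, adapter=ADAPTER_PREFIX, max_reads=100_000):
    """Return the fraction of the first max_reads reads containing adapter.

    Assumes strict four-line FASTQ records (no wrapped sequence lines).
    """
    total = 0
    hits = 0
    for i, line in enumerate(handle):
        if i % 4 != 1:  # the sequence is the second line of each record
            continue
        total += 1
        if adapter in line:
            hits += 1
        if total >= max_reads:
            break
    return hits / total if total else 0.0
```

For example, `adapter_fraction(open_fastq("SRR1233685.fastq.gz"))` (the file name is a placeholder for wherever the download was saved) gives the share of sampled reads containing the adapter prefix. A value near zero would be consistent with reads that were trimmed before submission, though it cannot rule out a different adapter.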

After some digging I noticed that for some of these samples (e.g. SRR1233685) the GSE record states "Raw data are available in SRA" (GSE56742), while the GSM record states "Raw data provided as supplementary file" and "Processed data is available on Series record" (GSM1367785). I also checked this (perceived) inconsistency for some of the samples that did have adapters, and it appears for some of them as well (e.g. GSE66556/GSM1624972).

Simply put, I don't get it. This seems contradictory, no? The series record (GSE) says to get the raw data from the SRA, but the sample record (GSM) says the raw data are provided as a supplementary file. What am I misunderstanding?

Thanks for any clarification.

Edited to add link to GSM.

rna-seq next-gen • 217 views
modified 3 months ago • written 3 months ago by abe0