Question: NCBI GEO/GSM/SRA conventions: Is raw data a part of the SRA or a supplementary file?
2.2 years ago by
abe10 wrote:

I downloaded about 30 RNA-seq FASTQ files via SRA from multiple studies found on NCBI GEO. FastQC identified adapters in about half of the samples, but none in the other half. For the samples without reported adapters I checked the over-represented sequences section of the FastQC report, but in many cases it lists no over-represented sequences at all, which suggests the reads were already trimmed.
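For anyone wanting to repeat this check across many samples, here is a minimal sketch of how the per-module results could be collected from the `fastqc_data.txt` file inside each FastQC output archive. It assumes the usual layout in which each module starts with a `>>Module Name<TAB>status` line and ends with `>>END_MODULE`. The function name `module_statuses` and the shortened example report are made up for illustration.

```python
def module_statuses(fastqc_text):
    """Return {module name: status} parsed from the text of a fastqc_data.txt file."""
    statuses = {}
    for line in fastqc_text.splitlines():
        # Module headers look like ">>Adapter Content<TAB>pass";
        # ">>END_MODULE" only closes a module and carries no status.
        if line.startswith(">>") and not line.startswith(">>END_MODULE"):
            name, _, status = line[2:].partition("\t")
            statuses[name.strip()] = status.strip()
    return statuses


# Hypothetical, heavily shortened report used only to exercise the parser.
example_report = "\n".join([
    "##FastQC\t0.11.9",
    ">>Basic Statistics\tpass",
    ">>END_MODULE",
    ">>Overrepresented sequences\tpass",
    ">>END_MODULE",
    ">>Adapter Content\tfail",
    ">>END_MODULE",
])

statuses = module_statuses(example_report)
print(statuses["Adapter Content"])
print(statuses["Overrepresented sequences"])
```

Running this over every sample and tabulating the `Adapter Content` and `Overrepresented sequences` statuses would make it easy to see which runs look pre-trimmed.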

After some digging I noticed that for some of these samples (e.g. SRR1233685) the GSE record states "Raw data are available in SRA" (GSE56742), while the GSM record states "Raw data provided as supplementary file Processed data is available on Series record" (GSM1367785). I checked whether this (perceived) inconsistency also shows up for the reads that did have adapters, and it does for some of them (e.g. GSE66556/GSM1624972).

Simply put, I don't get it. This seems contradictory, no? The series record (GSE) says to get the raw data from the SRA, but the sample record (GSM) says the raw data is provided as a supplementary file. What am I misunderstanding?

Thanks for any clarification.

Edited to add link to GSM.

rna-seq next-gen • 1.4k views

