NCBI GEO/GSM/SRA conventions: Is raw data a part of the SRA or a supplementary file?
5.8 years ago
abe ▴ 30

I downloaded about 30 RNA-seq fastq files via SRA from multiple studies found on NCBI GEO. FastQC identified adapters in about half of the samples, but none in the other half. For the samples without adapters I also checked FastQC's overrepresented sequences section, but in many cases it lists no overrepresented sequences either, which suggests the reads were already trimmed before submission.
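For anyone who wants to double-check the FastQC result independently, here is a minimal sketch of the kind of check I mean. It assumes the libraries used Illumina TruSeq-style adapters, whose shared prefix `AGATCGGAAGAGC` is what FastQC reports as "Illumina Universal Adapter"; other kits would need a different sequence. The function names and the `max_reads` cutoff are just my own choices for illustration.

```python
import gzip
from itertools import islice

# Assumption: 13-base prefix shared by Illumina TruSeq adapters
# (the sequence FastQC labels "Illumina Universal Adapter").
ILLUMINA_ADAPTER_PREFIX = "AGATCGGAAGAGC"


def adapter_fraction(fastq_lines, adapter=ILLUMINA_ADAPTER_PREFIX, max_reads=100_000):
    """Return the fraction of the first max_reads reads that contain the adapter."""
    # A FASTQ record is 4 lines; the sequence is the 2nd line of each record.
    sequences = islice(fastq_lines, 1, max_reads * 4, 4)
    total = hits = 0
    for seq in sequences:
        total += 1
        if adapter in seq:
            hits += 1
    return hits / total if total else 0.0


def adapter_fraction_in_file(path, **kwargs):
    """Same check, reading from a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return adapter_fraction(handle, **kwargs)
```

A fraction near zero for a sample would be consistent with reads that were trimmed before upload, though it could also just mean a different adapter was used.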

After some digging I noticed that for some of these samples (e.g. SRR1233685) the GSE record states "Raw data are available in SRA" (GSE56742), while the GSM record states "Raw data provided as supplementary file" and "Processed data is available on Series record" (GSM1367785). I then looked for the same (perceived) inconsistency in the samples that did have adapters, and it shows up for some of them as well (e.g. GSE66556/GSM1624972), so it does not seem to explain the difference.

Simply put, I don't get it. This seems contradictory, no? The series record (GSE) says to get the raw data from the SRA, but the sample record (GSM) says the raw data is provided as a supplementary file. What am I misunderstanding?

Thanks for any clarification.

Edited to add link to GSM.

RNA-Seq next-gen • 2.4k views