Questions about supplementary file contents of a GEO dataset
16 months ago
Josh ▴ 20

Hi all, I am interested in the supplementary files from the GEO dataset: GSE145840.

However, I have some questions about the content of the files.

  1. Do those files contain the raw count information in tables?

  2. The tables in the supplementary files of that dataset do not contain column names. From your experience, do you know what each column corresponds to? (The first one is the name of a gene, but what do the second and third ones refer to?) Or do you know where in the dataset record there might be information about those columns? For example, here are 2 supplementary files with different numbers of columns:

Col1 Col2
4933401J01Rik 0
Gm26206 0
Xkr4 4

Col1 Col2 Col3
4933401J01Rik 1070 0
Gm26206 110 0
Xkr4 6094 1

Thanks for your time and help :)

I recommend using the raw data from here for your analysis.

16 months ago
seidel 11k

I agree. It's frustrating that there's no direct description of the supplementary files, but looking at the record, it simply says STAR alignment, featureCount, gene-level extraction. The three-column files have: geneID geneLength readCount, while the two-column files simply have geneID readCount. The two-column files also have 91 more genes counted than the three-column files (for the few I looked at). I'd guess they may have been aligned by different people, or using slightly different transcriptome versions. Or maybe whoever assembled the submission changed their process while putting it together (including or excluding the alignment target length from the file).
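
If you want to work with the files as they are, here is a minimal sketch of loading them with pandas and labelling the columns by how many there are. It assumes my reading of the columns above is right (geneID, geneLength, readCount), that the files are whitespace- or tab-delimited, and the function and column names are just ones I made up for illustration. The inline text stands in for real files, using the rows from the question.

```python
import io

import pandas as pd


def read_count_table(source):
    """Read a headerless count table and name its columns.

    Assumption: 2 columns = gene_id, read_count;
    3 columns = gene_id, gene_length, read_count.
    """
    # dtype={0: str} keeps gene IDs as text even if one looks numeric.
    df = pd.read_csv(source, sep=r"\s+", header=None, dtype={0: str})
    if df.shape[1] == 2:
        df.columns = ["gene_id", "read_count"]
    elif df.shape[1] == 3:
        df.columns = ["gene_id", "gene_length", "read_count"]
    else:
        raise ValueError(f"unexpected number of columns: {df.shape[1]}")
    return df.set_index("gene_id")


# Stand-ins for two supplementary files (rows copied from the question).
two_col = read_count_table(io.StringIO(
    "4933401J01Rik 0\nGm26206 0\nXkr4 4\n"
))
three_col = read_count_table(io.StringIO(
    "4933401J01Rik 1070 0\nGm26206 110 0\nXkr4 6094 1\n"
))

print(two_col)
print(three_col)
```

For the real files you would pass the path instead of the `io.StringIO` object; pandas can usually read `.gz` files directly from the path.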

This is a good case for simply downloading the raw data and aligning and counting it yourself - especially if you have to analyze the entire data set, given that not all the files have counts for the same genes. On the other hand, for a quick analysis you could extract the counts for the common gene set, but do some QC to make sure the data sets are comparable.
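
For the quick route, a rough sketch of restricting everything to the common gene set could look like the following. The sample names and the extra gene `GeneOnlyInA` are hypothetical, made up to mimic files that do not share exactly the same genes; the other rows come from the question. The per-sample totals at the end are only one very basic QC check, not a full comparison.

```python
import pandas as pd

# Read counts per sample, indexed by gene ID (toy data, see note above).
sample_a = pd.Series(
    {"4933401J01Rik": 0, "Gm26206": 0, "Xkr4": 4, "GeneOnlyInA": 7},
    name="sample_a",
)
sample_b = pd.Series(
    {"4933401J01Rik": 0, "Gm26206": 0, "Xkr4": 1},
    name="sample_b",
)
samples = [sample_a, sample_b]

# join="inner" keeps only genes present in every sample.
counts = pd.concat(samples, axis=1, join="inner")

# Record which genes each sample lost, so nothing disappears silently.
dropped = {
    s.name: sorted(set(s.index) - set(counts.index)) for s in samples
}

# Very basic QC: total counts per sample over the shared genes.
library_sizes = counts.sum(axis=0)

print(counts)
print(dropped)
print(library_sizes)
```

If the totals or the dropped-gene lists differ a lot between the two-column and three-column files, that would be another argument for recounting from the raw data.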


Thanks for your answer and tips!
