Question: Which column to use from STAR output for Differential Gene Expression Analysis?
caggtaagtat wrote, 14 months ago:

Hi,

While mapping with STAR, I generated the count matrix for my differential gene expression analysis with the parameter --quantMode GeneCounts. However, I don't know which column of the output I should use for the analysis.

The help page says it depends on the strandedness of my data, but I'm not sure how to determine that.

In this Google group, Mr. Dobin suggests taking the 4th column if the read counts in the 4th column are generally higher than in the 3rd column, which is the case in my data. However, he also states that the 4th column corresponds to the output of htseq-count with the parameter -s reverse, which the htseq-count manual seems to describe as a setting only for paired-end reads (I only have single-end reads):

"For stranded=yes and single-end reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. For stranded=reverse, these rules are reversed."
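
The rule quoted above can be written out as a small predicate, which may make it easier to see that "reverse" applies to single-end reads too. This is only a sketch of the quoted wording, not htseq-count's actual code; the function name and arguments are made up for illustration.

```python
def read_matches_feature(read_strand, feature_strand, stranded="yes", mate=1):
    """Return True if a read on read_strand may count for a feature on feature_strand.

    Hypothetical helper that mirrors the manual's wording only.
    stranded: "yes", "reverse" or "no"; mate: 1 (or single-end) or 2.
    """
    if stranded not in ("yes", "reverse", "no"):
        raise ValueError("stranded must be 'yes', 'reverse' or 'no'")
    if stranded == "no":
        return True  # strand is ignored entirely
    same = read_strand == feature_strand
    if mate == 2:
        same = not same  # the second mate is expected on the opposite strand
    # "reverse" simply inverts the rule used for "yes"
    return same if stranded == "yes" else not same


# Single-end read on the minus strand, gene on the plus strand
print(read_matches_feature("-", "+", stranded="yes"))
print(read_matches_feature("-", "+", stranded="reverse"))
```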

So should I now use the 4th column of the STAR output or the 2nd (nonstranded)?

Tags: rna-seq, star

The real question is what was the library prep procedure for your reads? The choice of column depends on the answer to this question.

Hint: for Illumina standard stranded kits, you should use the 4th column.

written 14 months ago by h.mon

You're right. I was just wondering whether I have to track down the strand information from the library prep, or whether I can see it in the read distribution. I will treat my data as reverse stranded for now and also try to find out whether it is stranded at all :)

written 14 months ago by caggtaagtat

Just an update: I found the protocol for the RNA-Seq library preparation, and it was a stranded library :)

written 14 months ago by caggtaagtat
swbarnes2 (United States) wrote, 14 months ago:

"...that the 4th column represents the output from ht-seq with the parameter -s reverse, which is described in the manual of ht-seq like a setting only for paired end reads"

I don't think that's quite what the manual means. My lab almost never does paired-end RNA-seq, but I always pick the 4th column for counts, because the library prep we use makes all the reads align in the reverse orientation.


OK, thanks. I will assume my data is reverse stranded for now and try to find out more about the library prep kit used for this RNA-Seq.

written 14 months ago by caggtaagtat
Bastien Hervé (Limoges, CBRS, France) wrote, 14 months ago:

The strandedness of your data depends on the protocol you used.

You can determine the strandedness with the RSeQC module infer_experiment.py.

It will tell you whether your data are stranded or not.

If they are stranded, there are two cases:

  • "++,--" explains the majority of reads: your data are first read forward
  • "+-,-+" explains the majority of reads: your data are first read reverse

(See the related post "Infer-experiment.py, is strand-specific?" and the RSeQC documentation: http://rseqc.sourceforge.net/#infer-experiment-py)
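
As a rough sketch of that decision, here is a small function that takes the two fractions infer_experiment.py reports for single-end data and returns a label. The function name and the 0.8 cutoff are my own assumptions, not something RSeQC defines, and the input numbers are made up.

```python
def classify_strandedness(frac_same, frac_opposite, min_fraction=0.8):
    """Label a library from the two infer_experiment.py fractions.

    frac_same:     fraction of reads explained by "++,--"
    frac_opposite: fraction of reads explained by "+-,-+"
    min_fraction:  arbitrary cutoff (an assumption, adjust to taste)
    """
    if frac_same >= min_fraction:
        return "forward"   # first read on the same strand as the gene
    if frac_opposite >= min_fraction:
        return "reverse"   # first read on the opposite strand
    # Neither orientation dominates: treat as unstranded (or inspect by hand)
    return "unstranded"


# Made-up fractions, for illustration only
print(classify_strandedness(0.02, 0.96))
```

Fractions near 0.5 each point to an unstranded library; anything in between the cutoff and 0.5 is worth a closer look rather than trusting the label.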

As described in h.mon's answer to "Count the number of mapped and annotated reads in bam files", the STAR output (ReadsPerGene.out.tab) has three count columns after the gene ID in column 1:

If your data are unstranded, take the first count column (column 2 of the file).

If your data are stranded, first read forward ("++,--" in RSeQC), take the second count column (column 3 of the file).

If your data are stranded, first read reverse ("+-,-+" in RSeQC), take the third count column (column 4 of the file).
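
A minimal sketch of pulling the right counts out of a ReadsPerGene.out.tab file, assuming the usual layout: tab-separated, gene ID in column 1, then the three count columns, with summary rows starting with "N_" (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) at the top. The function name, the labels and the example numbers are made up.

```python
import io

# 0-based index of the count field to use for each library type
COLUMN_FOR = {"unstranded": 1, "forward": 2, "reverse": 3}


def read_gene_counts(handle, strandedness):
    """Return {gene_id: count} from an open ReadsPerGene.out.tab-style file."""
    col = COLUMN_FOR[strandedness]
    counts = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and the N_* summary rows
        if len(fields) < 4 or fields[0].startswith("N_"):
            continue
        counts[fields[0]] = int(fields[col])
    return counts


# Made-up content standing in for a real file
example = io.StringIO(
    "N_unmapped\t100\t100\t100\n"
    "N_multimapping\t20\t20\t20\n"
    "N_noFeature\t10\t70\t12\n"
    "N_ambiguous\t5\t1\t4\n"
    "geneA\t50\t2\t48\n"
    "geneB\t30\t1\t29\n"
)
print(read_gene_counts(example, "reverse"))
```

With a real file you would pass `open("ReadsPerGene.out.tab")` (path is a placeholder) instead of the StringIO object.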

In any case, if you know that your data are stranded but not whether the first read is forward or reverse, sum the counts in each of the two stranded columns and compare the totals. One total will be much larger than the other; take the column with the larger total.
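
The column-sum check could look something like the sketch below. It assumes the same ReadsPerGene.out.tab layout (gene ID plus three count columns, "N_" summary rows skipped) and that the library is already known to be stranded; the function names and the example numbers are made up.

```python
import io


def column_totals(handle):
    """Sum each of the three count columns over all genes, skipping N_* rows."""
    totals = [0, 0, 0]
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[0].startswith("N_"):
            continue
        for i in range(3):
            totals[i] += int(fields[i + 1])
    return totals


def guess_stranded_column(totals):
    """Return the file column (3 or 4) with the larger total.

    Only meaningful for a stranded library; for unstranded data the two
    totals are similar and column 2 should be used instead.
    """
    _unstranded, forward, reverse = totals
    return 3 if forward > reverse else 4


# Made-up content standing in for a real file
example = io.StringIO(
    "N_unmapped\t100\t100\t100\n"
    "geneA\t50\t2\t48\n"
    "geneB\t30\t1\t29\n"
)
totals = column_totals(example)
print(totals, "-> use file column", guess_stranded_column(totals))
```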

( I'm not an expert, so if someone can confirm this, I would be thankful :) )


Thanks, I will give that tool a try then.

written 14 months ago by caggtaagtat