I have a total RNA library prepared with the Illumina TruSeq Stranded kit (paired-end, Ribo-Zero depleted).
I'm using STAR to get a matrix of read counts.
I don't know which count column of the output I should use for differential gene expression analysis.
As the STAR documentation says:
Outputs read counts per gene into ReadsPerGene.out.tab file with 4 columns which correspond to different strandedness options:
column 1: gene ID
column 2: counts for unstranded RNA-seq
column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes)
column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse)
Select the output according to the strandedness of your data.
Note, that if you have stranded data and choose one of the columns 3 or 4, the other column (4 or 3) will give you the count of antisense reads.
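To make the column layout concrete, here is a minimal Python sketch of reading ReadsPerGene.out.tab and keeping one count column. The function name read_star_counts, the strandedness labels (borrowed from the htseq-count -s values quoted above) and the example numbers are all made up for illustration; the sketch assumes the file is tab-separated and that its summary rows start with "N_" (N_unmapped, N_multimapping, N_noFeature, N_ambiguous).

```python
import csv
import io

# Strandedness label -> 0-based field index in ReadsPerGene.out.tab.
# Field 0 is the gene ID, so "column 2/3/4" in the manual is index 1/2/3 here.
# The labels are this sketch's own naming, borrowed from htseq-count's -s option.
STRAND_COLUMN = {"no": 1, "yes": 2, "reverse": 3}


def read_star_counts(handle, strandedness):
    """Return {gene_id: count} for the column matching `strandedness`."""
    col = STRAND_COLUMN[strandedness]
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("N_"):
            # Skip blank lines and the summary rows at the top of the file.
            continue
        counts[row[0]] = int(row[col])
    return counts


# Made-up file content, for illustration only.
EXAMPLE = (
    "N_unmapped\t100\t100\t100\n"
    "N_multimapping\t50\t50\t50\n"
    "N_noFeature\t30\t400\t35\n"
    "N_ambiguous\t5\t1\t4\n"
    "geneA\t120\t3\t117\n"
    "geneB\t40\t1\t39\n"
)

sense = read_star_counts(io.StringIO(EXAMPLE), "reverse")   # column 4
antisense = read_star_counts(io.StringIO(EXAMPLE), "yes")   # column 3
print(sense)
print(antisense)
```

With a real file, the same function could be called on an open file handle, e.g. read_star_counts(open("ReadsPerGene.out.tab"), "reverse").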
I ran RSeQC's infer_experiment.py to check the library preparation of my data.
It reports that the reads are explained by the following combination: 1+-,1-+,2++,2--, meaning:
read1 mapped to '+' strand indicates parental gene on '-' strand
read1 mapped to '-' strand indicates parental gene on '+' strand
read2 mapped to '+' strand indicates parental gene on '+' strand
read2 mapped to '-' strand indicates parental gene on '-' strand
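The way I map the infer_experiment.py result onto the STAR columns can be sketched in Python as below. This is only my reasoning written out, not part of either tool: the function name suggest_star_column is hypothetical, and the 0.8 and 0.1 cut-offs are arbitrary assumptions. It takes the two fractions that infer_experiment.py reports for "1++,1--,2+-,2-+" (read 1 on the same strand as the gene, i.e. htseq-count -s yes) and for "1+-,1-+,2++,2--" (read 2 on the same strand as the gene, i.e. htseq-count -s reverse).

```python
def suggest_star_column(frac_forward, frac_reverse, dominant=0.8, balanced=0.1):
    """Suggest which ReadsPerGene.out.tab column (2, 3 or 4) matches the library.

    frac_forward: fraction of reads explained by "1++,1--,2+-,2-+"
    frac_reverse: fraction of reads explained by "1+-,1-+,2++,2--"
    dominant, balanced: arbitrary thresholds chosen for this sketch.
    Returns None when the fractions are too ambiguous to call.
    """
    if frac_reverse >= dominant:
        return 4  # 2nd read strand aligned with RNA (-s reverse)
    if frac_forward >= dominant:
        return 3  # 1st read strand aligned with RNA (-s yes)
    if abs(frac_forward - frac_reverse) <= balanced:
        return 2  # roughly equal fractions: looks unstranded
    return None   # neither clearly stranded nor clearly unstranded


# Made-up fractions, for illustration only.
print(suggest_star_column(0.02, 0.95))
```

For my data the second combination is the one reported, which under these assumptions points to column 4.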
From what I understand, column 4 (htseq-count -s reverse) seems to be the right count for this library, so I have selected column 4. But I am wondering about column 3 (which would then hold the antisense reads).
For differential gene expression analysis, which count should I use?
What should I do with the antisense reads in column 3? Are they some kind of artefact, or do I have to pool columns 3 and 4 together?
Thanks for your help.