Question

Do you use antisense reads for differential gene expression analysis with a stranded paired-end library?

2


7.5 years ago

ZheFrench ▴ 570

I have an Illumina TruSeq Stranded Total RNA (Ribo-Zero) paired-end library.

I'm using STAR to get a matrix of read counts.

I don't know which count column of the output to use for differential gene expression analysis.

As the STAR documentation says:

Outputs read counts per gene into ReadsPerGene.out.tab file with 4 columns which correspond to different strandedness options:

column 1: gene ID

column 2: counts for unstranded RNA-seq

column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes)

column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse)

Select the output according to the strandedness of your data.

Note, that if you have stranded data and choose one of the columns 3 or 4, the other column (4 or 3) will give you the count of antisense reads.
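A quick way to see which stranded column matches the data is to sum each count column over all genes; the column matching the library's strandedness should hold most of the reads. Below is a minimal Python sketch of that check, not something STAR provides. The function name `column_totals` and the example rows are made up for illustration. It assumes the usual layout where the first four rows (`N_unmapped`, `N_multimapping`, `N_noFeature`, `N_ambiguous`) are summary rows rather than genes.

```python
import csv

# Summary rows at the top of ReadsPerGene.out.tab; these are not genes.
SUMMARY_ROWS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}


def column_totals(lines):
    """Sum columns 2-4 over the gene rows of a ReadsPerGene.out.tab-style table.

    `lines` is any iterable of tab-separated text lines (e.g. an open file).
    """
    totals = [0, 0, 0]
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0] in SUMMARY_ROWS:
            continue  # skip blank lines and the summary rows
        for i in range(3):
            totals[i] += int(row[i + 1])
    return {
        "unstranded": totals[0],    # column 2
        "read1_sense": totals[1],   # column 3 (htseq-count -s yes)
        "read2_sense": totals[2],   # column 4 (htseq-count -s reverse)
    }


# Made-up rows, for illustration only.
example = [
    "N_unmapped\t100\t100\t100",
    "N_multimapping\t10\t10\t10",
    "N_noFeature\t5\t60\t8",
    "N_ambiguous\t2\t0\t1",
    "geneA\t50\t3\t47",
    "geneB\t20\t1\t19",
]
print(column_totals(example))
```

With a real file, an open file handle can be passed in directly, e.g. `column_totals(open("ReadsPerGene.out.tab"))`.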

I ran RSeQC's infer_experiment.py to check the library preparation of my data.

It reports that the reads are explained by the following combination: 1+-,1-+,2++,2--, meaning:

read1 mapped to ‘+’ strand indicates parental gene on ‘-’ strand

read1 mapped to ‘-’ strand indicates parental gene on ‘+’ strand

read2 mapped to ‘+’ strand indicates parental gene on ‘+’ strand

read2 mapped to ‘-’ strand indicates parental gene on ‘-’ strand
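The link between the infer_experiment.py result and the STAR column can be written as a small decision rule. This is only a sketch: the function name `pick_star_column` and the 0.8 cutoff are assumptions for illustration, not values taken from RSeQC or STAR. The two inputs are the fractions that infer_experiment.py reports for the "1++,1--,2+-,2-+" and "1+-,1-+,2++,2--" combinations.

```python
def pick_star_column(frac_forward, frac_reverse, min_fraction=0.8):
    """Suggest a ReadsPerGene.out.tab column from infer_experiment.py fractions.

    frac_forward: fraction explained by "1++,1--,2+-,2-+" (read 1 is sense).
    frac_reverse: fraction explained by "1+-,1-+,2++,2--" (read 2 is sense).
    min_fraction: hypothetical cutoff for calling a library stranded.
    """
    if frac_reverse >= min_fraction:
        return 4  # 2nd read strand aligned with RNA (htseq-count -s reverse)
    if frac_forward >= min_fraction:
        return 3  # 1st read strand aligned with RNA (htseq-count -s yes)
    return 2      # no clear strand bias: treat as unstranded


# Made-up fractions resembling a reverse-stranded library.
print(pick_star_column(frac_forward=0.03, frac_reverse=0.95))
```

For the combination described in this post (1+-,1-+,2++,2--), the rule points to column 4.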

From what I understand, column 4 (htseq-count -s reverse) seems to be the right count for this library, so I have selected column 4. But I was wondering about column 3 (which then gives the antisense reads).

For differential gene expression analysis, which count do I need to use?

What should I do with the column 3 antisense reads? Are they some kind of artefact, or do I have to pool columns 3 and 4 together?

Thanks for your help.

RNA-Seq gene expression DE reads count STAR • 2.9k views

ADD COMMENT • link updated 7.5 years ago by Devon Ryan 104k • written 7.5 years ago by ZheFrench ▴ 570

score 3 · Accepted Answer · 2016-10-06



7.5 years ago

Devon Ryan 104k

You want column 4. Column 3 will presumably give you funky results when you have overlapping genes. Don't add anything together, just directly use column 4.
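For anyone who wants to turn that into a count matrix, here is a minimal Python sketch that takes column 4 alone from each sample's ReadsPerGene.out.tab and leaves column 3 out. The helper names `read_counts` and `count_matrix` and the sample names are hypothetical. It assumes the first four rows of each file are the `N_unmapped`, `N_multimapping`, `N_noFeature` and `N_ambiguous` summary rows.

```python
import csv

# Summary rows at the top of ReadsPerGene.out.tab; these are not genes.
SUMMARY_ROWS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}


def read_counts(lines, column=4):
    """Return {gene_id: count} using the given 1-based column of the table."""
    counts = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0] in SUMMARY_ROWS:
            continue  # skip blank lines and the summary rows
        counts[row[0]] = int(row[column - 1])
    return counts


def count_matrix(samples, column=4):
    """Build a genes x samples matrix from {sample_name: iterable of lines}.

    Returns (gene_ids, sample_names, rows), where rows[i][j] is the count
    of gene_ids[i] in sample_names[j]. Genes missing from a sample get 0.
    """
    per_sample = {name: read_counts(lines, column) for name, lines in samples.items()}
    names = list(per_sample)
    genes = sorted(set().union(*per_sample.values()))
    rows = [[per_sample[name].get(gene, 0) for name in names] for gene in genes]
    return genes, names, rows


# Made-up data for two hypothetical samples.
samples = {
    "ctrl_1": ["N_unmapped\t9\t9\t9", "geneA\t50\t3\t47", "geneB\t20\t1\t19"],
    "treat_1": ["N_unmapped\t7\t7\t7", "geneA\t80\t5\t75", "geneB\t10\t0\t10"],
}
genes, names, rows = count_matrix(samples)
for gene, row in zip(genes, rows):
    print(gene, *row, sep="\t")
```

With real data, each value in `samples` can be an open file handle. The resulting integer matrix is the kind of raw count input that differential expression tools expect.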

ADD COMMENT • link 7.5 years ago by Devon Ryan 104k

0


I'm coming back to your answer, sorry for the delay, but what about the antisense reads? How are they produced? I mean, at which step of the library prep do they arise? What phenomenon causes them? Do we have to take them into account, or are they artefacts?

ADD REPLY • link 7.4 years ago by ZheFrench ▴ 570

1


Who knows. In some cases they might indicate low levels of actual transcription, in others they are just random artefacts.

ADD REPLY • link 7.4 years ago by Devon Ryan 104k

1


Antisense transcripts, biological noise, enhancer RNAs or artefacts from your library prep...

ADD REPLY • link 7.4 years ago by WouterDeCoster 47k