Count antisense transcripts using STAR --quantMode
2
1
Entering edit mode
5.1 years ago

I have got some RNAseq data from Mus Musculus using TruSeq Stranded Total RNA in paired-end. I want to count my reads over my genes regarding to the gene orientation. As if a gene is plus strand, reads on forward strand falling into my plus gene area should be counted and if a gene is minus strand, reads on reverse strand falling into my minus gene area should be counted.

Let's take two examples from Ensembl :

I created my index with a gencode reference genome and I did my alignments using STAR with the --quantMode and an annotation file from gencode. I've got my _Aligned.sortedByCoord.out.bam and _ReadsPerGene.out.tab.

Check the strand of my 2 genes into my annotation file :

zgrep -i --color "Hmgb2" gencode.vM18.chr_patch_hapl_scaff_and_23_custom_igh_gff3sort.annotation.gtf.gz

#chr8   HAVANA  gene    57511907    57515999    .   +   .   gene_id "ENSMUSG00000054717.7"; gene_type "protein_coding"; gene_name "Hmgb2"; level 2; havana_gene "OTTMUSG00000060717.1";

zgrep -i --color "Ighm" gencode.vM18.chr_patch_hapl_scaff_and_23_custom_igh_gff3sort.annotation.gtf.gz

#chr12  HAVANA  gene    113418558   113422730   .   -   .   gene_id "ENSMUSG00000076617.9"; gene_type "IG_C_gene"; gene_name "Ighm"; level 2; havana_gene "OTTMUSG00000051485.2";

Ok, good I have Hmgb2 plus strand and Ighm minus strand.


Check the read strand under IGV for these two genes :

Hmgb2

biostars1

Ighm

biostars2

Hmgb2 gets a lot of paired-read F2R1 (blue reads) and vice-versa Ighm gets a lot of paired-read F1R2 (red reads)

Note : As paired-end Illumina sequencing is R1 forward and R2 reverse, I was expecting R1 to lead the forward strand, which is not my case but anyway, it is not the point, R2 is leading the forward strand.


Reading the documentation of STAR, part 7.Counting number of reads per gene.

STAR outputs read counts per gene into ReadsPerGene.out.tab file with 4 columns which correspond to different strandedness options:

  • column 1: gene ID
  • column 2: counts for unstranded RNA-seq
  • column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes)
  • column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse)

Select the output according to the strandedness of your data. Note, that if you have stranded data and choose one of the columns 3 or 4, the other column (4 or 3) will give you the count of antisense reads.

So, what I get is that, either the 3rd column give me my sense count or antisense, and the 4th will give me the opposite strand result.

Fine. But I want to know if STAR take the gene strand into account. I've found this threa thread (with $2 = column2, $3 = column3, $4 = column4)

$2 is for unstranded hits, but those overlapping on opposite strand of features are considered ambiguous. $3 reports hits based on the strand you have given in your gff annotation, and $4 in the reverse direction of your features in gff (for PE-data the 5'3'-direction is also considered). Refer to -s option of htseq-count


So, I was expecting to get high count of Hmgb2 in either column 3 or 4 and high count of Ighm in the other column

Hmgb2 / ENSMUSG00000054717.7

grep "ENSMUSG00000054717.7" file_ReadsPerGene.out.tab

#ENSMUSG00000054717.7   3400    31  3369

Ighm / ENSMUSG00000076617.9

grep "ENSMUSG00000076617.9" file_ReadsPerGene.out.tab

#ENSMUSG00000076617.9   16063   11  16052

All my high counts are in the 4th column... Did I forgot to tweak some options ?

STAR RNA-Seq --quantMode • 3.5k views
ADD COMMENT
2
Entering edit mode
5.1 years ago

Most standard stranded Illumina RNA-Seq (e.g. TruSeq) sequencing protocols sequence the first strand of the cDNA which is generated by reverse transcribing the mRNA. What this means is that most of your "fragments" (i.e. reads) from a given feature are on the reverse strand.

If it is not clear from the above statement. The counts in the "reverse" column are your actual feature counts.

See #6-#7 (the fragment being sequenced matches the DNA molecule generated in #2) in this image:

If you use something like Y-shaped adapters then your reads are generally not anti-sense (e.g. smallRNA kit).

ADD COMMENT
0
Entering edit mode

It does not bother me to have my count in the antisense column but I was expecting my gene counts from 2 genes stranded in different way to be in different column.

Unless, a read is counted in the feature strandness ? Like, one read in reverse in a minus strand gene is counted as forward ?

ADD REPLY
1
Entering edit mode

Since not everyone does paired end, the first read is what is used for strand determination. Your fragment corresponds to the complement of your actual mRNA, so your first read will be anti-sense and your second read will be sense.

Making the call based on R1 is logical because for Illumina single-end sequencing you are just sequencing read #1 - so in all possible cases you have read #1.

ADD REPLY
0
Entering edit mode

I cannot see your pictures please see : How to add images to a Biostars post

ADD REPLY
1
Entering edit mode

Something is off about Biostars and images...I can see the images fine on my phone, but not at my work desktop.

ADD REPLY
0
Entering edit mode

ngs

What about now?

ADD REPLY
0
Entering edit mode

All good now 👍

ADD REPLY
0
Entering edit mode

Still no on my desktop...

ADD REPLY
0
Entering edit mode

Happend to me yesterday

ADD REPLY
2
Entering edit mode
5.1 years ago

Most stranded libraries produced in the past ~7 years should correspond to the counts in the last column, wherein read 2's orientation matches that of the transcript. The description you linked to is wrong. Column 2 is for unstranded libraries, so the strand of a feature on the genome is ignored. Column 3 assumes that the first read in a pair's orientation matches that of the originating fragment (this is rarely the case) and column four the same but for read 2. TruSeq kits use the standard method, so the last column is correct.

ADD COMMENT
1
Entering edit mode

Thanks Devon. If i'm interesting in anti-sense transcript I need the 3rd column then ?

ADD REPLY
2
Entering edit mode

That is exactly the case

ADD REPLY

Login before adding your answer.

Traffic: 1441 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6