Question

Strand specific library

0

2.7 years ago

esimonova.me ▴ 20

I first analysed the data without taking the strand-specificity of my library into account. Afterwards I found out that the library was stranded and reanalysed the data. The difference in counts between the two analyses was very large: with no strandedness specified I got around 1000-2000 counts per gene I am interested in, but after specifying strandedness in hisat2 and htseq I got only 20-30 counts for some of those genes. The sequencing aimed at a coverage of 30 M. I just wanted to check whether a difference of this size in count numbers is expected.

htseq • 1.2k views

ADD COMMENT • link updated 2.7 years ago by benformatics 3.9k • written 2.7 years ago by esimonova.me ▴ 20

score 1 · Answer 1 · 2021-07-21

1

2.7 years ago

benformatics 3.9k

Looking at protein-coding genes.

If the dataset is unstranded and you do antisense gene counts and sense gene counts, your values should be about 50% antisense and 50% sense.

If the dataset is stranded (assuming RNA-seq) and you do antisense gene counts and sense gene counts, your values should be >90% antisense and <10% sense. I usually observe an order of magnitude difference in read counts per strand when investigating a stranded library.
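As a rough illustration of this rule of thumb, the sketch below turns a pair of per-gene counts into strand fractions and a guess at the library type. It assumes the "sense" count comes from counting with `htseq-count -s yes` and the "antisense" count from `-s reverse`; the function names and the 0.9 cutoff (taken from the >90% figure above) are hypothetical choices, not part of any tool.

```python
from typing import Tuple


def strand_fractions(sense: int, antisense: int) -> Tuple[float, float]:
    """Return (sense fraction, antisense fraction) for one gene or a gene set."""
    total = sense + antisense
    if total == 0:
        raise ValueError("no reads counted on either strand")
    return sense / total, antisense / total


def guess_library_type(sense: int, antisense: int, cutoff: float = 0.9) -> str:
    """Guess the library type from sense/antisense counts.

    cutoff is an assumed threshold based on the ">90%" rule of thumb.
    """
    sense_frac, antisense_frac = strand_fractions(sense, antisense)
    if sense_frac >= cutoff:
        return "forward-stranded"
    if antisense_frac >= cutoff:
        return "reverse-stranded"
    return "unstranded or ambiguous"


# Illustrative numbers only (not real data): 30 reads counted as sense,
# 1400 counted as antisense for the same gene.
print(guess_library_type(30, 1400))
```

A single gene can be misleading (for example when an overlapping gene sits on the opposite strand), so summing counts over many protein-coding genes before calling the function is probably the safer use.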

ADD COMMENT • link 2.7 years ago by benformatics 3.9k

0

Also you can view the BAM files in IGV to verify strandedness

ADD REPLY • link 2.7 years ago by benformatics 3.9k

0

Thanks for the answer! I am fairly new to bioinformatics; could you please point me to a paper that confirms this? I understand why it should be 50/50 for an unstranded library, but the 90/10 proportion for a stranded RNA-seq library is unexplained to me so far.

ADD REPLY • link 2.7 years ago by esimonova.me ▴ 20

score 1 · Answer 2 · 2021-07-21

1

2.7 years ago

swbarnes2 14k

With no knowledge of strandness (I got around 1000-2000 counts per gene I am interested in), however specifying strandness in hisat2 and htseq I got only 20-30 counts per some genes that I am interested in.

That suggests to me that you put in the wrong strandedness. Put it in the other direction, and you should get your thousand counts back.

ADD COMMENT • link 2.7 years ago by swbarnes2 14k

0

After checking the strandedness with RSeQC I got the following stats:

This is PairEnd Data
Fraction of reads failed to determine: 0.0569
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0170
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9261

Can I conclude based on these stats that it is a reverse-stranded library?

According to this I think it is reverse stranded: https://chipster.csc.fi/manual/library-type-summary.html
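For reference, here is a minimal sketch of how the RSeQC summary above could be read programmatically and mapped to aligner and counter settings. The mapping assumes paired-end data, where the "1+-,1-+,2++,2--" pattern corresponds to a reverse-stranded library (`--rna-strandness RF` in hisat2, `-s reverse` in htseq-count). The function names and the 0.8 cutoff are hypothetical choices for illustration.

```python
import re

# Matches lines such as: Fraction of reads explained by "1+-,1-+,2++,2--": 0.9261
FRACTION_RE = re.compile(r'Fraction of reads explained by "([^"]+)":\s*([0-9.]+)')

FORWARD_PE = "1++,1--,2+-,2-+"  # read 1 on the same strand as the gene
REVERSE_PE = "1+-,1-+,2++,2--"  # read 1 on the opposite strand


def parse_infer_experiment(text):
    """Map each orientation pattern in the RSeQC summary to its fraction."""
    return {pattern: float(value) for pattern, value in FRACTION_RE.findall(text)}


def suggest_settings(fractions, cutoff=0.8):
    """Suggest hisat2 / htseq-count strand options (cutoff is an assumption)."""
    fwd = fractions.get(FORWARD_PE, 0.0)
    rev = fractions.get(REVERSE_PE, 0.0)
    if rev >= cutoff:
        return {"library": "reverse",
                "hisat2": "--rna-strandness RF", "htseq_count": "-s reverse"}
    if fwd >= cutoff:
        return {"library": "forward",
                "hisat2": "--rna-strandness FR", "htseq_count": "-s yes"}
    return {"library": "unstranded or unclear", "hisat2": "", "htseq_count": "-s no"}


# The summary quoted in this post.
report = '''This is PairEnd Data
Fraction of reads failed to determine: 0.0569
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0170
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9261
'''

settings = suggest_settings(parse_infer_experiment(report))
print(settings)
```

With these numbers the second pattern dominates, so the sketch points to the reverse-stranded settings for both tools.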

ADD REPLY • link 2.7 years ago by esimonova.me ▴ 20

1

Yes, if it is RNA-seq it is usually reverse stranded. To check this, change the strand setting in your hisat2 command, re-run hisat2 and htseq, and your 20-30 counts should jump to 900-1800+.

ADD REPLY • link 2.7 years ago by benformatics 3.9k