Strand specific library
20 months ago
esimonova.me ▴ 20

I first analysed the data without taking the strand-specificity of my library into account. Afterwards I found out that the library was stranded, so I reanalysed the data. The difference in counts between the two analyses was very large: without specifying strandedness I got around 1000-2000 counts per gene of interest, but after specifying strandedness in hisat2 and htseq I got only 20-30 counts for some of those genes. The sequencing aimed at a coverage of 30 M. I just want to check whether a difference of this size is expected.

htseq • 867 views
20 months ago

Looking at protein-coding genes.

If the dataset is unstranded and you count sense and antisense reads per gene separately, your values should be about 50% antisense and 50% sense.

If the dataset is stranded (assuming RNA-seq) and you do the same sense and antisense counts, your values should be >90% antisense and <10% sense. I usually observe an order of magnitude difference in read counts between the two strands when investigating a stranded library.
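
The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_strandedness` and the 0.9 cutoff are assumptions made for the example, and the two inputs are assumed to be counts for the same gene set from two htseq-count runs, one with `-s yes` (sense) and one with `-s reverse` (antisense).

```python
def classify_strandedness(sense_count, antisense_count, threshold=0.9):
    """Guess the library type from sense/antisense read counts.

    sense_count:     reads counted with htseq-count -s yes
    antisense_count: reads counted with htseq-count -s reverse
    threshold:       illustrative cutoff (an assumption, not a standard)
    """
    total = sense_count + antisense_count
    if total == 0:
        raise ValueError("no reads counted")
    sense_fraction = sense_count / total
    if sense_fraction >= threshold:
        return "forward-stranded"
    if sense_fraction <= 1 - threshold:
        return "reverse-stranded"
    # Anything near 50/50 looks unstranded; in-between values are ambiguous.
    return "unstranded or ambiguous"


# Counts of the same order as those reported in the question.
print(classify_strandedness(25, 1500))
print(classify_strandedness(1000, 1000))
```

With counts like 20-30 on one strand and 1000-2000 on the other, the library would be called stranded, and the strand with the low counts is the wrong setting.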


Also, you can view the BAM files in IGV to verify strandedness.


Thanks for the answer! I am fairly new to bioinformatics; could you please point me to a paper that confirms this? I can see why it should be 50/50 for an unstranded library, but the 90/10 proportion for a stranded RNA-seq library is not clear to me yet.

20 months ago

With no knowledge of strandedness (I got around 1000-2000 counts per gene I am interested in), however specifying strandedness in hisat2 and htseq I got only 20-30 counts per some genes that I am interested in.

That suggests to me that you put in the wrong strandedness. Put it in the other direction, and you should get your thousand counts back.


After checking the strandedness with RSeQC, I got the following stats:

This is PairEnd Data
Fraction of reads failed to determine: 0.0569
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0170
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9261


Can I conclude from these stats that it is a reverse-stranded library?

According to this, I think it is reverse stranded: https://chipster.csc.fi/manual/library-type-summary.html


Yes, if it is RNA-seq it is usually reverse stranded. To check this, change the strand setting in your hisat2 command, re-run hisat2 and htseq, and your 20-30 counts should jump to 900-1800+.
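
As a rough sketch, the RSeQC output quoted above can be turned into tool settings with a small Python helper. The function names and the 0.8 cutoff are assumptions made for this example, and it only handles the paired-end form of the report. The options it suggests (`--rna-strandness RF`/`FR` for hisat2, `-s reverse`/`-s yes`/`-s no` for htseq-count) are the documented ones for paired-end data.

```python
import re


def parse_infer_experiment(text):
    """Collect the 'explained by' fractions from an RSeQC strandedness report."""
    fractions = {}
    for line in text.splitlines():
        match = re.search(r'explained by "([^"]+)":\s*([0-9.]+)', line)
        if match:
            fractions[match.group(1)] = float(match.group(2))
    return fractions


def suggest_settings(fractions, threshold=0.8):
    """Map paired-end fractions to hisat2 / htseq-count strand options.

    threshold is an illustrative cutoff (an assumption, not an RSeQC default).
    """
    forward = fractions.get("1++,1--,2+-,2-+", 0.0)  # read 1 on the gene strand
    reverse = fractions.get("1+-,1-+,2++,2--", 0.0)  # read 1 on the opposite strand
    if reverse >= threshold:
        return {"library": "reverse-stranded",
                "hisat2": "--rna-strandness RF", "htseq-count": "-s reverse"}
    if forward >= threshold:
        return {"library": "forward-stranded",
                "hisat2": "--rna-strandness FR", "htseq-count": "-s yes"}
    return {"library": "unstranded or ambiguous",
            "hisat2": "", "htseq-count": "-s no"}


# The report posted earlier in this thread.
REPORT = '''This is PairEnd Data
Fraction of reads failed to determine: 0.0569
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0170
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9261
'''

settings = suggest_settings(parse_infer_experiment(REPORT))
print(settings)
```

For the numbers in this thread the helper reports a reverse-stranded library, which matches the conclusion above.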