Question

A large number of genes with 0 reads in bacterial RNA-seq


7.2 years ago

xioli2013 ▴ 10

Hi Community,

I am new to RNA-seq analysis. I have an RNA-seq data set from Deinococcus radiodurans, and we found a large number of genes with 0 reads across the 12 libraries.

For the whole data set (3224 genes), genes with 0 reads in all 12 libraries account for about 26% of the genes:

mean(rowSums(full == 0) == 12)
[1] 0.2583747

After extracting the low-count genes (1060 genes), those with 0 reads in all 12 libraries account for about 79% of them:

mean(rowSums(lowRead == 0) == 12)
[1] 0.7858491
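For reference, the two fractions point to the same count of genes: 0.2583747 × 3224 ≈ 833 and 0.7858491 × 1060 ≈ 833. Assuming `lowRead` is a subset of `full`, all of the roughly 833 genes with no reads in any library fall inside the 1060 low-count genes. A minimal R sketch of the same calculation follows. It uses a small made-up matrix `counts` (hypothetical; substitute your own gene-by-library count matrix such as `full`), uses `ncol()` instead of a hard-coded 12, lists the IDs of the all-zero genes, and tabulates how many genes have zeros in 0, 1, ..., n libraries. That table separates genes that are silent everywhere from genes that drop out in only some libraries.

```r
# Hypothetical toy count matrix standing in for `full`:
# rows are genes, columns are libraries; values are made up for illustration.
counts <- matrix(c(
  15, 22, 18, 30,   # gene1: expressed in every library
   0,  0,  0,  0,   # gene2: no reads in any library
   8, 11,  5,  9,   # gene3: expressed in every library
   0,  3,  2,  4,   # gene4: zero in one library only
   0,  0,  0,  0    # gene5: no reads in any library
), nrow = 5, byrow = TRUE,
dimnames = list(paste0("gene", 1:5), paste0("lib", 1:4)))

n_libs <- ncol(counts)                  # instead of hard-coding 12
zeros_per_gene <- rowSums(counts == 0)  # number of libraries with 0 reads, per gene
all_zero <- zeros_per_gene == n_libs    # TRUE for genes with no reads anywhere

frac_all_zero <- mean(all_zero)         # same quantity as mean(rowSums(full == 0) == 12)
n_all_zero <- sum(all_zero)             # absolute number of all-zero genes
all_zero_ids <- rownames(counts)[all_zero]

# How many genes have zeros in exactly k libraries
zero_table <- table(zeros_per_gene)

print(frac_all_zero)
print(all_zero_ids)
print(zero_table)
```

With your own data, `all_zero_ids` gives the gene list to inspect further, for example to check whether the silent genes share a genomic location or annotation.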

What does this tell us? And how can we extract more information from this data?

Thank you for your help.

Xp

RNA-Seq

Written 7.2 years ago by xioli2013; updated 7.2 years ago by Michael.


It might tell you that your data are bad, but we need more details. This is a bacterium: which protocols did you use? Did you, for example, tell the sequencing facility that you have bacterial RNA? Did they use RiboMinus (rRNA depletion)? Did you run FastQC? What is the sequence duplication rate? How did you do the alignment? Did you remove adapters? ...

Reply by Michael, 7.2 years ago


Hi Michael:

The libraries for D. radiodurans were not made by me, but I will gather information on that. The data for S. oneidensis, made by the same person, were decent.

I checked the quality of all 12 D. radiodurans libraries with FastQC; the basic statistics look like this:

Filename: s_8_sequence_TGACCA-without_phix-without_other.fq
File type: Conventional base calls
Encoding: Sanger / Illumina 1.9
Total Sequences: 20291165
Sequences flagged as poor quality: 0
Sequence length: 50
%GC: 61

None of the libraries have sequences flagged as poor quality, but I do get "red" (fail) on sequence duplication levels, overrepresented sequences and k-mer content.

Could this be causing 0 reads for about a quarter of the genes across all 12 libraries?

Many thanks for your input.

Xp

Reply by xioli2013, 7.2 years ago