Question

So many Zeros (read counting)_NGS sequence analysis with R / Bioconductor: RNA-Seq workflow

0

Entering edit mode

5.8 years ago

healing80 • 0

Hello

I am following 'NGS sequence analysis with R / Bioconductor: RNA-Seq workflow', and learning sequence alignment.

When I checked 'alignment summary'

read_statsDF <-alignStats(args)
write.table(read_statsDF, "results/alignStats.xls", row.names=FALSE, quote=FALSE, sep="\t")
read.table(system.file("extdata", "alignStats.xls", package="systemPipeR"), header=TRUE)
   FileName Nreads2x Nalign Perc_Aligned Nalign_Primary Perc_Aligned_Primary
1       M1A   192918 177961     92.24697         177961             92.24697
2       M1B   197484 159378     80.70426         159378             80.70426
3       A1A   189870 176055     92.72397         176055             92.72397
4       A1B   188854 147768     78.24457         147768             78.24457
5       V1A   198732 160115     80.56830         160115             80.56830
6       V1B   195542 184418     94.31120         184418             94.31120
7       M6A   197234 182469     92.51397         182469             92.51397
8       M6B   180904 162636     89.90183         162636             89.90183
9       A6A   183140 170258     92.96604         170258             92.96604
10      A6B   196578 179932     91.53211         179932             91.53211
11      V6A   194204 176857     91.06764         176857             91.06764
12      V6B   198414 176825     89.11922         176825             89.11922
13     M12A   199434 179722     90.11603         179722             90.11603
14     M12B   188954 162894     86.20828         162894             86.20828
15     A12A   187922 171326     91.16868         171326             91.16868
16     A12B   199790 173771     86.97683         173771             86.97683
17     V12A   193922 174180     89.81962         174180             89.81962
18     V12B   181006 157790     87.17391         157790             87.17391
library(GenomicFeatures)

and read counting.

txdb <-loadDb("./data/tair10.sqlite")
eByg <-exonsBy(txdb, by="gene")
eByg[1:2]
GRangesList object of length 2:
$AT1G01010 
GRanges object with 6 ranges and 2 metadata columns:
      seqnames       ranges strand |   exon_id   exon_name
         <Rle>    <IRanges>  <Rle> | <integer> <character>
  [1]     Chr1 [3631, 3913]      + |         1        <NA>
  [2]     Chr1 [3996, 4276]      + |         2        <NA>
  [3]     Chr1 [4486, 4605]      + |         3        <NA>
  [4]     Chr1 [4706, 5095]      + |         4        <NA>
  [5]     Chr1 [5174, 5326]      + |         5        <NA>
  [6]     Chr1 [5439, 5899]      + |         6        <NA>

$AT1G01020 
GRanges object with 12 ranges and 2 metadata columns:
       seqnames       ranges strand | exon_id exon_name
   [1]     Chr1 [5928, 6263]      - |      40      <NA>
   [2]     Chr1 [6437, 7069]      - |      41      <NA>
   [3]     Chr1 [6790, 7069]      - |      42      <NA>
   [4]     Chr1 [7157, 7232]      - |      43      <NA>
   [5]     Chr1 [7157, 7450]      - |      44      <NA>
   ...      ...          ...    ... .     ...       ...
   [8]     Chr1 [7762, 7835]      - |      47      <NA>
   [9]     Chr1 [7942, 7987]      - |      48      <NA>
  [10]     Chr1 [8236, 8325]      - |      49      <NA>
  [11]     Chr1 [8417, 8464]      - |      50      <NA>
  [12]     Chr1 [8571, 8737]      - |      51      <NA>

-------
seqinfo: 7 sequences (2 circular) from an unspecified genome; no seqlengths

and... wrote the table...got so many zeros. Could anybody give me some advice? what mistakes I made?

I am a beginner in bioinformatics, so please give me easy advice. Thank you very much in advance. :)

txdb <- loadDb("./data/tair10.sqlite")
eByg <-exonsBy(txdb, by=c("gene"))
bfl <-BamFileList(outpaths(args), yieldSize=50000, index=character())
counteByg <-summarizeOverlaps(eByg, bfl, mode="IntersectionStrict", ignore.strand=TRUE, inter.feature=FALSE, singleEnd=FALSE)
countDFeByg <-assays(counteByg)$counts
countDFeByg[1:4, 1:18]
          M1A M1B A1A A1B V1A V1B M6A M6B A6A A6B V6A V6B M12A M12B A12A A12B V12A V12B
AT1G01010   0   0   0   0   0   0   0   0   0   0   0   0    0    0    0    0    0    0
AT1G01020   0   0   0   0   0   0   0   0   0   0   0   0    0    0    0    0    0    0
AT1G01030   0   0   0   0   0   0   0   0   0   0   0   0    0    0    0    0    0    0
AT1G01040   0   0   0   0   0   0   0   0   0   0   0   0    0    0    0    0    0    0

alignment RNA-Seq R • 1.5k views

ADD COMMENT • link updated 13 months ago by Ram 43k • written 5.8 years ago by healing80 • 0

4

Entering edit mode

Do you have many zeroes or only zeroes? In most tissues only 1/3 of all genes are expressed so most genes will have a zero. You can filter them out before doing statistical test with limma or edgeR, etc.

ADD REPLY • link 5.8 years ago by Benn 8.3k

2

Entering edit mode

If you have only zeros then your chromosome names are likely not matching in your BAM files and the annotation. Though if you are just following the tutorial then it should not be happening.

ADD REPLY • link 5.8 years ago by GenoMax 141k

1

Entering edit mode

Summarize your table with something like:

table( rowSums (countDFeByg) == 0 )

Then we will now how many zeroes are "so many zeroes".

ADD REPLY • link 5.8 years ago by h.mon 35k

0

Entering edit mode

Thanks a lot :)

ADD REPLY • link 5.8 years ago by healing80 • 0