Question: So many Zeros (read counting)_NGS sequence analysis with R / Bioconductor: RNA-Seq workflow
0
gravatar for healing80
17 months ago by
healing800
healing800 wrote:

Hello. I am following 'NGS sequence analysis with R / Bioconductor: RNA-Seq workflow', and learning sequence alignment.

When I checked 'alignment summary'

> read_statsDF <-alignStats(args)
> write.table(read_statsDF, "results/alignStats.xls", row.names=FALSE, quote=FALSE, sep="\t")
> read.table(system.file("extdata", "alignStats.xls", package="systemPipeR"), header=TRUE)
   FileName Nreads2x Nalign Perc_Aligned Nalign_Primary Perc_Aligned_Primary
1       M1A   192918 177961     92.24697         177961             92.24697
2       M1B   197484 159378     80.70426         159378             80.70426
3       A1A   189870 176055     92.72397         176055             92.72397
4       A1B   188854 147768     78.24457         147768             78.24457
5       V1A   198732 160115     80.56830         160115             80.56830
6       V1B   195542 184418     94.31120         184418             94.31120
7       M6A   197234 182469     92.51397         182469             92.51397
8       M6B   180904 162636     89.90183         162636             89.90183
9       A6A   183140 170258     92.96604         170258             92.96604
10      A6B   196578 179932     91.53211         179932             91.53211
11      V6A   194204 176857     91.06764         176857             91.06764
12      V6B   198414 176825     89.11922         176825             89.11922
13     M12A   199434 179722     90.11603         179722             90.11603
14     M12B   188954 162894     86.20828         162894             86.20828
15     A12A   187922 171326     91.16868         171326             91.16868
16     A12B   199790 173771     86.97683         173771             86.97683
17     V12A   193922 174180     89.81962         174180             89.81962
18     V12B   181006 157790     87.17391         157790             87.17391
> library(GenomicFeatures)

and read counting.

> txdb <-loadDb("./data/tair10.sqlite")
> eByg <-exonsBy(txdb, by="gene")
> eByg[1:2]
GRangesList object of length 2:
$AT1G01010 
GRanges object with 6 ranges and 2 metadata columns:
      seqnames       ranges strand |   exon_id   exon_name
         <Rle>    <IRanges>  <Rle> | <integer> <character>
  [1]     Chr1 [3631, 3913]      + |         1        <NA>
  [2]     Chr1 [3996, 4276]      + |         2        <NA>
  [3]     Chr1 [4486, 4605]      + |         3        <NA>
  [4]     Chr1 [4706, 5095]      + |         4        <NA>
  [5]     Chr1 [5174, 5326]      + |         5        <NA>
  [6]     Chr1 [5439, 5899]      + |         6        <NA>

$AT1G01020 
GRanges object with 12 ranges and 2 metadata columns:
       seqnames       ranges strand | exon_id exon_name
   [1]     Chr1 [5928, 6263]      - |      40      <NA>
   [2]     Chr1 [6437, 7069]      - |      41      <NA>
   [3]     Chr1 [6790, 7069]      - |      42      <NA>
   [4]     Chr1 [7157, 7232]      - |      43      <NA>
   [5]     Chr1 [7157, 7450]      - |      44      <NA>
   ...      ...          ...    ... .     ...       ...
   [8]     Chr1 [7762, 7835]      - |      47      <NA>
   [9]     Chr1 [7942, 7987]      - |      48      <NA>
  [10]     Chr1 [8236, 8325]      - |      49      <NA>
  [11]     Chr1 [8417, 8464]      - |      50      <NA>
  [12]     Chr1 [8571, 8737]      - |      51      <NA>

-------
seqinfo: 7 sequences (2 circular) from an unspecified genome; no seqlengths

and... wrote the table...got so many zeros. Could anybody give me some advice? what mistakes I made? I am a beginner in bioinformatics, so please give me easy advice. Thank you very much in advance. :)

> txdb <- loadDb("./data/tair10.sqlite")
> eByg <-exonsBy(txdb, by=c("gene"))
> bfl <-BamFileList(outpaths(args), yieldSize=50000, index=character())
> counteByg <-summarizeOverlaps(eByg, bfl, mode="IntersectionStrict", ignore.strand=TRUE, inter.feature=FALSE, singleEnd=FALSE)
> countDFeByg <-assays(counteByg)$counts
> countDFeByg[1:4, 1:18]
          M1A M1B A1A A1B V1A V1B M6A M6B A6A A6B V6A V6B M12A M12B A12A A12B V12A V12B
AT1G01010   0   0   0   0   0   0   0   0   0   0   0   0    0    0    0    0    0    0
AT1G01020   0   0   0   0   0   0   0   0   0   0   0   0    0    0    0    0    0    0
AT1G01030   0   0   0   0   0   0   0   0   0   0   0   0    0    0    0    0    0    0
AT1G01040   0   0   0   0   0   0   0   0   0   0   0   0    0    0    0    0    0    0
rna-seq alignment tutorial R • 518 views
ADD COMMENTlink modified 17 months ago • written 17 months ago by healing800
4

Do you have many zeroes or only zeroes? In most tissues only 1/3 of all genes are expressed so most genes will have a zero. You can filter them out before doing statistical test with limma or edgeR, etc.

ADD REPLYlink written 17 months ago by Benn7.9k
2

If you have only zeros then your chromosome names are likely not matching in your BAM files and the annotation. Though if you are just following the tutorial then it should not be happening.

ADD REPLYlink written 17 months ago by genomax75k
1

Summarize your table with something like:

table( rowSums (countDFeByg) == 0 )

Then we will now how many zeroes are "so many zeroes".

ADD REPLYlink written 17 months ago by h.mon28k

Thanks a lot :)

ADD REPLYlink written 17 months ago by healing800
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 929 users visited in the last hour