Question

Error in GOseq analysis

0

Entering edit mode

5.2 years ago

Bioinfonext ▴ 480

I got the differentially expressed genes using DESeq2 analysis but not sure how exactly should I get the genes.vector and length.vector.names? I tried below code but I am getting the error? I will be thankful for your help and time.

> deseq2_result <- read.table("deseq2.res.root.txt",sep="\t",head=T,row.names=1)
> DEG <- rownames(subset(deseq2_result, padj <0.05 & log2FoldChange>0))                                                                                                 > ALL <- rownames(subset(deseq2_result))
> DEG.vector <- c(t(DEG))
> ALL.vector<-c(t(ALL))
> gene.vector=as.integer(ALL.vector%in%DEG.vector)
> names(gene.vector)=ALL.vector
> head(gene.vector)
BGIOSGA001272 BGIOSGA001400 BGIOSGA002386 BGIOSGA002510 BGIOSGA002743
            0             0             0             1             1
BGIOSGA002784
            1
> txdb = makeTxDbFromGFF("Oryza_indica.ASM465v1.44.chr.gff3")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .extract_exons_from_GRanges(exon_IDX, gr, mcols0, tx_IDX, feature = "exon",  :
  98 exons couldn't be linked to a transcript so were dropped (showing
  only the first 6):
  seqid   start     end strand   ID               Name
1     1 1479179 1479368      + <NA> ENSRNA049495102-E1
2     1 3871367 3871431      - <NA> ENSRNA049495083-E1
3     1 7021193 7021391      - <NA> ENSRNA049495099-E1
4     1 7023628 7023818      - <NA> ENSRNA049495093-E1
5     1 7127068 7127257      + <NA> ENSRNA049495097-E1
6     1 7155225 7155422      - <NA> ENSRNA049495091-E1
                         Parent Parent_type
1 transcript:ENSRNA049495102-T1        <NA>
2 transcript:ENSRNA049495083-T1        <NA>
3 transcript:ENSRNA049495099-T1        <NA>
4 transcript:ENSRNA049495093-T1        <NA>
5 transcript:ENSRNA049495097-T1        <NA>
6 transcript:ENSRNA049495091-T1        <NA>

> txsByGene=transcriptsBy(txdb,"gene")
> length.vector.names=median(width(txsByGene))

> head(length.vector.names)
ABCC13 ABCG36   ACO1   ACT1   ACT3   ADH1
  5655   6943   1132   1547   1725   2894

    > pwf <- nullp(gene.vector,bias.data=length.vector.names)
Error in nullp(gene.vector, bias.data = length.vector.names) :
  bias.data vector must have the same length as DEgenes vector!

Many thanks,

RNA-Seq GOseq R • 2.0k views

ADD COMMENT • link 5.2 years ago by Bioinfonext ▴ 480

0

Entering edit mode

Is this a continuation of your previous question?

ADD REPLY • link 5.2 years ago by Ram 45k

0

Entering edit mode

Yes...here I am trying that code with some changes. I wanted to delete the previous post but sure how to delete it?

ADD REPLY • link 5.2 years ago by Bioinfonext ▴ 480

0

Entering edit mode

You can use the moderate option on the post to delete it.

ADD REPLY • link 5.2 years ago by Ram 45k

0

Entering edit mode

Are you sure, that your gene.vector includes every gene from your annotation file? Can you test, which genes are not included?

ADD REPLY • link 5.2 years ago by caggtaagtat ★ 1.9k

1

Entering edit mode

Thanks, DESeq2 output file does not contain all genes as I did the filtering and kept only those rows which contain total read count more than 10.

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

Therefore gene.vector only contains 29000 genes instead of 46000 which are present in gff file.

Many thanks,

ADD REPLY • link 5.2 years ago by Bioinfonext ▴ 480

0

Entering edit mode

From goseq tutorial I learn that I can also use count data instead to of length data,

But not sure I should I get count data from dds Object of DESeq2 after filtering?

###
### code chunk number 30: countbias
###################################################
countbias=rowSums(counts)[rowSums(counts)!=0]
length(countbias)
length(gene.vector)




    ##################################################
    ### code chunk number 31: GO.counts
    ###################################################
    pwf.counts=nullp(gene.vector,bias.data=countbias)

ADD REPLY • link 5.2 years ago by Bioinfonext ▴ 480

0

Entering edit mode

I also tried this from count data file?

> countMatrix = read.table("final_file.Deseq.txt",header=T,sep='\t',check.names=F)
> countbias=rowSums(countMatrix)[rowSums(countMatrix)>10]
Error in base::rowSums(x, na.rm = na.rm, dims = dims, ...) :
  'x' must be numeric

ADD REPLY • link 5.2 years ago by Bioinfonext ▴ 480