Entering edit mode
                    5.2 years ago
        Bioinfonext
        
    
        ▴
    
    480
    I got the differentially expressed genes using DESeq2 analysis but not sure how exactly should I get the genes.vector and length.vector.names? I tried below code but I am getting the error? I will be thankful for your help and time.
> deseq2_result <- read.table("deseq2.res.root.txt",sep="\t",head=T,row.names=1)
> DEG <- rownames(subset(deseq2_result, padj <0.05 & log2FoldChange>0))                                                                                                 > ALL <- rownames(subset(deseq2_result))
> DEG.vector <- c(t(DEG))
> ALL.vector<-c(t(ALL))
> gene.vector=as.integer(ALL.vector%in%DEG.vector)
> names(gene.vector)=ALL.vector
> head(gene.vector)
BGIOSGA001272 BGIOSGA001400 BGIOSGA002386 BGIOSGA002510 BGIOSGA002743
            0             0             0             1             1
BGIOSGA002784
            1
> txdb = makeTxDbFromGFF("Oryza_indica.ASM465v1.44.chr.gff3")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .extract_exons_from_GRanges(exon_IDX, gr, mcols0, tx_IDX, feature = "exon",  :
  98 exons couldn't be linked to a transcript so were dropped (showing
  only the first 6):
  seqid   start     end strand   ID               Name
1     1 1479179 1479368      + <NA> ENSRNA049495102-E1
2     1 3871367 3871431      - <NA> ENSRNA049495083-E1
3     1 7021193 7021391      - <NA> ENSRNA049495099-E1
4     1 7023628 7023818      - <NA> ENSRNA049495093-E1
5     1 7127068 7127257      + <NA> ENSRNA049495097-E1
6     1 7155225 7155422      - <NA> ENSRNA049495091-E1
                         Parent Parent_type
1 transcript:ENSRNA049495102-T1        <NA>
2 transcript:ENSRNA049495083-T1        <NA>
3 transcript:ENSRNA049495099-T1        <NA>
4 transcript:ENSRNA049495093-T1        <NA>
5 transcript:ENSRNA049495097-T1        <NA>
6 transcript:ENSRNA049495091-T1        <NA>
> txsByGene=transcriptsBy(txdb,"gene")
> length.vector.names=median(width(txsByGene))
> head(length.vector.names)
ABCC13 ABCG36   ACO1   ACT1   ACT3   ADH1
  5655   6943   1132   1547   1725   2894
    > pwf <- nullp(gene.vector,bias.data=length.vector.names)
Error in nullp(gene.vector, bias.data = length.vector.names) :
  bias.data vector must have the same length as DEgenes vector!
Many thanks,
Is this a continuation of your previous question?
Yes...here I am trying that code with some changes. I wanted to delete the previous post but sure how to delete it?
You can use the
moderateoption on the post to delete it.Are you sure, that your
gene.vectorincludes every gene from your annotation file? Can you test, which genes are not included?Thanks, DESeq2 output file does not contain all genes as I did the filtering and kept only those rows which contain total read count more than 10.
Therefore gene.vector only contains 29000 genes instead of 46000 which are present in gff file.
Many thanks,
From goseq tutorial I learn that I can also use count data instead to of length data,
But not sure I should I get count data from dds Object of DESeq2 after filtering?
I also tried this from count data file?