DESeq2
Asked 2.5 years ago by rheab1230

Hello everyone,

I am trying to run a DESeq2 analysis on my gene count file to normalize it.

I downloaded the gene count file from here: https://www.ebi.ac.uk/arrayexpress/files/E-GEUV-1/GD660.GeneQuantCount.txt.gz

My gene count file looks like this:

TargetID        Gene_Symbol     Chr     Coord   HG00096.1.M_111124_6 HG00097.7.M_120219_2    HG00099.1.M_120209_6    HG00099.5.M_120131_3    HG00100.2.M_111215_8    HG00101.1.M_111124_4    HG00102.3.M_120202_8    HG00103.4.M_120208_3    HG00104.1.M_111124_5    HG00105.1.M_120209_7    HG00105.3.M_120223_6    HG00106.4.M_120208_5    HG00108.7.M_120219_2    HG00109.1.M_120209_4    HG00109.3.M_120202_5    HG00110.2.M_120131_2    HG00111.1.M_120209_8    HG00111.2.M_111215_4    HG00112.6.M_120119_2    HG00114.1.M_120209_3    HG00114.6.M_120217_1    HG00115.6.M_120119_1    HG00116.2.M_120131_1    HG00117.1.M_111124_2    HG00117.1.M_120209_1    HG00117.2.M_111216_4    HG00117.3.M_120202_6    HG00117.4.M_120208_4    HG00117.5.M_120131_3    HG00117.6.M_120217_1    HG00117.7.M_120219_4    HG00118.4.M_120208_5    HG00119.1.M_120209_3    HG00119.2.M_111216_6    HG00120.3.M_120202_2    HG00121.1.M_111124_7    HG00122.6.M_120119_1    HG00123.4.M_120208_7    HG00124.3.M_120223_7
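The header has four annotation columns (TargetID, Gene_Symbol, Chr, Coord) followed by one column per sample. Before picking one of them as row names, it can help to check which annotation column is actually unique. A minimal base-R sketch; check_id_columns() and the toy table are hypothetical, and the values are made up (the Coord values reuse a number from the warning message further down).

```r
# Hypothetical helper: for each candidate ID column, report whether its values are unique.
check_id_columns <- function(df, cols) {
  vapply(cols, function(col) !anyDuplicated(df[[col]]), logical(1))
}

# Toy table with the same column layout as GD660.GeneQuantCount.txt (made-up values)
toy <- data.frame(
  TargetID    = c("toy_gene_1.1", "toy_gene_2.3", "toy_gene_3.2"),
  Gene_Symbol = c("GENE_A", "GENE_B", "GENE_B"),
  Chr         = c("1", "1", "2"),
  Coord       = c(333174, 568198, 333174),
  HG00096.1.M_111124_6 = c(0, 10, 4860),
  check.names = FALSE
)

check_id_columns(toy, c("TargetID", "Gene_Symbol", "Chr", "Coord"))
```

On the toy table only TargetID comes back TRUE; running the same check on the real file shows which column is safe to use for rownames().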

The code is:

GD_dat = read.delim("GD660.GeneQuantCount.txt",header=TRUE,row.names = NULL)
GD_dat = GD_dat[,-c(1:3)]
head(GD_dat)
dim(GD_dat)
colnames(GD_dat)  = substr(colnames(GD_dat),1,7)
rownames(GD_dat) = substr(rownames(GD_dat),1,15)
geneNames<-GD_dat[,1]
rownames(GD_dat)<-geneNames
GD_dat<-GD_dat[,2:ncol(GD_dat)]
sample_info<-DataFrame(condition=names(GD_dat), row.names=names(GD_dat))
library("DESeq2")
# runs the DESeq2
ds<-DESeqDataSetFromMatrix(countData=GD_dat, colData=sample_info, design= ~condition)
keep_genes<-rowSums(counts(ds))>0

This is the output I get, ending in an error:

  NA20814.2.M_111215_6 NA20815.5.M_120131_5 NA20816.3.M_120202_7
1                    0                    0                    0
2                    0                    0                    0
3                    0                    0                    0
4                   10                    8                   16
5                    0                    0                    0
6                 4860                 6782                 4952
  NA20819.3.M_120202_2 NA20826.1.M_111124_1 NA20828.2.M_111216_8
1                    0                    2                0.000
2                    0                    0                0.000
3                    0                    0                0.000
4                    6                   16                8.000
5                    0                    0                0.000
6                 1864                 3446             4814.479
[1] 53934   661
Error in `.rowNamesDF<-`(x, value = value) :
  duplicate 'row.names' are not allowed
Calls: rownames<- ... row.names<- -> row.names<-.data.frame -> .rowNamesDF<-
In addition: Warning message:
non-unique values when setting 'row.names': '333174', '568198', '668559', '1976363', '2182439', '2637270', '2637585', '2795614', '3417146', '5115909', '7291199', '7307416', '7440175', '9212383', '9215731', '10490159', '10697357', '12203078', '12267546', '12794843', '15130775', '15489611', '16739015', '17046652', '18118499', '18507325', '18967449', '19015949', '19303400', '19612838', '19627036', '20408712', '20829598', '21180973', '22788423', '24682679', '25042238', '27401462', '27932953', '29952206', '30501206', '30893010', '31799523', '31895475', '32635667', '32806599', '34117481', '34252878', '34880704', '36871979', '37126773', '37823505', '37962056', '37979892', '38023636', '38080696', '38858438', '39240459', '39347289', '39817308', '40509629', '41754280', '42120283', '42640301', '43009842', '44245583', '45911744', '46854048', '47012325', '50101948', '50155854', '50747584', '50837249', '52009066', '53063128', '53704282', '53835525', '54379303', '54385522', '54427734', '56109820', '5 [... truncated]
Execution halted

I don't know how to arrange the data so that genes are in one column and samples in another, with their count values. In my file, each column is one sample with the counts for its genes.
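For reference, DESeqDataSetFromMatrix() takes the counts as a matrix with genes as rows and samples as columns, plus a sample table (colData) with one row per sample whose row names match the matrix's column names. The one-column-per-sample layout of the file is therefore already the right shape; only the annotation columns need to become row names. A toy base-R illustration, where the gene IDs are placeholders and the counts are a few values copied from the head() output above:

```r
# countData: one row per gene, one column per sample, integer counts
count_mat <- matrix(
  c(0L, 10L, 4860L,   # NA20814.2.M_111215_6
    0L,  8L, 6782L),  # NA20815.5.M_120131_5
  ncol = 2,
  dimnames = list(
    c("gene_1", "gene_2", "gene_3"),                      # placeholder gene IDs (rows)
    c("NA20814.2.M_111215_6", "NA20815.5.M_120131_5")     # sample IDs (columns)
  )
)

# colData: one row per sample, row names in the same order as colnames(count_mat)
sample_info <- data.frame(
  sample = colnames(count_mat),
  row.names = colnames(count_mat)
)

count_mat
# DESeq2 requires the sample order to agree between the two tables
stopifnot(identical(rownames(sample_info), colnames(count_mat)))
```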


Hi,

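I think you have built your row names (geneNames) from the Coord values in the data rather than from the Gene_Symbol column. After GD_dat[,-c(1:3)] removes TargetID, Gene_Symbol and Chr, the first remaining column is Coord, so geneNames<-GD_dat[,1] picks up genomic coordinates, and those repeat across genes (hence non-unique values such as '333174' in the warning).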
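A sketch of the whole normalization with that fix applied, assuming DESeq2 is installed. It takes gene IDs from TargetID (trimmed to 15 characters, as the original code intended) rather than Gene_Symbol, since gene symbols can repeat across Ensembl IDs. It keeps the full sample IDs, because trimming them to 7 characters merges runs such as HG00099.1 and HG00099.5. It rounds the fractional counts (e.g. 4814.479 in the output) because DESeqDataSetFromMatrix() rejects non-integer values, and it uses an intercept-only design (~ 1) because design = ~condition with one condition per sample gives every sample its own group. The function name normalize_gd660() is hypothetical, and this is one reasonable approach rather than the only one.

```r
library(DESeq2)  # also attaches S4Vectors, which provides DataFrame()

normalize_gd660 <- function(path) {
  # check.names = FALSE keeps the sample IDs exactly as written in the header
  dat <- read.delim(path, header = TRUE, check.names = FALSE,
                    stringsAsFactors = FALSE)

  # Gene IDs from TargetID with the version suffix trimmed;
  # fall back to the full IDs if trimming would create duplicates
  gene_ids <- substr(dat$TargetID, 1, 15)
  if (anyDuplicated(gene_ids)) gene_ids <- dat$TargetID

  # Columns 1-4 are TargetID, Gene_Symbol, Chr, Coord; the rest are samples
  count_mat <- as.matrix(dat[, -(1:4), drop = FALSE])
  rownames(count_mat) <- gene_ids

  # DESeq2 only accepts whole-number counts, so round the fractional values
  count_mat <- round(count_mat)
  storage.mode(count_mat) <- "integer"

  # No comparison is made, so an intercept-only design is enough for normalization
  sample_info <- DataFrame(sample = colnames(count_mat),
                           row.names = colnames(count_mat))
  ds <- DESeqDataSetFromMatrix(countData = count_mat, colData = sample_info,
                               design = ~ 1)

  ds <- ds[rowSums(counts(ds)) > 0, ]  # drop genes with zero counts in every sample
  ds <- estimateSizeFactors(ds)        # median-of-ratios size factors
  counts(ds, normalized = TRUE)
}

# Usage, with the file name from the question:
# norm_counts <- normalize_gd660("GD660.GeneQuantCount.txt")
```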
