Question: How to equalize two vector lengths?
4.5 years ago, Parham (Sweden) wrote:

Hello,

I am running goseq and I have two vectors for one line of code. The de.genes vector has about 250 elements and lengthData has about 7000. As far as I understand from the error below, I have to make lengthData match de.genes. But since I am not an expert in coding, I cannot figure out how to do it. Can someone help me with that? It would be much appreciated!

> gene_pwf = nullp(de.genes, bias.data=lengthData)
Error in nullp(de.genes, bias.data = lengthData) : 
  bias.data vector must have the same length as DEgenes vector! 
4.5 years ago, Devon Ryan (Freiburg, Germany) wrote:

Presuming that de.genes is a subset of the original vector of 7000 genes, just subset lengthData in exactly the same way as you subset the genes. We can't know exactly how you did that, since you didn't show us.


You are right, Devon; I should have been more specific. I prepared de.genes from the deseq2res output the same way you showed me in "C: Codes for preparing data for goseq!". For lengthData, I followed the goseq workflow:

txdb <- makeTranscriptDbFromBiomart(biomart="fungal_mart", dataset="spombe_eg_gene", host="fungi.ensembl.org")
txsByGene=transcriptsBy(txdb, "gene")
lengthData=median(width(txsByGene))

Maybe I am misinterpreting the source of the problem. If you need any other information, please let me know.

Written 4.5 years ago by Parham

Sorry, I missed the reply. Something along the lines of de.lengths <- lengthData[which(d$padj<0.05)] should solve the problem. Just use de.lengths then.

Written 4.5 years ago by Devon Ryan

No worries! This worked, but the two objects do not have the same data type, and I get this error: "Error in sum(y[ww][1:size]) : invalid 'type' (character) of argument". Can one be converted to the other?

Written and modified 4.5 years ago by Parham

According to help(nullp), de.genes should be "A named binary vector where 1 represents DE, 0 not DE and the names are gene IDs." I recall that being different in a previous version of goseq, though perhaps I'm misremembering. So, something like:

d <- read.csv("deseq2res.csv", header=T, row.names=1)
deGenes <- c(rep(0, nrow(d)))
deGenes[which(d$padj<0.05)] <- 1
row.names(deGenes) <- row.names(d)

Then just use lengthData and deGenes as is (as long as they have the same order).

Written 4.5 years ago by Devon Ryan

Thanks Devon, this looks like it will work, apart from a minor problem in the last line: it gives an error that I don't know how to handle. I also have a question: what is the difference between the second line you wrote and degenes <- rep(0, nrow(d))? I created both and they seem to contain the same data! Thanks again for your help.

> row.names(deGenes) <- row.names(deseqres)
Error in `rownames<-`(x, value) : 
  attempt to set 'rownames' on an object with no dimensions
Written 4.5 years ago by Parham

Try instead names(deGenes) <- row.names(deseqres)
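
A minimal sketch of why names() works here where row.names() fails, on toy data (the gene IDs and padj values below are made up for illustration, and this deseqres is only a stand-in for the real table): a plain vector has no dimensions, so it carries names, not row names.

```r
# Toy stand-in for the DESeq2 results table; IDs and padj values are invented.
deseqres <- data.frame(
  padj = c(0.01, 0.20, NA, 0.03),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Start with all zeros, then flag the significant genes with 1.
# which() drops NA comparisons, so genes with an NA padj stay 0.
deGenes <- rep(0, nrow(deseqres))
deGenes[which(deseqres$padj < 0.05)] <- 1

# A vector has no dim attribute, so label it with names(), not row.names().
names(deGenes) <- row.names(deseqres)

print(deGenes)
```

The result is a named 0/1 vector with one entry per row of the results table, which is the shape that help(nullp) describes.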

Written 4.5 years ago by Devon Ryan

The problem I had in the beginning is back! de.lengths <- lengthData[which(deseqres$padj<0.05)] has length 237, but deGenes has length 6089! I guess deGenes should be filtered to match de.lengths. Is that correct?

 

Written and modified 4.5 years ago by Parham

Please reread my comment from 13 hours ago. Apparently in the most recent versions of goseq one doesn't subset things.
 

Written 4.5 years ago by Devon Ryan

Right, now I understand! Sorry for asking some things twice. I am learning, and it is not easy for me to think of all aspects at once.

However, the lengthData that I create from txdb holds the full gene list (length 7019), while deGenes, which is created from deseqres, holds only 6089 genes, since deseq removes the rows that have a sum of zero during its calculations! So I have to remove the rows that are not present in lengthData to make them the same length. Is my thinking right?

Written and modified 4.5 years ago by Parham

Correct, you'll need to use %in% to see which of the rows of txsByGene are in deseqres.

Written 4.5 years ago by Devon Ryan

Can I just remove the rows in lengthData that are not present in deGenes? Could you show me how?

Written 4.5 years ago by Parham

Whether you subset lengthData or txsByGene is up to you. You'll need to use %in% either way. You should be able to figure out how to do this yourself.

Written 4.5 years ago by Devon Ryan

OK, it took a long time before I came up with something that might do the job, but I would like to check with you whether it is correct, if you could have a glance. First I make a logical vector marking the genes in lengthData that are also present in deseqres, then I use it to subset lengthData into new_lengthData. I can't express myself very well, but here is what I did:

> select_genes <- as.vector(names(lengthData)%in%row.names(deseqres))
> new_lengthData <- lengthData[select_genes]
Written 4.5 years ago by Parham

Looks correct.
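
For later readers, a sketch of the same filter on toy data, plus one extra step that is an assumption rather than something stated in this thread: indexing by name so that the lengths end up in the same order as deGenes. All gene IDs, lengths, and DE calls below are invented.

```r
# Toy stand-ins; gene IDs, lengths, and DE calls are invented for illustration.
lengthData <- c(geneA = 1200, geneB = 800, geneC = 1500, geneD = 950, geneE = 2000)
deGenes    <- c(geneD = 1, geneA = 0, geneC = 1)  # fewer genes, different order

# Keep only the genes present in both vectors (the %in% filter).
keep <- names(lengthData) %in% names(deGenes)
new_lengthData <- lengthData[keep]

# Assumption: reorder by name so both vectors line up element by element.
# A gene missing from lengthData would show up as NA here.
new_lengthData <- new_lengthData[names(deGenes)]

# Sanity checks before handing both vectors to nullp().
stopifnot(length(new_lengthData) == length(deGenes))
stopifnot(identical(names(new_lengthData), names(deGenes)))
stopifnot(!anyNA(new_lengthData))
```

With real data, the two aligned vectors would then go into the original call, along the lines of nullp(deGenes, bias.data = new_lengthData).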

Written 4.5 years ago by Devon Ryan

Devon, did you see my reply here? I would appreciate it if you could give some help there.

Thanks!

Written 4.5 years ago by Parham