Question: (Closed) extracting some information from cuffdiff output
A wrote (3.7 years ago):

Hi,

I have this file from cuffdiff:

> head(mycounts[,1:14])

     test_id     gene_id      gene         locus sample_1 sample_2 status  value_1  value_2 log2.fold_change.  test_stat p_value  q_value significant
1 XLOC_000001 XLOC_000001    NAC001   1:3630-5899       C1       C2     OK  8.16533  8.82461         0.1120220  0.2496490 0.79700 0.999241          no
2 XLOC_000002 XLOC_000002      DCL1 1:23145-33153       C1       C2     OK 12.65950 15.12090         0.2563240  0.0853285 0.73045 0.999241          no
3 XLOC_000003 XLOC_000003 AT1G01073 1:44676-44787       C1       C2 NOTEST  0.00000  0.00000         0.0000000  0.0000000 1.00000 1.000000          no
4 XLOC_000004 XLOC_000004     IQD18 1:52091-54692       C1       C2     OK  2.98590  3.46625         0.2152080  0.3954010 0.68640 0.999241          no
5 XLOC_000005 XLOC_000005 AT1G01115 1:56623-56740       C1       C2 NOTEST  0.00000  0.00000         0.0000000  0.0000000 1.00000 1.000000          no
6 XLOC_000006 XLOC_000006      GIF2 1:72324-74737       C1       C2     OK 23.01450 21.96440        -0.0673764 -0.0884016 0.93085 0.999241          no

How can I extract columns 3 and 10 only for rows where column 14 (significant) is "yes", and only for the comparison between samples C1 and C2? The file also contains other sample pairs in the rows further down.

Thank you

sequencing rna-seq myposts R gene • 1.4k views
Modified 13 months ago by RamRS • written 3.7 years ago by A

First, I extract the up- and down-regulated genes into separate files:

awk '{if($10 > 0 && index($10,"+inf")==0){print $0 > "Up_Regulated.txt"}else if($10<0 && index($10,"-inf")==0){print $0 > "Down_Regulated.txt"}}' gene_exp.diff

Then I extract the significant ones ("yes" in column 14):

awk '$14=="yes"' diff_out/gene_exp.diff > diff_genes.txt

But I don't know how to combine these two steps.

Modified 13 months ago by RamRS • written 3.7 years ago by A

Simply add $14=="yes" to both conditions in the first awk command:

awk '{if($10 > 0 && index($10,"+inf")==0 && $14=="yes"){print $0 > "Up_Regulated.txt"}else if($10<0 && index($10,"-inf")==0 && $14=="yes"){print $0 > "Down_Regulated.txt"}}' gene_exp.diff
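The command above still keeps every sample pair and writes whole rows. A minimal sketch that also restricts the output to one sample pair and keeps only the gene name and log2 fold change, assuming gene_exp.diff is tab-delimited with a single header row and the column order shown in the question (3 = gene, 5/6 = sample_1/sample_2, 10 = log2 fold change, 14 = significant). The function name split_by_direction and the output file names are hypothetical; rows whose fold change contains "inf" or is exactly 0 are skipped.

```shell
# Hypothetical helper: split_by_direction FILE SAMPLE_1 SAMPLE_2
# Writes significant genes for one sample pair into two files, split by the
# sign of log2(fold_change). Assumes a tab-delimited cuffdiff gene_exp.diff
# with one header row and the standard column order:
#   3 = gene, 5 = sample_1, 6 = sample_2, 10 = log2(fold_change), 14 = significant
split_by_direction() {
  awk -F'\t' -v OFS='\t' -v s1="$2" -v s2="$3" '
    NR == 1 { next }                          # skip the header row
    $5 == s1 && $6 == s2 && $14 == "yes" {
      if ($10 ~ /inf/) next                   # skip infinite fold changes
      if ($10 + 0 > 0) {
        # up-regulated: print gene and log2 fold change
        print $3, $10 > (s1 "_vs_" s2 "_up.txt")
      } else if ($10 + 0 < 0) {
        # down-regulated: print gene and log2 fold change
        print $3, $10 > (s1 "_vs_" s2 "_down.txt")
      }
    }
  ' "$1"
}
```

For example, `split_by_direction gene_exp.diff C1 C2` would write C1_vs_C2_up.txt and C1_vs_C2_down.txt; calling it again with another pair handles the other comparisons further down the file. Splitting on tabs only (`-F'\t'`) rather than on any whitespace is a defensive choice in case a field ever contains a space.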
Modified 13 months ago by RamRS • written 3.7 years ago by Nicolas Rosewick

You've got to learn how to Google. For example, googling "extract subset of data frame R" will get you leads on

  1. how to filter a subset of rows and
  2. how to project a subset of columns.

Unless you invest time in understanding what you're trying to do, just running commands given by people here will not help you in the long run. I say this because you've been here a long time and your questions follow a pattern: you ask for commands and then interactively debug minor errors in them with folks on the forum.

Modified 13 months ago • written 3.7 years ago by RamRS
mycounts2 <- subset(x = mycounts, subset = mycounts$sample_1 == "C1" & mycounts$sample_2 == "C2")
results <- if (mycounts2$significant == "yes" ) {
        results <- subset( x = mycounts2, select = mycounts2$significant == "yes", select = mycounts2[,3:10])
}

Something like this might work. I only started learning R a few days ago, though. You probably don't even need the if statement, but I don't have a dataset to test it on.

Modified 13 months ago by RamRS • written 3.7 years ago by cbio

Thank you

> mycounts2 <- subset(x = mycounts, subset = mycounts$sample_1 == "C1" & mycounts$sample_2 == "C2")
> 
> results <- if (mycounts2$significant == "yes" ) {
+     
+     results <- subset( x = mycounts2, select = mycounts2[,3:10])
+     
+ }
Warning message:
In if (mycounts2$significant == "yes") { :
  the condition has length > 1 and only the first element will be used

but nothing happened; the output is the same as the input.

Modified 13 months ago by RamRS • written 3.7 years ago by A

Yeah, I thought about this just now and edited my previous comment. Try this instead of the if statement:

results <- subset(x = mycounts2, subset = mycounts2$significant == "yes", select = 3:10)

If you only want columns 3 AND 10, not 3 THROUGH 10, you'd need something like this:

results <- subset(x = mycounts2, subset = mycounts2$significant == "yes", select = c(3, 10))
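The same range-versus-list distinction (3:10 versus c(3, 10) in R) exists on the command line. A minimal sketch with cut, assuming a tab-delimited file such as gene_exp.diff (cut splits on tabs by default); the function names are hypothetical:

```shell
# Columns 3 THROUGH 10: a range of fields
columns_3_through_10() { cut -f3-10 "$1"; }

# Columns 3 AND 10: a list of individual fields
columns_3_and_10() { cut -f3,10 "$1"; }
```

For example, `columns_3_and_10 gene_exp.diff` would print only the gene and log2 fold change columns, header included.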
Modified 13 months ago by RamRS • written 3.7 years ago by cbio

Thank you, but

> results <- subset( x = mycounts2, select = mycounts2$significant == "yes", select = mycounts2[,3:10])
Error in subset.data.frame(x = mycounts2, select = mycounts2$significant ==  : 
  formal argument "select" matched by multiple actual arguments
Modified 13 months ago by RamRS • written 3.7 years ago by A

Wow, I'm so sorry. I didn't realize I had mistyped the name of the second argument. I've updated my previous comment with the "correct" code. Try it now.

Modified 13 months ago by RamRS • written 3.7 years ago by cbio

Thank you

> results <- subset( x = mycounts2, subset = mycounts2$significant == "yes", select = mycounts2[,3:10])
Error in x[j] : invalid subscript type 'list'
Modified 13 months ago by RamRS • written 3.7 years ago by A

What is the output of class(mycounts) and class(mycounts2)? I was assuming it was a data.frame.

The commands I gave you should have worked; I tested them on my own dataset and they ran fine.

Modified 13 months ago by RamRS • written 3.7 years ago by cbio

Thank you

> class(mycounts)
[1] "data.frame"
> class(mycounts2)
[1] "data.frame"
Modified 13 months ago by RamRS • written 3.7 years ago by A

That's odd. All right, I'll take one more crack at it.

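install.packages('data.table')
library('data.table')

# fread() auto-detects the delimiter and returns a data.table
mycounts <- fread("/path/to/mycounts", header=TRUE)

# keep only the C1 vs C2 comparison
mysamples <- subset(x = mycounts, subset = mycounts$sample_1 == "C1" & mycounts$sample_2 == "C2")

# keep only the significant rows
results <- subset(x = mysamples, subset = mysamples$significant == "yes")

# columns 3 through 10
final <- results[,3:10]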
Modified 13 months ago by RamRS • written 3.7 years ago by cbio

Hello Fereshteh!

We believe that this post does not fit the main topic of this site.

Please see my comment here: extracting some information from cuffdiff output

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

Modified 13 months ago • written 3.7 years ago by RamRS
Answer by Nicolas Rosewick (Belgium, Brussels), 3.7 years ago:

In R:

mycounts.filt <- mycounts[mycounts$significant=="yes" & mycounts$sample_1=="C1" & mycounts$sample_2=="C2",c(3,10)]
Modified 13 months ago by RamRS • written 3.7 years ago by Nicolas Rosewick

Thank you, but how can I get two files, one containing the rows where column 10 > 0 and another containing the rows where column 10 < 0?

Written 3.7 years ago by A
mycounts.filt.over10 <- mycounts[mycounts$significant=="yes" & mycounts$sample_1=="C1" & mycounts$sample_2=="C2" & mycounts[,10]>0,c(3,10)]

mycounts.filt.under10 <- mycounts[mycounts$significant=="yes" & mycounts$sample_1=="C1" & mycounts$sample_2=="C2" & mycounts[,10]<0,c(3,10)]

I really encourage you to learn R better, since subsetting a data.frame is a very basic feature of R. Also, this is not really a bioinformatics question (it's more of a Stack Overflow question) ;)

Modified 13 months ago by RamRS • written 3.7 years ago by Nicolas Rosewick

Thank you, but on Stack Overflow I don't have the required reputation score, so they don't allow me to ask any questions :)

Written 3.7 years ago by A

Search stackoverflow for existing questions. Most of your questions are basic enough that a google search will give you straightforward results.

We appreciate your drive to keep learning, but you'll have to improve and work on putting in a bit more effort before asking here. Unlike stackoverflow, we do not have automated reputation based limitations, but that does not mean that you can invest near-zero effort and expect people to provide you with exact commands.

Please avoid "read the manual" or "google it" style questions in the future.

Written 3.7 years ago by RamRS

I'm so sorry...

Written 3.7 years ago by A

Didn't know you could combine multiple & statements like this, it's good to know. Thanks!

Written 3.7 years ago by cbio

The double && is a scalar (single-value) AND, while the single & is a vectorized, element-wise AND. So, just as with +, you can chain several & operators in sequence.

For example,

c(T,F,T) & c(F,F,T) & c(T,T,T)

gives c(F,F,T), which works as a logical vector index.

Modified 13 months ago by RamRS • written 3.7 years ago by karl.stamm
The thread is closed. No new answers may be added.
