Write diffGenes/diffGeneIDs to a Tab-Delimited File
10.8 years ago
GPR ▴ 390

Hello, I am relatively new to CummeRbund. Can somebody tell me how to write a tab file containing diffGenes/diffGeneIDs? Thanks G.

cummerbund • 3.1k views

Sukhdeep..

Thanks for a wonderful script. It worked for me, but the output file includes non-significant genes. What am I doing wrong? I want the significant genes only.

Please also include in your answer how to replace the XLOC IDs with the real gene IDs. Thanks.

Fahim


Hi Fahim,

You should either ask a new question or add a new comment; don't post these as answers unless what you are writing is a real answer.

To get gene IDs, you have to run cufflinks with -g and an appropriate GTF file.

The XLOC IDs are the cufflinks locus IDs, which are mapped to the locus information in the provided GTF file to fetch the gene IDs.

http://seqanswers.com/forums/showthread.php?t=19079
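As a rough sketch of the mapping described above: in the standard cuffdiff gene_exp.diff layout, column 2 is gene_id (the XLOC ID), column 3 is gene (the gene name) and column 14 is significant. This assumes cuffdiff was run with an annotation, so that the gene column is actually filled in. The file toy_gene_exp.diff and its values are made up for illustration; point the awk command at your own gene_exp.diff instead.

```shell
# Toy stand-in for diff_out/gene_exp.diff (made-up values, standard 14-column layout).
printf 'test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\tstatus\tvalue_1\tvalue_2\tlog2(fold_change)\ttest_stat\tp_value\tq_value\tsignificant\n' > toy_gene_exp.diff
printf 'XLOC_000001\tXLOC_000001\tgeneA\tchr1:100-200\tC1\tC2\tOK\t10\t40\t2\t3.1\t0.0001\t0.001\tyes\n' >> toy_gene_exp.diff
printf 'XLOC_000002\tXLOC_000002\tgeneB\tchr1:300-400\tC1\tC2\tOK\t10\t11\t0.1375\t0.2\t0.6\t0.8\tno\n' >> toy_gene_exp.diff
printf 'XLOC_000001\tXLOC_000001\tgeneA\tchr1:100-200\tC1\tC3\tOK\t10\t80\t3\t3.5\t0.0001\t0.001\tyes\n' >> toy_gene_exp.diff

# Skip the header, keep significant rows only, and print each
# XLOC ID once together with the gene name from column 3.
awk -F'\t' -v OFS='\t' 'NR > 1 && $14 == "yes" && !seen[$2]++ { print $2, $3 }' \
  toy_gene_exp.diff > toy_sig_gene_names.txt
```

The result is a two-column, tab-separated lookup table (XLOC ID, gene name) restricted to significant genes.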

Cheers

10.8 years ago

Using CummeRbund:

diff_genes=subset(diffData(genes(diff_data)),(significant=='yes'))

where diff_data is the cuffdiff output folder (diff_out) read into R using readCufflinks.

Now write out diff_genes (the list of significant DE genes):

write.table(diff_genes,'diff_genes.txt',sep='\t',quote=FALSE,row.names=FALSE,col.names=TRUE)

Using awk in the terminal (in case you just need the list straight from cuffdiff, without any R manipulation):

awk '$14=="yes"' diff_out/gene_exp.diff > diff_genes.txt

where diff_out is again the output folder containing the cuffdiff results, and gene_exp.diff contains the list of genes tested for DE. In most cases the 14th column is the one that says whether the gene is significantly differentially expressed; if it is a different column in your file, replace 14 with that column number.
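If you would rather not hard-code the column number, one option is to look the column up by its header name. The sketch below assumes the file is tab-delimited with a header line that contains a column called significant (as in the standard cuffdiff output). toy_gene_exp.diff and its values are invented for illustration. Unlike the plain '$14=="yes"' filter, which drops the header row, this version keeps it.

```shell
# Toy stand-in for diff_out/gene_exp.diff (made-up values, standard 14-column layout).
printf 'test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\tstatus\tvalue_1\tvalue_2\tlog2(fold_change)\ttest_stat\tp_value\tq_value\tsignificant\n' > toy_gene_exp.diff
printf 'XLOC_000001\tXLOC_000001\tgeneA\tchr1:100-200\tC1\tC2\tOK\t10\t40\t2\t3.1\t0.0001\t0.001\tyes\n' >> toy_gene_exp.diff
printf 'XLOC_000002\tXLOC_000002\tgeneB\tchr1:300-400\tC1\tC2\tOK\t10\t11\t0.1375\t0.2\t0.6\t0.8\tno\n' >> toy_gene_exp.diff
printf 'XLOC_000003\tXLOC_000003\tgeneC\tchr2:100-200\tC1\tC2\tOK\t40\t10\t-2\t-3.0\t0.0002\t0.002\tyes\n' >> toy_gene_exp.diff

# On the header line, find which column is named "significant" and print the header.
# On every other line, keep the row only if that column is "yes".
awk -F'\t' '
  NR == 1 { for (i = 1; i <= NF; i++) if ($i == "significant") col = i; print; next }
  col && $col == "yes"
' toy_gene_exp.diff > toy_diff_genes.txt
```

If no column named significant is found, col stays unset and only the header is written, which makes a wrong file layout easy to spot.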

If you are just interested in the number of DE genes, then

awk '$14=="yes"' diff_out/gene_exp.diff | wc -l

Cheers


Sorry,

How can I extract columns 2 and 3 only where column 14 (significant) is yes, and only for the comparison between samples C1 and C2? I have other samples in lower rows. I would also like the results written separately: one file for rows where column 10 < 0 and another for rows where column 10 > 0.

Thank you so much

sigGenes=subset(diffData(genes(cuff)),(significant=='yes'))

This gives you sigGenes as a data frame, which you can now subset any way you like. I don't understand your question completely, but you can subset it by sample names (C1/C2 etc.) with

subGens=subset(sigGenes,sample_1=="C1" & sample_2=="C2")

Thank you.

Could you please show me how to do this with Unix commands?

Thanks again

# In gene_exp.diff, sample_1 and sample_2 are columns 5 and 6.
awk '$14=="yes"' gene_exp.diff > sigGenes
awk '$5=="C1" && $6=="C2"' sigGenes > subGenes

or combining them

awk '$5=="C1" && $6=="C2" && $14=="yes"' gene_exp.diff > subGenes
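The original question also asked for columns 2 and 3 only, split by the sign of column 10. A minimal sketch of that follows, assuming the standard gene_exp.diff layout (gene_id in column 2, gene in column 3, sample_1 and sample_2 in columns 5 and 6, log2(fold_change) in column 10, significant in column 14). The file names and toy values are made up. The explicit "inf"/"-inf" checks assume cuffdiff can write infinite fold changes as those strings, which not every awk converts to a number in the same way.

```shell
# Toy stand-in for gene_exp.diff (made-up values, standard 14-column layout).
printf 'test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\tstatus\tvalue_1\tvalue_2\tlog2(fold_change)\ttest_stat\tp_value\tq_value\tsignificant\n' > toy_gene_exp.diff
printf 'XLOC_000001\tXLOC_000001\tgeneA\tchr1:100-200\tC1\tC2\tOK\t10\t40\t2\t3.1\t0.0001\t0.001\tyes\n' >> toy_gene_exp.diff
printf 'XLOC_000002\tXLOC_000002\tgeneB\tchr1:300-400\tC1\tC2\tOK\t10\t11\t0.1375\t0.2\t0.6\t0.8\tno\n' >> toy_gene_exp.diff
printf 'XLOC_000003\tXLOC_000003\tgeneC\tchr2:100-200\tC1\tC2\tOK\t40\t10\t-2\t-3.0\t0.0002\t0.002\tyes\n' >> toy_gene_exp.diff
printf 'XLOC_000004\tXLOC_000004\tgeneD\tchr3:100-200\tC1\tC3\tOK\t10\t80\t3\t3.5\t0.0001\t0.001\tyes\n' >> toy_gene_exp.diff

# Keep significant C1-vs-C2 rows, print gene_id and gene only,
# and route each row to a file according to the sign of log2(fold_change).
awk -F'\t' -v OFS='\t' '
  NR > 1 && $5 == "C1" && $6 == "C2" && $14 == "yes" {
    fc = $10
    if (fc == "inf" || fc + 0 > 0)        print $2, $3 > "toy_log2fc_positive.txt"
    else if (fc == "-inf" || fc + 0 < 0)  print $2, $3 > "toy_log2fc_negative.txt"
  }
' toy_gene_exp.diff
```

Rows with a fold change of exactly 0 go to neither file, and an output file is only created if at least one row matches.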

thank you very much
