Question: Code for DEG analysis using Ballgown
0
gravatar for Arindam Ghosh
2.2 years ago by
Arindam Ghosh250
India
Arindam Ghosh250 wrote:

I am using BALLGOWN for DEG analysis of RNA-seq data obtained from ENA. During analysis what is the correct way to give pheno_data. Do it have to be in any particular order?

I have set it as:

"ids","cell","rep"

"SRR4011890","hESC","rep1"

"SRR4011891","hESC","rep2"

"SRR4011896","hESC-CM","rep1"

"SRR4011897","hESC-CM","rep2"

"SRR4011898","hESC","rep3"

"SRR4011899","hESC","rep4"

"SRR4011904","hESC-CM","rep3"

"SRR4011905","hESC-CM","rep4"

After this I am using the following code:

library(ballgown)

library(genefilter)

library(dplyr)

library(devtools)

pheno_data = read.csv("compare.csv")

bg = ballgown (dataDir = "/home/bioinfo/Documents/Work/Data/PRJNA338181/expression", samplePattern = "SRR", pData = pheno_data)

bg_table = texpr (bg, 'all')

bg_table

bg_gene_name = unique (bg_table[,9:10])

save (bg, file = 'bg.rda')

result_transcripts = stattest(bg, feature = "transcript", covariate = "cell", getFC = TRUE, meas = "FPKM")

result_genes = stattest(bg, feature = "gene", covariate = "cell", getFC = TRUE, meas = "FPKM")

result_genes = merge (result_genes, bg_gene_name, by.x = c("id"), by.y = c ("gene_id"))

write.table (result_transcripts, "compare_transcript_resuls.tsv", sep = "\t", quote = FALSE, row.names = FALSE)

write.table (result_genes, "compare_genes_result.tsv", sep = "\t", quote = FALSE, row.names = FALSE)

hist(result_genes$pval, main="Histogram of p-values before filter", xlab="p-val")

hist(result_genes$qval, main="Histogram of q-values before filter", xlab="q-val")

bg_filt = subset (bg, "rowVars(texpr(bg))>1", genomesubset = TRUE)

bg_filt_table = texpr (bg_filt, 'all') bg_filt_gene_names = unique (bg_filt_table [,9:10])

result_transcripts = stattest(bg_filt, feature = "transcript", covariate = "cell", getFC = TRUE, meas = "FPKM")

result_genes = stattest(bg_filt, feature = "gene", covariate = "cell", getFC = TRUE, meas = "FPKM")

result_genes = merge (result_genes, bg_filt_gene_names, by.x = c("id"), by.y = c ("gene_id"))

write.table (result_transcripts, "compare_transcript_result_filt.tsv", sep = "\t", quote = FALSE, row.names = FALSE)

write.table (result_genes, "compare_genes_result_filt.tsv", sep = "\t", quote = FALSE, row.names = FALSE)

sig_p_transcripts = subset (result_transcripts, result_transcripts$pval<0.05)

sig_p_genes = subset (result_genes, result_genes$pval<0.05)

write.table (sig_p_transcripts, "compare_transcript_sigp_005.tsv", sep = "\t", quote = FALSE, row.names = FALSE)

write.table (sig_p_genes, "compare_gene_sigp_005.tsv", sep = "\t", quote = FALSE, row.names = FALSE)

Can anyone help with the R script that I am using?

I am asking this question because on changing the names of data sets and ordering the phenodata content by cell type gives a different result.

rna-seq ngs stattest ballgown • 1.5k views
ADD COMMENTlink modified 2.1 years ago by lakhujanivijay5.0k • written 2.2 years ago by Arindam Ghosh250
0
gravatar for lakhujanivijay
2.1 years ago by
lakhujanivijay5.0k
India
lakhujanivijay5.0k wrote:

Of course the results are going to differ

result_genes = stattest(bg_filt, feature = "gene", covariate = "cell", getFC = TRUE, meas = "FPKM")

See the covariate option.

ADD COMMENTlink written 2.1 years ago by lakhujanivijay5.0k

Can you please elaborate?

ADD REPLYlink written 2.1 years ago by Arindam Ghosh250
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 921 users visited in the last hour