parameters to DESeq2 for RNA-seq data
5.4 years ago

Hi all (:

I'm a beginner in bioinformatics and could use some help with an RNA-seq analysis I'm working on.

I want to find differentially expressed genes between a cell line and its mutant (single end). I used DESeq2 and got almost 8000 genes, which I think is too many, and I wonder whether I should set more parameters in order to get a smaller group of genes. I know there are many statistical (and other) parameters that can change the output. Which of them should I use to get reliable results?

the commands I used:

de_obj=DESeqDataSetFromMatrix(countData=my_counts,colData=coldata,design= ~condition)
de_obj=DESeq(de_obj, test="Wald")
result=results(de_obj)

thanks !

rna-seq DESeq2 • 1.5k views

Hey, please show all commands that you used, including those outside of DESeq2 where you may have performed alignment / pseudo-alignment and read counting. Also, please tell us your sample size per group.

Note that the basic way to do a pairwise comparison is:

dds <- DESeq(dds)
res <- results(dds, contrast=c("CellLine", "Mutant", "WT"), independentFiltering=TRUE, alpha=0.05, pAdjustMethod="BH", parallel=TRUE)
res <- lfcShrink(dds, contrast=c("CellLine", "Mutant", "WT"), res=res)

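Here, CellLine is a column in your metadata (the colData), stored as a factor with two levels: Mutant and WT. With this contrast, the log2 fold changes are reported as Mutant relative to WT. Here we also perform log fold-change shrinkage, recently introduced in DESeq2.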
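To address the original question of how to get a smaller, more reliable gene list: a common approach is to keep only genes that pass both an adjusted p-value cutoff and a minimum absolute log2 fold change. The sketch below uses base R on a toy data frame standing in for `as.data.frame(res)`; the gene names and values are invented, `filter_de` is a hypothetical helper, and the cutoffs (0.05 and 1) are assumptions you should adapt to your experiment.

```r
# Toy stand-in for as.data.frame(res). The column names match a DESeq2
# results table, but the genes and values are invented for illustration.
res_df <- data.frame(
  row.names      = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  baseMean       = c(1200, 15, 300, 80, 5),
  log2FoldChange = c(2.3, 0.4, -1.6, 0.9, 3.1),
  padj           = c(0.0001, 0.03, 0.004, 0.20, NA)
)

# Hypothetical helper: keep rows with padj below the cutoff and an absolute
# log2 fold change at or above the cutoff. Rows with padj = NA (for example,
# genes removed by independent filtering) are dropped.
filter_de <- function(df, padj_cutoff = 0.05, lfc_cutoff = 1) {
  keep <- !is.na(df$padj) &
    df$padj < padj_cutoff &
    abs(df$log2FoldChange) >= lfc_cutoff
  df[keep, , drop = FALSE]
}

sig <- filter_de(res_df)
rownames(sig)
```

An alternative inside DESeq2 itself is to pass `lfcThreshold` to `results()`, which tests against the fold-change threshold directly instead of filtering afterwards; which of the two is more appropriate depends on the question being asked.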


Please also give some details on how the cell lines differ. If one expresses a potent (or additional) oncogene, or something that interferes with core cellular processes like transcription or translation, it would not be too surprising for several thousand genes to change.


These are A2780 and A2780cis. The A2780cis cell line is resistant to cisplatin: it is in fact A2780 cells that were grown with increasing doses of cisplatin and developed some resistance to it. More info: https://www.sigmaaldrich.com/catalog/product/sigma/cb_93112517?lang=en&region=IL


First, thanks for the help,

*Note: I was confused earlier; it is actually paired-end sequencing.

The earlier commands:

trimming-
trimmomatic-0.38.jar PE -phred33 -threads 16 -trimlog trimmomatic.log rawdata.fastq trimmed.fastq ILLUMINACLIP:adaptors_ilumina.txt:2:30:10:2:true LEADING:15 TRAILING:15 SLIDINGWINDOW:4:15 MINLEN:36 CROP:75
alignment-
STAR --runThreadN=16 --runMode alignReads --genomeDir genome_dir_path/ --readFilesIn trimmed_1P.fastq trimmed_2P.fastq --alignIntronMax=500000 --alignMatesGapMax 500000 --outSAMtype SAM --outSAMprimaryFlag OneBestScore --outFilterMultimapNmax 100 --outFilterMismatchNmax 2 --alignSJstitchMismatchNmax 5 -1 5 5
unique filtering-
PicardCommandLine MarkDuplicates READ_NAME_REGEX=null REMOVE_DUPLICATES=true I=star_output.bam O=unique.bam M=marked_metrics.txt

I have three replicates for each of the WT and mutant cell lines. Read counts after trimming:

WT: 15000022, 10103016 and 11547756 reads
mutant: 12237637, 11058519 and 13781710 reads

Ah, after STAR, you may want to count the reads over your genes or transcripts of interest using, for example, featureCounts ( http://bioinf.wehi.edu.au/featureCounts/ ).
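As a rough sketch of that counting step from within R, featureCounts is also available through the Rsubread package. The wrapper `count_genes` and the file names `star_output.bam` and `annotation.gtf` are hypothetical placeholders, and counting at the `exon` / `gene_id` level is an assumption that suits a typical gene-level DESeq2 analysis of paired-end data.

```r
# Hypothetical wrapper around Rsubread::featureCounts for paired-end BAM files.
# It returns a genes x samples count matrix that can be passed as countData
# to DESeqDataSetFromMatrix().
count_genes <- function(bam_files, gtf_file, threads = 4) {
  # Fail early if any input file is missing.
  stopifnot(all(file.exists(bam_files)), file.exists(gtf_file))
  if (!requireNamespace("Rsubread", quietly = TRUE)) {
    stop("The Rsubread package is required for this sketch.")
  }
  fc <- Rsubread::featureCounts(
    files               = bam_files,
    annot.ext           = gtf_file,   # external annotation in GTF format
    isGTFAnnotationFile = TRUE,
    GTF.featureType     = "exon",     # count reads overlapping exons
    GTF.attrType        = "gene_id",  # summarise counts per gene
    isPairedEnd         = TRUE,
    nthreads            = threads
  )
  fc$counts
}

# Example call (placeholder paths; not run here):
# my_counts <- count_genes(c("star_output.bam"), "annotation.gtf", threads = 16)
```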
