Question: miRNA differential expression
2
17 months ago by
anshulmbi30
anshulmbi30 wrote:

Hello,

I am doing miRNA differential expression analysis from read counts, and I have a few questions:

1. Which is better, DESeq2 or edgeR?
2. I have 452 samples in total across 4 conditions, and each condition has a different number of samples. How should I analyze this type of data?
3. I want to check miRNA expression for each individual condition, i.e. which miRNAs are down- or up-regulated in which condition.

Please help me in this regard.

Thanks...

mirna rna-seq deseq2 de • 1.6k views
modified 17 months ago • written 17 months ago by anshulmbi30

Hello all, thank you for your kind replies and suggestions. Please focus on my second question: I have 452 samples in total across 4 conditions, and each condition has a different number of samples. How should I analyze this type of data? I checked the DESeq2 and edgeR tutorials, but most of them only cover DE analysis for a control versus treatment comparison. In my case, I have 4 conditions with different sample numbers, e.g.:

Condition 1: 139 samples
Condition 2: 109 samples
Condition 3: 89 samples
Condition 4: 80 samples

I hope you can understand my problem. Thanks!

written 17 months ago by anshulmbi30

First, be sure to add your emphasis to the second issue as a comment rather than an answer to your post.

DESeq2 does accommodate analyses with different sample numbers per group, so I am not sure why this is a problem. It seems to me that the sample number is not the issue you are pointing to, but rather the fact that you have 4 conditions. How you analyze this will depend on the question your experiment is meant to answer. You could analyze all the groups together (accounting for the multiple groups in the design formula of DESeq2) and use contrasts to extract the comparisons that you are interested in (more on this here). Alternatively, if your conditions represent something like different time points, you should consider reading the relevant section of the DESeq2 vignette.
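To make the "one model, several contrasts" idea concrete, here is a minimal Python sketch. It is not DESeq2 code; it only enumerates the pairwise comparisons that four groups give rise to. The condition labels are placeholders, and the group sizes are the ones listed in the question. In DESeq2, each pair would be pulled out as one contrast from the same fitted model.

```python
from itertools import combinations

# Group sizes as listed in the question; the labels are placeholders.
group_sizes = {
    "condition1": 139,
    "condition2": 109,
    "condition3": 89,
    "condition4": 80,
}


def pairwise_contrasts(levels):
    """Return every (numerator, denominator) pair of condition levels."""
    return [(b, a) for a, b in combinations(levels, 2)]


contrasts = pairwise_contrasts(list(group_sizes))
for numerator, denominator in contrasts:
    print(
        f"{numerator} (n={group_sizes[numerator]}) vs "
        f"{denominator} (n={group_sizes[denominator]})"
    )
```

Four conditions give six pairwise comparisons. Unequal group sizes do not change this list; they only affect how precisely each group is estimated.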

written 17 months ago by lshepard420
3
17 months ago by
ATpoint36k
Germany
ATpoint36k wrote:

Please browse this forum using the search function and scan the literature for miRNA pipelines and comparisons between edgeR and DESeq2. Short answer: both are valid and established, and there is no simple "better" or "worse". Be sure to use proper alignment settings. Use something like bowtie and align against a database such as miRBase, not against the genome, and be sure to trim your reads properly. As said, read previous posts; this has been discussed many times before.

modified 17 months ago • written 17 months ago by ATpoint36k

"align against a database such as miRBase, not against the genome"

Any specific reason for this? We can align to the reference genome (e.g. with HISAT2) and then extract only the miRNA counts (e.g. with featureCounts) based on GTF annotation files (e.g. from Ensembl).

modified 6 weeks ago • written 6 weeks ago by Arindam Ghosh300
1

miRNAs are short (~20 bp), so aligning against the genome will give plenty of multimappers just by chance. That is at least what I took from the questions about miRNAs I have read here on Biostars; I am not a miRNA expert myself. Aligning against a dedicated database might therefore be better.
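As a rough back-of-the-envelope check on the "by chance" argument, the Python sketch below estimates how many chance matches one short read would have in a purely random genome. The random-genome model, the genome size, and the read lengths and mismatch settings are all assumptions for illustration. Real genomes are repetitive, so real multimapping rates are likely higher than this model suggests.

```python
from math import comb


def expected_chance_hits(read_len, genome_size, max_mismatches=0):
    """Expected number of chance matches of one read in a random genome.

    Assumes independent, equally likely bases (real genomes are not random)
    and counts both strands.
    """
    # Number of distinct sequences within max_mismatches of the read.
    neighbours = sum(
        comb(read_len, m) * 3 ** m for m in range(max_mismatches + 1)
    )
    # Number of possible start positions on both strands.
    positions = 2 * (genome_size - read_len + 1)
    return positions * neighbours / 4 ** read_len


GENOME_SIZE = 3_100_000_000  # assumed, roughly human-sized

for read_len in (16, 18, 20, 22):
    for mismatches in (0, 1, 2):
        hits = expected_chance_hits(read_len, GENOME_SIZE, mismatches)
        print(
            f"length={read_len} mismatches={mismatches} "
            f"expected chance hits={hits:.3g}"
        )
```

Under these assumptions the expected number of chance hits rises quickly as reads get shorter after trimming or as more mismatches are allowed, which is where spurious multimappers come from.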

modified 6 weeks ago • written 6 weeks ago by ATpoint36k

That's reasonable logic. I tried aligning with HISAT2 to the Ensembl reference genome and observed that the overall alignment rate is ~60-70%, with 40-50% unique alignments.

On a similar note, how do you deal with mRNA reads multi-mapping to different positions in the reference genome? Count all or ignore all? We can always avoid this by aligning only to known transcripts.

How about letting the reads align wherever they can and then counting only the genes we require? For example, mRNA reads from certain genes may map to pseudogenes, but while counting, considering only genes and not pseudogenes may make sense.

written 6 weeks ago by Arindam Ghosh300
1

mRNA reads are typically discarded if they multimap. Tools like salmon which perform pseudo- or selective alignment against the transcriptome have a more elaborate strategy to deal with multimappers but I do not recall the principle. Check the salmon (Patro et al) paper for details if you want.

"but while counting, considering only genes and not pseudogenes may make sense."

You cannot cherry-pick during alignment. If you only count non-pseudogenes, but the reads in reality come from the pseudogene while the non-pseudogene maybe has a count of 0, then you get many false positives. If you have multimapping then it is what it is: you cannot confidently say where the reads come from.
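A toy Python sketch of that point, using made-up reads and feature names. It compares discarding ambiguous reads with the "ignore pseudogenes while counting" idea. For simplicity, a read is treated here as a multimapper when it hits more than one feature, whereas real tools decide this per genomic location.

```python
from collections import Counter, defaultdict

# Hypothetical alignments: (read_id, feature the alignment overlaps).
alignments = [
    ("read1", "geneA"),
    ("read2", "geneA"),
    ("read2", "pseudogeneA"),  # read2 maps equally well to two places
    ("read3", "pseudogeneA"),
    ("read4", "geneB"),
]


def count_unique_mappers(alignments):
    """Count reads per feature, discarding reads that hit more than one feature."""
    features_per_read = defaultdict(set)
    for read_id, feature in alignments:
        features_per_read[read_id].add(feature)
    counts = Counter()
    for features in features_per_read.values():
        if len(features) == 1:
            counts[next(iter(features))] += 1
    return counts


def count_ignoring_pseudogenes(alignments):
    """Naive alternative: drop pseudogene alignments first, then count the rest."""
    kept = [(r, f) for r, f in alignments if not f.startswith("pseudogene")]
    return count_unique_mappers(kept)


unique_counts = count_unique_mappers(alignments)
naive_counts = count_ignoring_pseudogenes(alignments)
print(dict(unique_counts))
print(dict(naive_counts))
```

In the naive version the ambiguous read2 is silently credited to geneA, even though nothing in the data says it came from there. That is the false-positive risk described above.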

written 6 weeks ago by ATpoint36k
3
17 months ago by
biobiu120
United States
biobiu120 wrote:

I agree with ATpoint: there is no easy answer as to which of them is better. Two additional comments:

1) It is highly important to do pre-processing for miRNA-seq reads (removing adapters, low-quality sequences, size enrichment).

2) As for the alignment: there are several aligners and methods that can be used (I prefer aligning to the genome and then taking only the reads that mapped to miRNAs). Consider reading this

modified 17 months ago • written 17 months ago by biobiu120
2
17 months ago by
cfos4698170
cfos4698170 wrote:

For point 1: As ATpoint said, neither method is better than the other; each has its own merits. As tough as it is when first starting out, you need to read through the assumptions of the methods and make a judgement call as to which is most appropriate for your data. It's helpful to read through the papers describing the methods, as well as their various vignettes (such as those on Bioconductor) and general RNA-seq workflows (e.g. http://master.bioconductor.org/packages/release/workflows/html/rnaseqGene.html). I personally like to use DESeq2, and the DESeq2 vignette is particularly helpful and easy to follow.

For the other two points, the various vignettes will tell you how to do this. You will need to get an estimate of the read counts for transcripts/genes, then compare these among your conditions. Depending on your thresholds for differential expression, you will get a list of differentially expressed genes. Those that are upregulated in a condition will have positive (log2) fold changes relative to the contrast group; those that are downregulated will have negative (log2) fold changes relative to the contrast group. Again, this will all become clearer after reading the vignettes and following the tutorials.
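As a small illustration of reading up- and down-regulation off the sign of the fold change, here is a Python sketch over made-up results for a single contrast. The miRNA names, values, and both cutoffs are assumptions chosen for the example, not recommendations.

```python
def classify(log2_fc, padj, lfc_cutoff=1.0, alpha=0.05):
    """Label one miRNA as 'up', 'down' or 'ns' relative to the contrast group."""
    # A missing adjusted p-value is treated as not significant.
    if padj is None or padj >= alpha or abs(log2_fc) < lfc_cutoff:
        return "ns"
    return "up" if log2_fc > 0 else "down"


# Made-up results for one contrast: name -> (log2 fold change, adjusted p-value).
results = {
    "miR-A": (2.3, 0.001),
    "miR-B": (-1.8, 0.01),
    "miR-C": (0.4, 0.20),
    "miR-D": (3.0, None),  # adjusted p-value missing
}

labels = {name: classify(lfc, padj) for name, (lfc, padj) in results.items()}
print(labels)
```

Repeating this for each contrast of interest gives, per condition pair, the list of miRNAs that go up or down.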

written 17 months ago by cfos4698170
1
17 months ago by
nsmi8446120
nsmi8446120 wrote:

This site might be helpful for you; it gives an accessible introduction to this type of analysis:

https://galaxyproject.github.io/training-material/topics/transcriptomics/tutorials/srna/tutorial.html

modified 17 months ago • written 17 months ago by nsmi8446120
0
17 months ago by
v82masae140
v82masae140 wrote:

Unless your target organism has a very poorly sequenced genome, I would strongly suggest mapping your small RNA-seq reads to the reference genome and then counting the aligned reads using miRNA positions from a GFF or GTF file.
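The counting step can be sketched in a few lines of Python. All coordinates and miRNA names below are invented, and the sketch ignores strand and multimapping. In practice a dedicated counting tool would do this job; the sketch only shows the overlap logic.

```python
from collections import Counter

# Hypothetical miRNA annotation: name -> (chromosome, start, end),
# 1-based inclusive coordinates as in GFF/GTF.
mirna_annotation = {
    "mir-X": ("chr1", 1000, 1080),
    "mir-Y": ("chr1", 5000, 5075),
    "mir-Z": ("chr2", 300, 390),
}

# Hypothetical aligned reads: (chromosome, start, end), same convention.
reads = [
    ("chr1", 1010, 1031),
    ("chr1", 1050, 1071),
    ("chr1", 5001, 5022),
    ("chr1", 9000, 9021),  # falls outside every annotated miRNA
    ("chr2", 350, 371),
]


def count_reads_per_mirna(reads, annotation):
    """Count reads that overlap each annotated miRNA interval."""
    counts = Counter({name: 0 for name in annotation})
    for chrom, start, end in reads:
        for name, (a_chrom, a_start, a_end) in annotation.items():
            # Two closed intervals overlap if each starts before the other ends.
            if chrom == a_chrom and start <= a_end and end >= a_start:
                counts[name] += 1
    return counts


mirna_counts = count_reads_per_mirna(reads, mirna_annotation)
print(dict(mirna_counts))
```

The resulting per-miRNA counts, collected across samples, form the count matrix that DESeq2 or edgeR takes as input.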

As for which software to use for differential expression: edgeR and DESeq2 are both good options and should give you similar results.

Pay attention to the processing and alignment steps. As ATpoint and biobiu suggested, it is better to take some time to clean your small RNA-seq data properly before starting further steps. I use the bowtie aligner with the following options: bowtie -n 1 -l 10 -m 100 -k 1 --best --strata, i.e. allowing one mismatch in a 10-nucleotide alignment seed, discarding reads with more than 100 putative mapping sites, and reporting a single alignment from the best stratum. Mismatches can range from 0 to 2-3 at most, I would say.

written 17 months ago by v82masae140