Problem with logFC values
0
0
Entering edit mode
8.1 years ago

Dear friends,

I am basically a botanist and new to NGS. And I am sorry if any similar question was discussed before.

Comparing Resistant and susceptible plant genotypes infected with Fusarium (Fungi)

exp_study = DGEList(counts=rnaseqMatrix, group=conditions)
exp_study = calcNormFactors(exp_study)
et = exactTest(exp_study, dispersion=0.1)
tTags = as.data.frame(topTags(et,n=NULL))


I wonder if there is any problem in my input file containing the counts. I find that there are many 0 in the counts.

Is it a problem? How the logFC is calculated in pairwise comparison, if any one of the count is 0?

EdgeR logFC next-gen R RNA-Seq • 3.1k views
2
Entering edit mode

I believe the logFC value is calculated on the fitted model coefficient, rather than on the raw data

1
Entering edit mode

Stated differently, the logFC is the fit coefficient, possible with a prior applied.

0
Entering edit mode

The outsourced sequencing company provided me 2 types of files after the Abundance estimation using RSEM.

(i have 8 pools/different combinations). So what they have done is they find the abundance estimation of each pools separately which gives different headers such as gene_id,len,effect_length,expected counts,etc)

Then they merged the column expected_counts from all the pools.

So they gave me isoform.results and genes.results.

I proceeded with the isoform.results

head (isoform.results)

Pool1       Pool2      Pool3       Pool4     Pool5  ...
comp0_c0_seq1                0           0        0.30        0.11      0.00
comp1_c0_seq1                0           0        0.86        0.00      0.00
comp100_c0_seq1              0           0        0.00        0.00      0.00


I would like to know whether the input I am using for logFC calculation is a good one?

How the fit model coefficient method is calculated and please suggest me the input to it.

Thanks and I'm afraid if my explanation is clear.

Good night!

1
Entering edit mode

Ah, the RSEM results (isoform.results) aren't used by edgeR at all. While it may be the case that you have 0 counts in the data used in edgeR, it may still not yield the expected infinite logFC (I'd have to look up whether edgeR constrains the optimization, shrinks the fold-changes a la DESeq2, or does something else entirely) even with all samples in 1 group having 0 counts.

0
Entering edit mode

Dear Dr. Devon

Thank you for the reply. Oh :(

So I'm in wrong way. Thanks a lot for the guidance.

Kindly let me know, how I can proceed with my datasets available.

Thanks again!

Sayuj

1
Entering edit mode

It'd help if you described the biological experiment (how it was structured, how many samples you have, what species you're using, etc.) and what has been done so far in terms of data processing (e.g. reads were aligned with tophat and the resulting BAM file was processed with RSEM ...).

0
Entering edit mode

Dear Dr.Devon

I am working on the interaction b/w Orchid and Fusarium

We have 2 types of Fusarium (Pathogenic (causes root rot of plant) and Non pathogenic)

Plant type: Resistant and susceptible

Pools used (combinations)

Pool 1: Susceptible plant vs Pathogenic fungus (we inoculated healthy plant with fungus): Disease is induced

Pool 2; Resistant plant vs Pathogenic fungus (we inoculated healthy plant with fungus): No disease

.........

Pool 8: Control Resistant plant (no fungus): No inoculation

Now work sheet provided by sequencing company

• RNAseq de novo transcriptome assembly
• 8x Illumina RNAseq library construction
• 1x HiSeq 2000 sequencing lane equivalent (2x100bp)
• 8 libraries sequenced
• Trinity's output was used as the reference.
• The primary RNAseq analysis consist of
• per sample paired-end mapping with bowtie
• conversion into per sample BAM files
• per sample transcript abundance with RSEM

Reads from each sample were independently mapped on the reference transcriptome using bowtie (RSEM driven).

Now, I performed the EdgeR analysis with the output file provided by RSEM output as mentioned in the previous message.(I started working the first 2 pools in extreme interest) I am interested to find the defense genes in plants involved in resistance mechanism.

Hope this is clear!

Thanks a lot again;

Sayuj

1
Entering edit mode

Ah, that clears things up. What I would recommend is taking the estimated counts from RSEM and using them with limma (first processing them with the voom() function). This is a pretty reasonable method and is similar to what you would use for TCGA data (where all you can download is the output from RSEM). Have a look through the limma tutorial, it's a very powerful (though sometimes difficult to use) package. You can start with the same first two datasets and, once you have that figured out, then move to working on all of the datasets at once and using contrasts for specific questions.

0
Entering edit mode

Ok sir, Thanks for the points.

I will let you know, once I handle the data with limma.

Sayuj

0
Entering edit mode

Thanks Russ

I would also like to clarify my input file used for the logFC calculation, as mentioned below for Dr. Devon Ryan

Thanks a lot

Sayuj