Question: Normalizing gene counts in a quantitative metagenomic DNAseq analysis
0
gravatar for allerdrengen55
5 days ago by
allerdrengen5520 wrote:

Hi!

The purpose of my research, is to look at the gene abundance in a metagenomic sample (DNAseq). So it is not an expression analysis, since I am looking at genomic DNA. Furthermore, I am looking at different metagenomic samples, taken from different environments (specifically looking at bacteria). I want to compare the gene abundance between the samples (but also the genes to each other, within the sample). Hence, I need a proper way to normalize my genes. I expect a lot of the genes to be similar, but the abundance to differ (due to reduced/increased amount of bacteria in the sample). My data is an alignment of the RAW fastq-reads mapped to my genes. I used featurecounts to count fragments (I have paired-end data) instead of reads.

My concern is that the genes will differ in length, so the number of fragments aligned to each gene, won't tell me much about its abundance (a very long gene, will have lots of fragments mapped, but that doesn't necessarily mean that it has a high abundance).

My count matrix is of the format:

GENE_ID - GENE_FUNCTION - FRAGMENTS_MAPPED - LENGTH_OF_GENE

My first thought was to apply the following:

(number of fragments mapped) x (average fragment length) / (length of gene)

However, that was just a thought. I believe there must be more accurate ways? I have heard about MUSiCC and DESeq2, however, are those tools suitable for my kind of reseach?

Thanks a lot in advance!

depth count dna-seq illumina • 42 views
ADD COMMENTlink modified 5 days ago • written 5 days ago by allerdrengen5520
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1062 users visited in the last hour