Accounting for differences in FASTQ file size when comparing metagenomic gene abundance between samples
4.3 years ago
Hansen_869 ▴ 80

I have 8 count matrices (from bacterial metagenomic DNA sequencing) containing fragment counts, i.e. the number of fragments aligned to each gene. I obtained fragment counts, as opposed to read counts, using featureCounts. I have normalized for gene length (longer genes attract more reads) by dividing each fragment count by the gene length. However, since the FASTQ files differ in size, I wonder whether I should normalize for that too. My guess is that the bigger FASTQ files will map more reads to the contigs, inflating their counts relative to the samples with smaller FASTQ files. My final goal is to compare gene abundances BETWEEN the 8 samples, so relative numbers are fine.

All 8 samples were sequenced in the same way and come from the same environment, but at different time points. Still, the FASTQ files vary in size by a couple of hundred MB.

gene metagenomics TPM Reads • 1.2k views
4.3 years ago
tshtatland ▴ 190

I suggest downsampling all FASTQ files to the same number of reads prior to the analysis. This method is commonly used in many other applications, such as RNA-seq and variant calling.
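
To make the idea concrete, here is a minimal sketch of downsampling by reservoir sampling, written in plain Python. The function names (`read_fastq`, `reservoir_sample`) and the toy reads are made up for illustration and do not come from any particular package. For paired-end data the same seed is used for both mate files, so the same record positions are kept and the pairs stay in sync; this assumes both files list the mates in the same order. In practice an established tool (for example `seqtk sample` with a fixed seed) would be the more usual choice.

```python
import random
from itertools import islice


def read_fastq(lines):
    """Yield FASTQ records as 4-line tuples from an iterable of lines."""
    it = iter(lines)
    while True:
        record = tuple(islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def reservoir_sample(records, n, seed=0):
    """Pick n records uniformly at random in one pass, keeping input order.

    The random choices depend only on the seed and the record index, so two
    files of equal length sampled with the same seed keep the same positions.
    """
    rng = random.Random(seed)
    reservoir = []  # (index, record) pairs
    for i, rec in enumerate(records):
        if i < n:
            reservoir.append((i, rec))
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = (i, rec)
    reservoir.sort(key=lambda pair: pair[0])
    return [rec for _, rec in reservoir]


def toy_fastq(mate, n_reads):
    """Build a tiny in-memory FASTQ (hypothetical reads) for the demo."""
    lines = []
    for k in range(n_reads):
        lines += [f"@read{k}/{mate}", "ACGTACGT", "+", "IIIIIIII"]
    return lines


# Downsample both mate files to 4 read pairs with the same seed.
sub_r1 = reservoir_sample(read_fastq(toy_fastq(1, 10)), n=4, seed=42)
sub_r2 = reservoir_sample(read_fastq(toy_fastq(2, 10)), n=4, seed=42)
for r1, r2 in zip(sub_r1, sub_r2):
    print(r1[0], r2[0])
```

With real data you would pass open file handles (with the trailing newlines stripped) instead of the toy lists, and set `n` to the read count of the smallest sample.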


Thanks for your response. I will look into that. Do you suggest I do any other form of normalisation? I read about TPM, RPKM and FPKM. Or do you think normalising for JUST gene length is sufficient in this type of study? The techniques I mentioned take READ length into account, but since the read length is the same for all samples, I suppose that part is redundant?


The rest of the normalization should be done as recommended by the metagenomics package you use, which I assume depends on the package. Additional normalization for gene length, using TPM for example, still makes sense even if you first downsample to the same number of reads.
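
As a rough illustration of what TPM does, here is a small sketch in plain Python. The gene names, counts, and lengths are invented for the example, and the `tpm` helper is hypothetical rather than taken from any package. Each fragment count is first divided by gene length in kilobases, then the per-sample values are rescaled to sum to one million. Because of that rescaling, a sample with twice as many fragments overall ends up with the same TPM values.

```python
import math


def tpm(counts, lengths_bp):
    """Compute TPM for one sample.

    counts:     {gene: fragment count} for a single sample
    lengths_bp: {gene: gene length in base pairs}
    """
    # Step 1: length-normalize (fragments per kilobase of gene).
    rpk = {gene: c / (lengths_bp[gene] / 1000) for gene, c in counts.items()}
    total = sum(rpk.values())
    if total == 0:
        raise ValueError("sample has no assigned fragments")
    # Step 2: depth-normalize so each sample sums to one million.
    return {gene: value / total * 1e6 for gene, value in rpk.items()}


# Hypothetical genes and counts; sample_b has exactly twice the depth.
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 3000}
sample_a = {"geneA": 10, "geneB": 30, "geneC": 60}
sample_b = {"geneA": 20, "geneB": 60, "geneC": 120}

tpm_a = tpm(sample_a, lengths)
tpm_b = tpm(sample_b, lengths)
for gene in lengths:
    print(gene, round(tpm_a[gene], 1), round(tpm_b[gene], 1))
```

Note that TPM values are relative within a sample: they describe each gene's share of the length-normalized signal, which matches the goal of comparing relative abundances between samples.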
