I have 8 count matrices (from bacterial metagenomic DNA sequencing) containing fragment counts, meaning the number of fragments aligned to each gene. I obtained fragment counts, as opposed to read counts, using featureCounts. I have normalized for gene length (longer genes attract more reads) by dividing each fragment count by the gene length. However, because the FASTQ files vary in size, I wonder whether I should also normalize for that somehow. My guess is that the bigger FASTQ files will map more reads to the contigs, so their counts are not directly comparable with those from samples with smaller FASTQ files. My final goal is to compare gene abundances BETWEEN the 8 samples, so relative numbers are fine.
All 8 samples were sequenced in the same way and come from the same environment, but at different timepoints. Even so, the FASTQ files vary in size by a couple of hundred MB.
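
To make the question concrete, here is a minimal Python sketch of the length normalization I have done, plus one possible depth correction I am considering: rescaling each sample so that its length-normalized values sum to one million (similar in spirit to TPM). The gene names, lengths, counts and function names are made up for illustration, and I am not sure this is the appropriate method for metagenomic data.

```python
def length_normalize(counts, lengths):
    """Divide each gene's fragment count by its gene length (in bp)."""
    return {gene: counts[gene] / lengths[gene] for gene in counts}


def scale_to_million(rates):
    """Rescale length-normalized values so one sample sums to 1e6 (TPM-style)."""
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in rates}
    return {gene: value / total * 1e6 for gene, value in rates.items()}


# Hypothetical toy data: gene lengths in bp and two samples of different depth.
gene_lengths = {"geneA": 1000, "geneB": 2000, "geneC": 500}
samples = {
    "t1": {"geneA": 100, "geneB": 400, "geneC": 50},
    "t2": {"geneA": 200, "geneB": 800, "geneC": 100},  # same composition, 2x depth
}

# Length-normalize first, then rescale within each sample.
tpm = {
    name: scale_to_million(length_normalize(counts, gene_lengths))
    for name, counts in samples.items()
}

for name, values in tpm.items():
    print(name, {gene: round(v, 1) for gene, v in values.items()})
```

In this toy case the two samples differ only in sequencing depth, so after rescaling they end up with the same relative abundances, which is the behaviour I would want for comparing between samples.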