I am developing a method to assign reads to enzymes based on alignment of reads against gene sequences.
I know the average gene size is about 1000 nucleotides and my reads are 100 nucleotides long, so if I get hits at three different positions of a gene, it is not valid to say that the gene is present three times, because the hits may come from a single copy of the gene fragmented into three reads.
However, I believe a normalization could be done using information from taxonomic marker genes. For example, if I found that the 16S gene of Bacillus subtilis is present 35 times and the number of nirK hits for this species is 3500, the ratio 3500/35 could give me a normalized count of the nirK genes present in the sample.
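To make the idea concrete, here is a minimal sketch of the calculation I have in mind. The function name `marker_normalized_abundance` and its parameters are hypothetical names of my own, and the sketch assumes the hit counts for the target gene and for the marker gene have already been obtained per species from the alignment.

```python
def marker_normalized_abundance(gene_hits: int, marker_hits: int) -> float:
    """Return target-gene hits per marker-gene hit for one species.

    Hypothetical helper: both counts are assumed to come from the same
    sample and the same species.
    """
    if marker_hits <= 0:
        # Without marker hits the ratio is undefined.
        raise ValueError("marker_hits must be positive")
    return gene_hits / marker_hits


# Numbers from the example above: 3500 nirK hits and 35 16S hits
# for Bacillus subtilis.
ratio = marker_normalized_abundance(gene_hits=3500, marker_hits=35)
print(ratio)  # 100.0
```

This only divides raw hit counts; it does not account for the different lengths of the two genes, which may also matter when reads are much shorter than the genes.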
I have searched the literature, but I find that taxonomic profiling tools report species as relative abundances, which does not seem useful for my purpose.
Thank you for your help