6.2 years ago
f.a.galkin
After I map a metagenome against a DB, I don't get an abundance table per se. I still need to decide how to treat non-unique mappings and how to normalise the data.
Centrifuge suggests an EM-based method, but what are the alternatives? I could use basic RPKM, which completely ignores the problem, or I could assign each multi-mapped read to one of its taxa at random. Are there any other options?
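To make the options concrete, here is a minimal sketch of two ways to handle multi-mapped reads: splitting each read evenly among its candidate taxa, and an EM-style reassignment that weights candidates by current abundance times genome length. It is written in the spirit of EM-based approaches in general and is not Centrifuge's actual implementation. The function names, the `read_hits` input layout (one list of taxon IDs per read) and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def fractional_abundance(read_hits, genome_len):
    """Split every read evenly among the taxa it maps to, then
    normalise by genome length (hypothetical helper)."""
    counts = defaultdict(float)
    for hits in read_hits:
        for taxon in hits:
            counts[taxon] += 1.0 / len(hits)
    density = {t: c / genome_len[t] for t, c in counts.items()}
    norm = sum(density.values())
    return {t: d / norm for t, d in density.items()}


def em_abundance(read_hits, genome_len, n_iter=100, tol=1e-9):
    """EM-style estimate: multi-mapped reads are shared in proportion
    to (current abundance x genome length) of each candidate taxon."""
    taxa = sorted({t for hits in read_hits for t in hits})
    abund = {t: 1.0 / len(taxa) for t in taxa}  # uniform start
    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: fractional assignment of each read to its candidates.
        for hits in read_hits:
            weights = [abund[t] * genome_len[t] for t in hits]
            total = sum(weights)
            for taxon, w in zip(hits, weights):
                counts[taxon] += w / total
        # M-step: length-normalised counts, rescaled to sum to 1.
        density = {t: counts[t] / genome_len[t] for t in taxa}
        norm = sum(density.values())
        new = {t: density[t] / norm for t in taxa}
        delta = max(abs(new[t] - abund[t]) for t in taxa)
        abund = new
        if delta < tol:
            break
    return abund


# Toy data (made up): 8 reads unique to A, 2 unique to B, 10 ambiguous.
reads = [["A"]] * 8 + [["B"]] * 2 + [["A", "B"]] * 10
lengths = {"A": 1000, "B": 1000}
frac = fractional_abundance(reads, lengths)
em = em_abundance(reads, lengths)
```

On this toy input the even split lets the ambiguous reads pull both taxa toward each other, while the EM version lets the unique reads decide how the ambiguous ones are shared. Random assignment of each read to one of its taxa behaves, on average, like the even split.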
Hi there,
Could you please elaborate what you mean by
which tools, databases and methods are being used at this stage? Is it read-based or kmer-based mapping?
K-mer-based, using Centrifuge, mapped against its index of bacterial/archaeal genomes.
Doesn't Centrifuge's output file centrifuge_report.tsv contain a measure of abundance? Did you look at the output from the centrifuge-kreport command?
It does, but I am looking for alternatives, if there are any. Centrifuge's report assigns zero abundances too generously.
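One simple alternative is to recompute abundances from the uniquely assigned reads only, normalised by genome size. The sketch below assumes the report is tab-separated with columns named `name`, `genomeSize` and `numUniqueReads`; check these against the header of your own centrifuge_report.tsv. The function name, the `min_unique` threshold and the sample rows are hypothetical.

```python
import csv
import io


def unique_read_abundance(report_text, min_unique=1):
    """Relative abundance from unique reads per base of genome.
    Column names are assumptions about the report layout."""
    rows = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    density = {}
    for row in rows:
        size = float(row["genomeSize"])
        uniq = int(row["numUniqueReads"])
        # Skip entries without a genome size or with too few unique reads.
        if size > 0 and uniq >= min_unique:
            density[row["name"]] = uniq / size
    total = sum(density.values())
    if total == 0:
        return {}
    return {name: d / total for name, d in density.items()}


# Made-up example rows, not real Centrifuge output.
sample = (
    "name\ttaxID\tgenomeSize\tnumReads\tnumUniqueReads\n"
    "taxon_one\t1\t1000\t30\t10\n"
    "taxon_two\t2\t2000\t25\t20\n"
    "taxon_three\t3\t0\t5\t0\n"
)
abundance = unique_read_abundance(sample)
```

This throws away all multi-mapped reads, so closely related genomes with few unique k-mers will be underestimated; it is a baseline to compare against, not a replacement for a proper model.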
I do not have an answer, but shouldn't you use only certain markers to determine the taxonomy? Markers like CO1, 16S, 18S, ITS, etc. I ask because you say you are using the whole metagenome. Some genes are present in almost every organism, so simply picking a random taxon for reads from them would not be good.
Marker-gene-based methods are biased and do not guarantee detection of all the organisms present.
"Marker gene based methods are biased": true, if you do amplicon sequencing with PCR. But as I understand from your question, you have a metagenome. You cannot just use any gene to determine the taxonomy. Maybe MetaPhlAn can help: http://huttenhower.sph.harvard.edu/metaphlan