Question: Kraken2 to Phyloseq

c.e.chong wrote (5 months ago):

Hi all,

I am quite new to metagenomics and to the statistical analysis of this kind of data.

I have run Kraken2 to taxonomically profile my assembled metagenomics samples. I have three disease-state groups and would like to see whether there are any statistical differences between them. I decided to try metagenomeSeq and/or phyloseq, but I am unsure how to get from my Kraken2 reports to something I can load into R.

I thought of creating BIOM tables with kraken-biom, but I am unsure whether I should create one table per group or one table per sample.
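
On the per-group vs per-sample question: R packages such as phyloseq (otu_table) and metagenomeSeq generally work from a single count matrix with one column per sample. Group membership (disease state) is supplied separately as sample metadata, so you do not build one table per group. As far as I know, kraken-biom accepts several reports at once and writes one combined table. The Python sketch below does the same merge by hand so the layout is visible. It assumes standard Kraken2 report files and uses clade read counts at species rank ("S"). Both the rank choice and the function names are illustrative, not part of any tool.

```python
def read_kraken2_report(lines, rank="S"):
    """Return {taxon name: clade read count} for one Kraken2 report.

    Handles the standard 6-column report and the 8-column
    --report-minimizer-data variant, since rank code, taxid and name are
    the last three columns in both.
    """
    counts = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        clade_reads = int(fields[1])  # reads assigned to this clade
        rank_code = fields[-3]        # e.g. U, R, D, P, C, O, F, G, S, S1
        name = fields[-1].strip()     # names are indented to show depth
        if rank_code == rank:
            counts[name] = clade_reads
    return counts


def merge_reports(reports, rank="S"):
    """Merge {sample: report lines} into one taxa-by-samples count table."""
    per_sample = {s: read_kraken2_report(lines, rank) for s, lines in reports.items()}
    samples = sorted(per_sample)
    taxa = sorted({t for counts in per_sample.values() for t in counts})
    # A taxon missing from a sample's report gets a count of 0 there.
    matrix = [[per_sample[s].get(t, 0) for s in samples] for t in taxa]
    return taxa, samples, matrix


def to_tsv(taxa, samples, matrix):
    """Render the table as TSV text: taxa as rows, samples as columns."""
    header = "\t".join(["taxon"] + samples)
    rows = ["\t".join([t] + [str(c) for c in row]) for t, row in zip(taxa, matrix)]
    return "\n".join([header] + rows) + "\n"
```

In R, such a TSV could be read with read.delim(..., row.names = 1), converted to a matrix and wrapped in phyloseq's otu_table(..., taxa_are_rows = TRUE). The disease states would go in through sample_data().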

I'd be grateful for any information on metagenome statistics and on using metagenomeSeq/phyloseq!

Thanks!

Asaf (Israel) wrote (5 months ago):

You will lose data if you work only with the contigs Kraken classified, or with Kraken assignments alone. I think the comparison should be made on the number of reads mapped to each contig (or contig bin): find the differential contigs first, then try to figure out what they are using Kraken (or protein composition, etc.).
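
One rough way to see how much data is at stake is the share of mapped reads that land on contigs Kraken2 left unclassified. This is my own illustration, not something from the thread. The sketch assumes Kraken2's standard per-sequence output from classifying the contigs (first column C or U, second column the sequence ID) plus a {contig: read count} table, for example one derived from samtools idxstats. Contigs missing from the Kraken2 output are treated as unclassified, which is also an assumption.

```python
def unclassified_read_fraction(kraken_output_lines, contig_counts):
    """Fraction of mapped reads that sit on contigs Kraken2 did not classify.

    kraken_output_lines: Kraken2 per-sequence output (C/U, seq ID, taxid, ...).
    contig_counts: {contig ID: number of reads mapped to that contig}.
    """
    status = {}
    for line in kraken_output_lines:
        fields = line.split("\t")
        if len(fields) >= 3:
            status[fields[1]] = fields[0]  # "C" = classified, "U" = unclassified
    total = sum(contig_counts.values())
    if total == 0:
        return 0.0
    # ASSUMPTION: a contig absent from the Kraken2 output counts as unclassified.
    lost = sum(n for contig, n in contig_counts.items()
               if status.get(contig, "U") == "U")
    return lost / total
```

A high fraction would support comparing read counts per contig or bin first, rather than restricting the analysis to what Kraken could name.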

c.e.chong replied (5 months ago):

When binning my samples with anvi'o, I mapped my reads to databases built from each sample's contigs. Would these BAM files contain sufficient information for the comparisons? And to find out with Kraken what the differential contigs are, should I run the CONCOCT bins I created through Kraken?

Thank you very much for your help!


Asaf replied (5 months ago):

You can generate a table of the number of reads mapped to each contig from your BAM file, sum these counts for each bin, and use the per-bin counts for the comparison (with DESeq, for instance). Since you have bins, I would use another approach to determine taxonomy, such as GTDB-Tk, which is fast yet more accurate (it uses a larger database and a more refined method).
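
Summing contig counts into bins, and the library-size normalisation DESeq applies before comparing them, can be sketched as below. The sketch assumes every sample's reads were mapped to the same set of binned contigs, so a bin ID means the same thing in every sample. The contig-to-bin file format in read_bin_map (a header row, then "contig_id,bin_id" rows) is a hypothetical stand-in; check the columns of your own CONCOCT/anvi'o export. All function names here are illustrative. size_factors follows the median-of-ratios idea DESeq/DESeq2 use, but in practice DESeq2 computes this itself (estimateSizeFactors) from the raw count matrix.

```python
import csv
import math
from statistics import median


def read_bin_map(lines):
    """Parse HYPOTHETICAL 'contig_id,bin_id' CSV lines into {contig: bin}."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def counts_per_bin(contig_counts, contig_to_bin):
    """Sum one sample's per-contig read counts into per-bin counts.

    Contigs with no bin assignment (unbinned) are skipped.
    """
    bins = {}
    for contig, n in contig_counts.items():
        b = contig_to_bin.get(contig)
        if b is not None:
            bins[b] = bins.get(b, 0) + n
    return bins


def bin_matrix(per_sample_bins, samples):
    """Build {bin: [count per sample]} in the given sample order (0 if absent)."""
    all_bins = sorted({b for s in samples for b in per_sample_bins[s]})
    return {b: [per_sample_bins[s].get(b, 0) for s in samples] for b in all_bins}


def size_factors(matrix):
    """Median-of-ratios size factors for {bin: [count per sample]}.

    Bins with a zero in any sample are left out, since their geometric
    mean would be zero.
    """
    rows = [r for r in matrix.values() if all(c > 0 for c in r)]
    if not rows:
        raise ValueError("no bin has a non-zero count in every sample")
    n = len(rows[0])
    # Per-bin geometric mean across samples, computed in log space.
    geo_means = [math.exp(sum(math.log(c) for c in r) / n) for r in rows]
    # Each sample's factor: median over bins of count / geometric mean.
    return [median([r[j] / g for r, g in zip(rows, geo_means)]) for j in range(n)]
```

Dividing each sample's column by its factor puts samples on a comparable scale. DESeq2 itself should still be given the raw integer counts.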

c.e.chong replied (5 months ago):

Thank you for your reply. Is there a preferred method for generating a table of the number of reads mapped to each contig from a BAM file? Is this the same as calculating coverage?

Also, is DESeq recommended for finding taxonomic differences between samples or groups of samples, or is it better suited to comparing genes within samples?

Asaf replied (5 months ago):

I'm using

    samtools idxstats <file.bam> | awk '{print $1"\t"int(($3+$4)/2)}'

to get the table for each BAM file. Using DESeq to compare samples at the gene level will leave you with a long list of highly dependent features. I would compare contigs (or bins) and then figure out what is in the differential ones.
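
The awk one-liner reads the four columns samtools idxstats prints (reference name, sequence length, mapped reads, unmapped reads). It outputs the name with int((mapped + unmapped) / 2), which I read as a fragment count for paired-end data, since each pair contributes two reads. On the coverage question: counts and coverage are related but not identical. Mean depth also scales by read length and divides by contig length, so long contigs do not dominate. The Python sketch below reproduces the count and puts a rough depth next to it. read_length=150 is an assumed value; set it to match your data. For DESeq2, the raw count column, not depth, is what goes in.

```python
def parse_idxstats(lines, read_length=150):
    """Parse `samtools idxstats` output into {contig: (count, mean_depth)}.

    count mirrors the awk one-liner: int((mapped + unmapped) / 2).
    mean_depth is a rough coverage estimate: mapped * read_length / length.
    read_length=150 is an ASSUMED default, not a value from the thread.
    """
    table = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and the final '*' line (unplaced unmapped
        # reads); note that the awk version does print that line.
        if len(fields) < 4 or fields[0] == "*":
            continue
        name = fields[0]
        length, mapped, unmapped = (int(x) for x in fields[1:4])
        count = (mapped + unmapped) // 2
        depth = mapped * read_length / length if length > 0 else 0.0
        table[name] = (count, depth)
    return table
```

Run it once per BAM file (on the text that samtools idxstats sample.bam prints), then join the per-sample count columns on contig name to get a contig-by-sample table.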