Removal of host reads from shotgun metagenomics data.
12 months ago

Host sequence in shotgun metagenome data is considered contamination, and it is usually removed by mapping the reads against a host reference genome with tools like bowtie2. Can anyone kindly shed some light on how to create the reference genome when whole-genome sequence data is not available for the host organism? I am currently working on the gut microbiota of a lepidopteran species whose whole genome has not been sequenced yet.

Any response is much appreciated.

metagenomics shotgun • 751 views

12 months ago
GenoMax 119k

You should use a tool like bbsplit.sh to bin the reads (LINK).

"gut microbiota of a lepidopteran organism and its whole genome is not sequenced yet."

That is going to limit what you can identify and separate as host genome.

12 months ago
colindaven ★ 4.0k

You should just be able to use cat *.fa > draft_genome.fa to combine the contigs into a draft genome. Make sure you check its usefulness by indexing it with samtools faidx draft_genome.fa. Also check that the whole genome size is as expected.
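As a rough illustration of the size check, here is a minimal Python sketch that tallies contig lengths in a combined FASTA file and reports the total size and N50. The function names (contig_lengths, n50) and the file name in the usage part are placeholders for illustration, not part of any tool mentioned in this thread. The duplicate-name check is included on the assumption that concatenating several FASTA files can produce repeated contig names, which may cause trouble when indexing.

```python
from typing import Dict, Iterable, List, Optional


def contig_lengths(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Return {contig_name: length} for an iterable of FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name: Optional[str] = None
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            if name in lengths:
                # Repeated names are easy to create with `cat` across files.
                raise ValueError(f"duplicate contig name: {name!r}")
            lengths[name] = 0
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            lengths[name] += len(line)
    return lengths


def n50(lengths: List[int]) -> int:
    """Largest length L such that contigs of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Hypothetical file name; substitute your own combined FASTA.
    with open("draft_genome.fa") as handle:
        sizes = contig_lengths(handle)
    print("contigs:", len(sizes))
    print("total bp:", sum(sizes.values()))
    print("N50:", n50(list(sizes.values())))
```

If the total comes out far from the genome size reported for related lepidopteran species, that is a hint the draft is incomplete or contains non-host contigs.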

Then as GenoMax says you can use bbsplit.sh to proceed.

12 months ago
Mensur Dlakic ★ 20k

You can remove the host contigs after the assembly, and that way you will get to assemble some of the insect genome. The insect genome should differ substantially from whatever is in its gut, so I expect it will separate easily when you bin the contigs based on tetranucleotide frequencies. The only downside is that it may make the assembly difficult, depending on the proportion of host to metagenomic reads.
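To make the tetranucleotide idea concrete, here is a minimal Python sketch that turns a contig into a 256-dimensional vector of 4-mer frequencies and compares two such vectors. It is only an illustration of the signal that binning relies on, not a replacement for a real binner: the function names are made up for this example, reverse-complement k-mers are not merged, and plain Euclidean distance is an assumed, simplistic choice of metric.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, List

# All 256 possible 4-mers over the unambiguous DNA alphabet.
KMERS: List[str] = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_freqs(seq: str) -> Dict[str, float]:
    """Relative frequency of each 4-mer in seq; windows containing N or other symbols are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    valid = {k: counts.get(k, 0) for k in KMERS}
    total = sum(valid.values())
    if total == 0:
        # Sequence too short or entirely ambiguous.
        return {k: 0.0 for k in KMERS}
    return {k: v / total for k, v in valid.items()}


def euclidean_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Euclidean distance between two 4-mer frequency profiles."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in KMERS))


if __name__ == "__main__":
    # Toy sequences for illustration only; real contigs should be much longer.
    at_rich = tetranucleotide_freqs("ATATTTAAATATTAAATTTATATAAATTT")
    gc_rich = tetranucleotide_freqs("GCGCCGGCGCGGCCGCGCCGGGCGCGCCG")
    print("distance:", euclidean_distance(at_rich, gc_rich))
```

Contigs from the same genome tend to have similar profiles, so host contigs would be expected to cluster apart from bacterial ones. Profiles from short contigs are noisy, which is why binners typically apply a minimum contig length.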