Hi, I have a metagenomics sample. I also have a set of reads from a genome. How can I pull similar or identical reads from the metagenomics sample? My ultimate goal is to reconstruct the genome hidden in the metagenomics sample.
I use bwa to index the set of reads, then I align the metagenomic sample to that reference; this will yield the "similar or identical reads".
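The index-then-align idea above could be scripted roughly as follows. This is a minimal sketch, not a tested pipeline: it assumes `bwa` and `samtools` are on the PATH (samtools is my addition, it is not mentioned in this thread), that the genome reads are already in a FASTA file, and that the metagenomic reads are single-end. The function names, file names and thread count are hypothetical.

```python
import subprocess


def build_commands(genome_fasta, metagenome_fastq, threads=4):
    """Return the three command lines used for read recruitment."""
    # Index the genome reads so they can serve as the bwa reference.
    index_cmd = ["bwa", "index", genome_fasta]
    # Align the metagenomic reads against that reference (SAM on stdout).
    align_cmd = ["bwa", "mem", "-t", str(threads), genome_fasta, metagenome_fastq]
    # Keep only mapped primary alignments and convert them back to FASTQ:
    # 0x904 = unmapped (0x4) + secondary (0x100) + supplementary (0x800).
    filter_cmd = ["samtools", "fastq", "-F", "0x904", "-"]
    return index_cmd, align_cmd, filter_cmd


def recruit_with_bwa(genome_fasta, metagenome_fastq, out_fastq, threads=4):
    """Write metagenomic reads that align to the genome reads to out_fastq."""
    index_cmd, align_cmd, filter_cmd = build_commands(
        genome_fasta, metagenome_fastq, threads
    )
    subprocess.run(index_cmd, check=True)
    with open(out_fastq, "wb") as out:
        # Pipe the aligner output straight into the filter step.
        aligner = subprocess.Popen(align_cmd, stdout=subprocess.PIPE)
        subprocess.run(filter_cmd, stdin=aligner.stdout, stdout=out, check=True)
        aligner.stdout.close()
        if aligner.wait() != 0:
            raise RuntimeError("bwa mem exited with an error")
```

A call such as `recruit_with_bwa("genome_reads.fa", "metagenome.fq", "recruited.fq")` would then leave the recruited reads in `recruited.fq`. For paired-end data the align and filter commands would need adjusting.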
How many reads from the genome do you already have?
If there are enough, you could assemble them into contigs (or even into rough scaffolds of a genome) and then align against those.
If you have too few, then you could create a blast database or an aligner index from the reads and search against that.
Cheers,
IV
As Pavel and I already mentioned, you can create a bowtie, bwa or any other aligner index, or a blast database, from your available reads. You should first collapse the reads and convert them into a fasta file. You can then use this fasta to build your index or as the basis of your blast database, and search against that index or database.
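The collapse-and-convert step could look something like the sketch below. It assumes that "collapse" means merging identical read sequences, and that the input is a plain (uncompressed) FASTQ file with exactly four lines per record. The function names and the `>readN_xCOUNT` header format are made up for illustration.

```python
from collections import Counter


def read_fastq_sequences(path):
    """Yield the sequence line of every record in a 4-line-per-record FASTQ."""
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:
                yield line.strip().upper()


def collapse_to_fasta(fastq_path, fasta_path):
    """Merge identical reads and write one FASTA record per unique sequence.

    Returns the number of unique sequences written.
    """
    counts = Counter(read_fastq_sequences(fastq_path))
    with open(fasta_path, "w") as out:
        # Most abundant sequences first; the header records the copy number.
        for n, (seq, count) in enumerate(counts.most_common(), start=1):
            out.write(f">read{n}_x{count}\n{seq}\n")
    return len(counts)
```

The resulting FASTA can then be passed to whichever indexer or database builder you prefer. Note that this only merges exact duplicates; reads that are reverse complements of each other or differ by a single base stay separate.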
I know that making an assembly and mapping would be one way to do it. This is a legitimate way but there will be some unassembled regions of the genome. What I'm asking is, is there a way to match reads to reads?
What exactly would you assemble, the metagenome or the genome? If the latter, then you will reduce the search space from a set of reads to contigs and singletons (= the index size) and speed up the selection process. I am interested: why wouldn't bwa work? I use this solution in my current project and it works.
After getting all the reads, I would assemble the genome hidden in the metagenome. The reason I don't want to map to an assembly is that some regions are not retained in the assembly, usually due to repeats, misassemblies, or lack of coverage. I am willing to spend extra computer time on this in order to recover more or better reads.
Some good suggestions I have received in conversation are either 1) to find reads that share identical k-mers or 2) to blast the metagenomic reads against the genomic reads and recover any queries that match.
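For suggestion 1, a read-to-read k-mer match could be sketched as below. This is a toy in-memory version that only illustrates the idea: it assumes both read sets fit in memory as plain strings, and the names `build_kmer_index`, `recruit_by_kmers`, `k` and `min_shared` are hypothetical. Real data sets would likely need a dedicated k-mer tool or a more compact index.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_kmer_index(genome_reads, k):
    """Collect every k-mer from the genome reads, on both strands."""
    index = set()
    for read in genome_reads:
        read = read.upper()
        index |= kmers(read, k)
        index |= kmers(reverse_complement(read), k)
    return index


def recruit_by_kmers(metagenome_reads, index, k, min_shared=1):
    """Yield metagenomic reads sharing at least min_shared k-mers with the index."""
    for read in metagenome_reads:
        shared = sum(1 for kmer in kmers(read.upper(), k) if kmer in index)
        if shared >= min_shared:
            yield read


# Toy example: one genome read, three metagenomic reads.
index = build_kmer_index(["ACGTACGTGG"], k=5)
hits = list(recruit_by_kmers(["TTACGTACTT", "CCCCCCCCCC", "CCACGTTTTT"], index, k=5))
```

Here the first metagenomic read shares k-mers with the forward strand and the third with the reverse strand, so both are recruited, while the second is dropped. Raising `k` or `min_shared` makes the match stricter; lowering them recovers more divergent reads at the cost of more false positives.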