Hello, I want to map reads from Tara Oceans (https://www.ebi.ac.uk/metagenomics/projects/ERP001736) to some specific genes extracted from the corresponding contigs (http://www.ebi.ac.uk/ena/about/tara-oceans-assemblies), in order to get the abundance of those genes. Can I map DNA-seq reads against a small group of genes (or must I use the whole contigs), and how can I do that?
Any suggestions are appreciated.
Thanks. I know which assembly files my genes are in. My problem is that I'm interested in the abundance of only a small fraction of the genes, so I wanted to avoid mapping all reads back to the contigs.
Even if you extracted the genes you need from that one assembly (or used the whole assembly), you would still need to align the source reads (240+) at the first link. How can you avoid that mapping step?
Even though this is not RNA-seq data, I wonder if you could use kallisto/salmon to speed the process up significantly. See this blog post.
I can't avoid the mapping step. What I meant was avoiding mapping reads against the whole contigs by mapping them against only a set of genes of interest instead. Thanks, I will take a look at the blog post.
That you can certainly do. Subset the genes and, if kallisto/salmon does not work, try BBMap. I find it plenty fast for alignments, and it is multi-threaded.
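For the "subset the genes" step, here is a minimal sketch in plain Python, assuming the genes of interest are already available as FASTA records and that you have a list of their IDs (the first word of each header). The function names `read_fasta` and `subset_fasta` and the example IDs are made up for illustration; they do not come from any tool mentioned in this thread.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def subset_fasta(handle, wanted_ids, out, width=60):
    """Write only the records whose ID is in wanted_ids; return how many were written."""
    written = 0
    for header, seq in read_fasta(handle):
        words = header.split()
        seq_id = words[0] if words else ""
        if seq_id in wanted_ids:
            out.write(f">{header}\n")
            # Wrap the sequence at a fixed line width.
            for i in range(0, len(seq), width):
                out.write(seq[i:i + width] + "\n")
            written += 1
    return written


if __name__ == "__main__":
    # Hypothetical in-memory example; in practice open the real FASTA files instead.
    demo = io.StringIO(">gene_A some description\nACGT\nACGT\n>gene_B\nTTTT\n>gene_C\nGGGG\n")
    result = io.StringIO()
    n = subset_fasta(demo, {"gene_A", "gene_C"}, result)
    print(n, "records kept")
    print(result.getvalue(), end="")
```

The resulting FASTA file can then be used as the reference for whichever mapper or quantifier you pick.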
Interesting suggestion. I haven't tried it for this type of experiment, but I assume kallisto/salmon should work if you create your own reference containing just the sequences of interest. I'm not sure about the specificity, though: it probably has the same problem as aligning to a limited reference, which might give spurious alignments because the aligners try too hard to place every read.
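One way to limit the effect of such spurious alignments, if you go the aligner route, is to keep only high-identity hits before counting reads per gene. The sketch below is one possible post-hoc filter, not something prescribed by any of the tools above. It assumes the aligner writes a SAM file with an `NM:i:` (edit distance) tag on each mapped record; records without that tag are skipped. The 0.95 identity cutoff and the function names are arbitrary choices for illustration.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_identity(cigar, nm):
    """Identity = matching columns / alignment columns (M, =, X, I, D)."""
    cols = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "M=XID")
    if cols == 0:
        return 0.0
    # NM counts mismatches plus inserted and deleted bases,
    # so cols - nm is the number of matching columns.
    return (cols - nm) / cols


def count_confident_hits(sam_lines, min_identity=0.95):
    """Count primary alignments per reference gene that pass the identity cutoff."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        flag, ref, cigar = int(fields[1]), fields[2], fields[5]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & 0x4 or flag & 0x100 or flag & 0x800 or ref == "*" or cigar == "*":
            continue
        nm = next((int(t[5:]) for t in fields[11:] if t.startswith("NM:i:")), None)
        if nm is None:
            continue  # assumption: no NM tag means we cannot judge the hit
        if alignment_identity(cigar, nm) >= min_identity:
            counts[ref] += 1
    return counts
```

The per-gene counts still need to be normalised (for example by gene length and sequencing depth) before they can be compared between genes or samples.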