Question

Abundance and cover of single genes

0

7.9 years ago

adp7 ▴ 10

Hello, I want to map reads from Tara Oceans (https://www.ebi.ac.uk/metagenomics/projects/ERP001736) to some specific genes extracted from the corresponding contigs (http://www.ebi.ac.uk/ena/about/tara-oceans-assemblies), in order to get the abundance of those genes. Can I map DNA-seq reads against a small group of genes (or must I use the whole contigs), and how can I do that?

Any suggestions are appreciated.

genome metagenomics reads sequence • 1.6k views

ADD COMMENT • link updated 7.9 years ago by GenoMax 142k • written 7.9 years ago by adp7 ▴ 10

score 1 · Answer 1 · 2016-06-03

1

7.9 years ago

GenoMax 142k

When you choose to map data to a subset of the whole dataset, there is some risk that the aligner will place reads where they do not belong, because the sequences they truly come from are missing from the reference. If you are willing to accept that risk, you should be able to do what you want to do.

Fishing the genes you need out of these assemblies may be a big task (I don't know for sure). There seem to be plenty of files at the assembly page you linked above.

0

Thanks. I know which assembly files my genes are in. My problem is that I'm interested in the abundance of only a small fraction of the genes, so I wanted to avoid mapping all reads back to the whole contigs.
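Since the assembly files and gene locations are already known, the subsetting step can be sketched in plain Python. This is only a minimal sketch: the contig and gene names, the coordinates, and the 1-based inclusive (gene_id, contig_id, start, end, strand) layout are all made up for illustration and would have to be adapted to however the annotation is actually stored.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Translation table used to reverse-complement genes on the minus strand.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {record_id: sequence}; the id is the first word of the header."""
    records: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def extract_genes(
    contigs: Dict[str, str],
    genes: Iterable[Tuple[str, str, int, int, str]],
) -> Iterator[Tuple[str, str]]:
    """Yield (gene_id, sequence) for 1-based inclusive coordinates on each contig."""
    for gene_id, contig_id, start, end, strand in genes:
        seq = contigs[contig_id][start - 1:end]
        if strand == "-":
            seq = seq.translate(_COMPLEMENT)[::-1]
        yield gene_id, seq


def to_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format (name, sequence) pairs as FASTA text wrapped at `width` columns."""
    out: List[str] = []
    for name, seq in records:
        out.append(f">{name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# Hypothetical toy data standing in for a real assembly file and gene table.
contigs = read_fasta([">contig_1 example", "ACGTACGTAC", "GGGTTTAAAC"])
genes = [("geneA", "contig_1", 3, 8, "+"), ("geneB", "contig_1", 11, 16, "-")]
extracted = list(extract_genes(contigs, genes))
print(to_fasta(extracted), end="")
```

For a real assembly, the list passed to `read_fasta` would be replaced by an open file handle, and the printed FASTA text would be written to the file used as the mapping reference.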

ADD REPLY • link 7.9 years ago by adp7 ▴ 10

0

Even if you extracted the genes you need from that one assembly (or used the whole assembly), you would still need to align the source reads (240+) at the first link. How can you avoid that mapping step?

Even though this is not RNAseq data I wonder if you could use kallisto/salmon to speed the process up significantly. See this blog post.
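Whichever tool ends up doing the counting, raw per-gene read counts from read sets of different sequencing depth are not directly comparable. One common way to put them on the same scale is to normalize by gene length and by the total number of reads in the sample, as in the sketch below. The counts, lengths, and function names are hypothetical, and RPKM and mean depth are just two widely used conventions, not necessarily the ones best suited to this project.

```python
def rpkm(read_count: int, gene_length_bp: int, total_reads: int) -> float:
    """Reads per kilobase of gene per million reads in the sample.

    `total_reads` is the size of the whole read set, not only the reads that
    mapped to the gene subset; otherwise the choice of subset would change
    the normalization.
    """
    return read_count / (gene_length_bp / 1_000) / (total_reads / 1_000_000)


def mean_depth(read_count: int, read_length_bp: int, gene_length_bp: int) -> float:
    """Approximate mean coverage depth, assuming reads of uniform length."""
    return read_count * read_length_bp / gene_length_bp


# Hypothetical numbers: 500 reads of 100 bp hit a 1,000 bp gene
# in a sample of 10 million reads.
gene_rpkm = rpkm(read_count=500, gene_length_bp=1_000, total_reads=10_000_000)
gene_depth = mean_depth(read_count=500, read_length_bp=100, gene_length_bp=1_000)
print(f"RPKM = {gene_rpkm:.1f}, mean depth = {gene_depth:.1f}x")
```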

ADD REPLY • link 7.9 years ago by GenoMax 142k

0

I can't avoid the mapping step. What I meant was mapping the reads against a set of genes of interest instead of against the whole contigs. Thanks, I will take a look at the blog post.

ADD REPLY • link 7.9 years ago by adp7 ▴ 10

0

That you can certainly do. Subset the genes and, if kallisto/salmon does not work, try BBMap. I find it plenty fast for alignments, and it is multi-threaded.
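As a rough illustration of that suggestion, the sketch below only assembles a BBMap command line for mapping one read set against the gene subset. The file names are hypothetical, and the parameters used (`ref=`, `in=`, `out=`, `covstats=`, `threads=`, `nodisk`) reflect my reading of the BBMap documentation, so they should be checked against the installed version before running anything.

```python
import subprocess
from typing import List


def bbmap_command(ref_fasta: str, reads_fastq: str, prefix: str, threads: int = 8) -> List[str]:
    """Build (but do not run) a bbmap.sh call against a small reference."""
    return [
        "bbmap.sh",
        f"ref={ref_fasta}",                  # FASTA with only the genes of interest
        f"in={reads_fastq}",                 # reads from one sample
        f"out={prefix}.sam",                 # alignments
        f"covstats={prefix}.covstats.txt",   # per-reference coverage summary
        f"threads={threads}",
        "nodisk",                            # keep the index in memory instead of writing it to disk
    ]


def run(cmd: List[str]) -> None:
    """Execute the command; requires bbmap.sh on PATH and real input files."""
    subprocess.run(cmd, check=True)


# Hypothetical file names, for illustration only.
cmd = bbmap_command("genes_of_interest.fasta", "sample_reads.fastq.gz", "sample_vs_genes")
print(" ".join(cmd))
```

Calling `run(cmd)` once per read set would give one coverage summary per sample, which can then be compared across samples.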

ADD REPLY • link 7.9 years ago by GenoMax 142k

0

Interesting suggestion. Although I haven't tried it for this type of experiment, I assume kallisto/salmon should work if you create your own reference containing just the sequences of interest. I'm not sure about the specificity, though: it probably has the same problem as aligning to a limited reference, which might give spurious alignments because the aligners try too hard.
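One possible way to limit such spurious hits, if an aligner that writes SAM output is used, is to keep only alignments above an identity cutoff. The sketch below assumes the aligner emits the optional `NM` (edit distance) tag; the 95% threshold and the example records are arbitrary illustrations, not recommendations from this thread.

```python
import re
from typing import Optional

# One CIGAR operation: a length followed by an operation code.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_identity(sam_line: str) -> Optional[float]:
    """Return identity = (alignment columns - NM) / alignment columns.

    Alignment columns are the M, =, X, I and D operations of the CIGAR string.
    Returns None for unmapped reads or records without an NM tag.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    if flag & 0x4 or cigar == "*":
        return None
    nm = None
    for tag in fields[11:]:  # optional tags follow the 11 mandatory columns
        if tag.startswith("NM:i:"):
            nm = int(tag[5:])
    if nm is None:
        return None
    columns = sum(int(n) for n, op in _CIGAR_OP.findall(cigar) if op in "MID=X")
    return (columns - nm) / columns if columns else None


def keep(sam_line: str, min_identity: float = 0.95) -> bool:
    """True if the record is mapped and meets the identity cutoff."""
    identity = alignment_identity(sam_line)
    return identity is not None and identity >= min_identity


# Hypothetical SAM records: a clean hit and a soft-clipped, mismatch-heavy one.
good = "read1\t0\tgeneA\t1\t40\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:0"
poor = "read2\t0\tgeneA\t5\t3\t8M2S\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:4"
print(keep(good), keep(poor))
```

A stricter cutoff reduces false positives from related sequences that are missing from the small reference, at the cost of losing reads from divergent copies of the gene.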

ADD REPLY • link 7.9 years ago by WouterDeCoster 47k