Question: Abundance and cover of single genes
3.2 years ago, adp7 (France) wrote:

Hello, I want to map reads from Tara Oceans (https://www.ebi.ac.uk/metagenomics/projects/ERP001736) to some specific genes extracted from the corresponding contigs (http://www.ebi.ac.uk/ena/about/tara-oceans-assemblies), in order to get the abundance of those genes. Can I map DNA-seq reads against a small group of genes, or must I use the whole contigs? And how can I do that?

Any suggestions are appreciated.

modified 3.2 years ago by genomax • written 3.2 years ago by adp7
3.2 years ago, genomax (United States) wrote:

When you choose to map data to a subset (of the whole dataset) there is some risk that aligners will try to align reads in places where they do not belong. If you are willing to accept that risk you should be able to do what you want to do.

Fishing the genes you need out of these assemblies may be a big task (I don't know for sure). There seem to be plenty of files on the assembly page you linked above.

written 3.2 years ago by genomax
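
One common way to limit the risk described above is to keep only alignments that are near-exact over most of the read. The following is a minimal sketch, assuming the alignments have already been parsed into simple records; the `Alignment` type, its field names, and the default thresholds are hypothetical and do not correspond to any particular aligner's output.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical, already-parsed alignment record."""
    read_id: str
    gene_id: str
    read_length: int
    aligned_length: int  # number of read bases covered by the alignment
    mismatches: int      # mismatches plus indel bases inside the aligned part


def is_confident(aln, min_identity=0.95, min_aligned_fraction=0.8):
    """Return True if the alignment is near-exact over most of the read."""
    if aln.aligned_length == 0:
        return False
    identity = (aln.aligned_length - aln.mismatches) / aln.aligned_length
    aligned_fraction = aln.aligned_length / aln.read_length
    return identity >= min_identity and aligned_fraction >= min_aligned_fraction


def filter_alignments(alignments, **thresholds):
    """Keep only the alignments that pass the identity and length thresholds."""
    return [a for a in alignments if is_confident(a, **thresholds)]
```

The thresholds are placeholders; suitable values depend on how divergent the reads are expected to be from the reference genes.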

Thanks. I know which assembly files my genes are in. My problem is that I'm interested in the abundance of only a small fraction of genes, so I wanted to avoid mapping all reads back to the contigs.

written 3.2 years ago by adp7

Even if you extracted the genes you need from that one assembly (or use the whole assembly) you would still need to align the source reads (240+) at the first link. How can you avoid that mapping step?

Even though this is not RNAseq data I wonder if you could use kallisto/salmon to speed the process up significantly. See this blog post.

written 3.2 years ago by genomax

I can't avoid the mapping step. What I meant was avoiding mapping reads against the whole contigs by mapping against a set of genes of interest instead. Thanks, I will take a look at the blog post.

written 3.2 years ago by adp7

That you can certainly do. Subset the genes and, if kallisto/salmon does not work, try BBMap. I find it plenty fast for alignments, and it is multi-threaded.

written 3.2 years ago by genomax
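
The "subset the genes" step could be sketched roughly as follows: read the contigs, cut out each gene by its coordinates, and write the result as a small FASTA reference. This assumes the gene coordinates are known as 1-based, inclusive positions with a strand; the function names and the tuple layout are made up for illustration and are not part of any tool mentioned in this thread.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict of {sequence_id: sequence}."""
    chunks = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep only the ID, drop the description
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_genes(contigs, genes):
    """Cut genes out of contigs.

    genes: iterable of (gene_id, contig_id, start, end, strand),
    with 1-based inclusive coordinates (an assumption of this sketch).
    """
    extracted = {}
    for gene_id, contig_id, start, end, strand in genes:
        seq = contigs[contig_id][start - 1:end]
        extracted[gene_id] = reverse_complement(seq) if strand == "-" else seq
    return extracted


def to_fasta(records):
    """Format {sequence_id: sequence} as FASTA text, one line per sequence."""
    return "".join(f">{seq_id}\n{seq}\n" for seq_id, seq in records.items())
```

The text returned by `to_fasta` can be written to a file and used as the reference for whichever mapper is chosen, for example through BBMap's `ref=` parameter.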

Interesting suggestion. Although I haven't tried it for this type of experiment, I assume kallisto/salmon should work if you create your own reference containing just the sequences of interest. I'm not sure about the specificity: it probably has the same problem as aligning to a limited reference, which might give spurious alignments because the aligners try too hard.

modified 3.2 years ago • written 3.2 years ago by WouterDeCoster
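
Whichever tool produces the per-gene read counts, raw counts favor longer genes and deeper samples, so some normalization is usually needed before comparing abundances. Below is a minimal sketch of two simple measures, mean coverage and an RPKM-style value; it assumes per-gene read counts, gene lengths, a fixed read length, and the total number of reads in the sample are already available. All names are hypothetical, and this is not how kallisto or salmon compute their own estimates.

```python
def mean_coverage(counts, lengths, read_length):
    """Approximate mean depth per gene: mapped bases divided by gene length."""
    return {g: counts[g] * read_length / lengths[g] for g in counts}


def reads_per_kilobase(counts, lengths):
    """Read count per gene, normalized by gene length in kilobases."""
    return {g: counts[g] / (lengths[g] / 1000.0) for g in counts}


def rpkm(counts, lengths, total_reads):
    """Reads per kilobase of gene per million reads in the sample.

    total_reads is the number of reads in the whole sample, not only those
    that mapped to the gene subset, so samples stay comparable.
    """
    per_million = total_reads / 1_000_000.0
    rpk = reads_per_kilobase(counts, lengths)
    return {g: value / per_million for g, value in rpk.items()}
```

Normalizing by the total reads in each sample, rather than by the reads that hit the small gene set, keeps the values comparable across samples when only a few genes are used as the reference.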