Question: How do you identify the contigs from a Trinity assembly?
MAPK1.4k wrote:

I am trying to get read counts for a DESeq2 analysis from metagenomic data. I have assembled contigs for all organisms using Trinity, and I would like to map the reads from each sample to these contigs to obtain the counts. For ordinary RNA-seq we would use a GFF file to assign reads to annotated loci, but with metagenomic data I can't use one specific genome, so I want to use the Trinity-assembled contigs as the mapping reference. Before proceeding with the read mapping, however, I would like to annotate each Trinity contig. I wonder whether I can do a BLAST search against nr. What would be the easiest way to do this? Thanks for your help!
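For illustration, here is a minimal sketch of the counting step described above. It assumes each sample's reads have already been mapped to the Trinity contigs and summarised with `samtools idxstats`, which prints one tab-separated line per reference: name, length, mapped reads, unmapped reads. The function names `parse_idxstats` and `build_count_matrix` are hypothetical helpers, and idxstats is only one of several ways to get per-contig counts.

```python
def parse_idxstats(text):
    """Parse `samtools idxstats` output into {contig: mapped_read_count}."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":  # trailing row for reads without a reference
            continue
        counts[name] = int(mapped)
    return counts


def build_count_matrix(per_sample):
    """Merge {sample: {contig: count}} into (sample_names, rows).

    Each row is (contig, [count in sample 1, count in sample 2, ...]);
    contigs missing from a sample get a count of 0.
    """
    samples = sorted(per_sample)
    contigs = sorted({c for counts in per_sample.values() for c in counts})
    rows = [(c, [per_sample[s].get(c, 0) for s in samples]) for c in contigs]
    return samples, rows
```

The resulting rows can be written out as a tab-separated table with contigs as rows and samples as columns, which is the layout DESeq2 expects for a raw count matrix.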

assembly blast trinity • 605 views
ADD COMMENTlink modified 9 months ago • written 9 months ago by MAPK1.4k

To get counts for each contig, you don't strictly need to identify them up-front. You could find the differentially expressed (DE) ones first and identify only those :-)

You could follow these directions from Trinity for identification.

Edit: Since this is a metagenomic dataset these directions are not useful.
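The "identify only the DE ones" idea above could be sketched roughly as follows: once DESeq2 has produced a list of significant contig IDs, pull just those sequences out of the Trinity FASTA and annotate that much smaller set. This is a sketch under assumptions; `read_fasta` and `select_contigs` are hypothetical helper names, and the contig ID is taken to be the first whitespace-delimited token of the FASTA header.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)


def select_contigs(fasta_lines, wanted_ids):
    """Yield (contig_id, sequence) for contigs whose ID is in wanted_ids."""
    wanted = set(wanted_ids)
    for header, seq in read_fasta(fasta_lines):
        contig_id = header.split()[0]  # drop any description after the ID
        if contig_id in wanted:
            yield contig_id, seq
```

In practice `fasta_lines` would be an open file handle on the assembly, and `wanted_ids` the contig names that pass the chosen adjusted p-value cutoff.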

ADD REPLYlink modified 9 months ago • written 9 months ago by genomax69k

That is right, I was planning to do it the way you suggested, but identifying the DE ones later would be a somewhat elaborate process. I thought identifying them at the beginning would reduce the work later.

ADD REPLYlink written 9 months ago by MAPK1.4k

So rather than identification per se you are looking to reduce redundancy so you don't have the same sequence represented multiple times?

Did you use TriMetAss (http://microbiology.se/software/trimetass/) instead of Trinity? That appears to be for metagenomic data.

ADD REPLYlink modified 9 months ago • written 9 months ago by genomax69k

No, these are not overlapping sequences so I wanted to map them to the assembled reference. I haven't used TriMetAss, but will give it a try. Thanks!

ADD REPLYlink written 9 months ago by MAPK1.4k

Additionally, I just wanted to get the loci identified (i.e. which gene, CDS, etc.) for each cluster of reads after mapping.

ADD REPLYlink written 9 months ago by MAPK1.4k

Since this is bacterial data you would expect the entire sequence to be coding. It may not be full length or start at the ATG depending on how well the assembly worked.

As suggested, it should be OK to search against nr (or a RefSeq bacterial database) using DIAMOND to identify the contigs. It works well, but you would need ~80-100 GB of RAM for this search. You could also try Magic-BLAST from NCBI.
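A search like this typically returns many hits per contig, so a common follow-up is to keep only the best one. The sketch below assumes the DIAMOND results were written in the BLAST-style tabular format (12 default columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and picks the highest-bitscore hit per contig. `best_hits` is a hypothetical helper name, and taking the top bitscore is just one simple way to assign an identity.

```python
def best_hits(lines):
    """Return {contig: (subject, evalue, bitscore)} keeping the top-bitscore hit.

    Expects 12-column BLAST-style tabular lines, tab-separated.
    """
    best = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        # Keep this hit only if it beats the best one seen so far for the contig
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

Contigs that do not appear in the returned dictionary had no hit at the chosen thresholds and would stay unannotated.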

ADD REPLYlink modified 9 months ago • written 9 months ago by genomax69k

Thanks! I have used Diamond before so yes it makes sense.

ADD REPLYlink modified 9 months ago • written 9 months ago by MAPK1.4k

Out of sheer curiosity: what was your rationale for using Trinity? My apologies if this question merely reflects my inexperience with Trinity: why would you BLAST contigs against nr? Or do you get proteins? Is Trinity able to define gene boundaries in prokaryotic RNA-seq data? Also, I think your GFF approach should work; you can handle contigs in a metagenome just like any other genome.

For contig annotation, Kraken is an excellent tool (though it lacks a good taxonomic binning algorithm, afaik), and as a faster blastp alternative I recommend DIAMOND.

ADD REPLYlink written 9 months ago by Carambakaracho1.4k

I just wanted to annotate the contigs, and I also don't think BLAST would be the best solution, which is why I asked the question here. Since this is metatranscriptome data, I am not sure whether I would be able to use GFF file(s). I am using the Trinity-assembled contigs as a reference genome to get read counts from the metatranscriptome data I have.

ADD REPLYlink written 9 months ago by MAPK1.4k

Hi, I was just wondering if you ended up finding a way to annotate the contigs from Trinity?

ADD REPLYlink written 6 months ago by CC20