Question: How do you identify the contigs from a Trinity assembly?
Genetics1.5k wrote:

I am trying to get read counts for a DESeq2 analysis from metagenomic data. I have assembled contigs with Trinity for all organisms, and I would like to map the reads from each sample to these contigs to get the counts. Normally for RNA-seq we would use a GFF file to assign reads to annotated loci, but for metagenomic data I can't use one specific genome, so I want to use the Trinity-assembled contigs as the reference for mapping. However, before proceeding with the read mapping, I would like to annotate each contig from Trinity. I wonder if I can do a BLAST search against nr. What would be the easiest way to do this? Thanks for your help!

assembly blast trinity • 712 views
ADD COMMENTlink modified 13 months ago • written 13 months ago by Genetics1.5k

To get counts for each, you don't strictly need to identify them up-front. You could identify the DE ones first and only ID those :-)
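A minimal sketch of the "ID only the DE ones" idea: once DESeq2 has given a list of differentially expressed contig IDs, pull just those sequences out of the assembly and annotate that much smaller set. The function names and the demo records are made up for illustration. The only assumption about the data is that the contig ID is the first word of each FASTA header, as in Trinity output.

```python
from typing import Iterable, Iterator, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def subset_contigs(
    lines: Iterable[str], wanted_ids: Set[str]
) -> Iterator[Tuple[str, str]]:
    """Keep only records whose ID (first word of the header) is wanted."""
    for header, seq in read_fasta(lines):
        if header.split()[0] in wanted_ids:
            yield header, seq


# Made-up records standing in for a Trinity FASTA file.
demo = [
    ">TRINITY_DN1_c0_g1_i1 len=12",
    "ATGGCC",
    "AAATTT",
    ">TRINITY_DN2_c0_g1_i1 len=6",
    "ATGCCC",
]
# Made-up list of DE contig IDs, e.g. exported from a DESeq2 results table.
de_ids = {"TRINITY_DN2_c0_g1_i1"}

for header, seq in subset_contigs(demo, de_ids):
    print(f">{header}\n{seq}")
```

For a real assembly you would pass an open file handle instead of the `demo` list, since a file object also iterates over lines.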

You could follow these directions from Trinity for identification.

Edit: Since this is a metagenomic dataset these directions are not useful.

ADD REPLYlink modified 13 months ago • written 13 months ago by genomax74k

That is right, I was planning to do it the way you suggested, but identifying the DE ones later would be a somewhat elaborate process. I thought identifying them at the beginning would reduce the work later.

ADD REPLYlink written 13 months ago by Genetics1.5k

So rather than identification per se you are looking to reduce redundancy so you don't have the same sequence represented multiple times?

Did you use TriMetAss (http://microbiology.se/software/trimetass/) instead of Trinity? That appears to be for metagenomic data.

ADD REPLYlink modified 13 months ago • written 13 months ago by genomax74k

No, these are not overlapping sequences so I wanted to map them to the assembled reference. I haven't used TriMetAss, but will give it a try. Thanks!

ADD REPLYlink written 13 months ago by Genetics1.5k

Additionally, I just wanted to get the loci identified (as which gene, CDS, etc.) for each cluster of reads after mapping.

ADD REPLYlink written 13 months ago by Genetics1.5k

Since this is bacterial data, you would expect the entire sequence to be coding. It may not be full length or start at the ATG, depending on how well the assembly worked.

As suggested, it should be OK to search with DIAMOND against nr (or the RefSeq bacterial database) to identify the contigs. It works well, but you would need ~80-100 GB of RAM for this search. You could also try Magic-BLAST from NCBI.
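Once such a search has finished, you usually want one annotation per contig. Here is a rough sketch of picking the best hit per contig. It assumes the search was run in translated mode (e.g. `diamond blastx`) with the standard 12-column tabular output (`--outfmt 6`). The function name, the 1e-5 e-value cutoff, and the accessions in the demo rows are all made up for illustration and are not a recommendation.

```python
from typing import Dict, Iterable, Tuple

# Column order of the standard 12-column BLAST/DIAMOND tabular format:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore


def best_hits(
    lines: Iterable[str], max_evalue: float = 1e-5
) -> Dict[str, Tuple[str, float, float]]:
    """Return {contig: (subject, evalue, bitscore)} keeping the top bitscore.

    max_evalue is an arbitrary illustrative cutoff, not a tuned value.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        qseqid, sseqid = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        # Keep this hit only if it beats the current best for the contig.
        if qseqid not in best or bitscore > best[qseqid][2]:
            best[qseqid] = (sseqid, evalue, bitscore)
    return best


# Made-up rows; the accessions are placeholders, not real search results.
demo_hits = [
    "contig_1\tWP_000001.1\t95.0\t100\t5\t0\t1\t300\t1\t100\t1e-50\t200.0",
    "contig_1\tWP_000002.1\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-30\t150.0",
    "contig_2\tWP_000003.1\t40.0\t50\t30\t0\t1\t150\t1\t50\t0.5\t30.0",
]

for contig, (subject, evalue, bitscore) in best_hits(demo_hits).items():
    print(contig, subject, evalue, bitscore, sep="\t")
```

Contigs with no hit under the cutoff (like `contig_2` in the demo) simply do not appear in the result, so they can be reported as unannotated.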

ADD REPLYlink modified 13 months ago • written 13 months ago by genomax74k

Thanks! I have used DIAMOND before, so yes, it makes sense.

ADD REPLYlink modified 13 months ago • written 13 months ago by Genetics1.5k

Out of sheer curiosity: what was your rationale for using Trinity? My apologies if this question merely reflects my inexperience with Trinity: why would you BLAST contigs against nr? Or do you get proteins? Is Trinity able to define gene boundaries in prokaryotic RNA-seq data? Also, I think your GFF approach should work: you can handle contigs in a metagenome just like any other genome.
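One possible reading of the GFF idea, as a sketch only: if no gene predictions exist yet, each contig can be written as a single whole-length feature so that GFF-based counting tools have something to count against. That is a simplification introduced here, not something stated in this thread; real gene coordinates from a gene predictor would be preferable. The function name, the `sketch` source label, and the demo lengths are made up.

```python
from typing import Dict, List


def contigs_to_gff3(lengths: Dict[str, int], source: str = "sketch") -> List[str]:
    """Build GFF3 lines with one whole-length 'contig' feature per contig.

    GFF3 columns: seqid, source, type, start, end, score, strand, phase,
    attributes. Coordinates are 1-based and inclusive.
    IDs are written unescaped, which is fine for plain alphanumeric and
    underscore names but not for IDs containing ';', '=', or spaces.
    """
    lines = ["##gff-version 3"]
    for name, length in lengths.items():
        fields = [name, source, "contig", "1", str(length), ".", ".", ".", f"ID={name}"]
        lines.append("\t".join(fields))
    return lines


# Made-up contig lengths standing in for a real assembly.
demo_lengths = {"contig_1": 900, "contig_2": 450}
print("\n".join(contigs_to_gff3(demo_lengths)))
```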

For contig annotation, Kraken is an excellent tool (though it lacks a good taxonomic binning algorithm, afaik), and as a faster blastp alternative I recommend DIAMOND.

ADD REPLYlink written 13 months ago by Carambakaracho1.9k

I just wanted to annotate the contigs, and I also don't think BLAST would be the best solution, which is why I asked this question here. Since this is metatranscriptome data, I am not sure whether I would be able to use GFF file(s). I am using the Trinity-assembled data as a reference genome to get read counts from the metatranscriptome data I have.
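For the read-count side, a GFF is not strictly needed when the contigs themselves are the counting unit. A rough sketch, assuming each sample's reads were mapped to the contigs and summarised with `samtools idxstats` (tab-separated: reference name, length, mapped reads, unmapped reads). The function names, sample names, and numbers below are made up, and idxstats counts do nothing special about multi-mapping reads. The result is the contig-by-sample integer matrix that DESeq2 expects.

```python
from typing import Dict, Iterable, List, Tuple


def parse_idxstats(lines: Iterable[str]) -> Dict[str, int]:
    """Return {contig: mapped_read_count} from idxstats-style lines."""
    counts: Dict[str, int] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and the trailing '*' row (unplaced reads).
        if len(fields) < 3 or fields[0] == "*":
            continue
        counts[fields[0]] = int(fields[2])
    return counts


def count_matrix(
    per_sample: Dict[str, Dict[str, int]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Combine per-sample counts into rows=contigs, columns=samples."""
    samples = sorted(per_sample)
    contigs = sorted({c for counts in per_sample.values() for c in counts})
    # Contigs absent from a sample get a count of 0.
    rows = [[per_sample[s].get(c, 0) for s in samples] for c in contigs]
    return samples, contigs, rows


# Made-up idxstats output for two hypothetical samples.
sample_a = ["contig_1\t900\t120\t0", "contig_2\t450\t8\t0", "*\t0\t0\t15"]
sample_b = ["contig_1\t900\t95\t0", "contig_3\t300\t40\t0", "*\t0\t0\t22"]

per_sample = {
    "sample_a": parse_idxstats(sample_a),
    "sample_b": parse_idxstats(sample_b),
}
samples, contigs, rows = count_matrix(per_sample)

# Print a tab-separated table that can be read into R for DESeq2.
print("\t".join(["contig"] + samples))
for contig, row in zip(contigs, rows):
    print("\t".join([contig] + [str(n) for n in row]))
```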

ADD REPLYlink written 13 months ago by Genetics1.5k

Hi, I was just wondering if you ended up finding a way to annotate the contigs from Trinity?

ADD REPLYlink written 9 months ago by CC30