Question: BlastX results - some making no sense
4.5 years ago by
Biogeek400 wrote:

Hi friends,

I'm annotating a pond species that sits well within the higher plant lineage. I ran BlastX locally against the UniProt Viridiplantae database, and I'm getting a lot of top hits to basal lineages such as the unicellular green algae. These results don't make much sense to me, given how evolutionarily distant this plant is from the lower plants. When I look further down the list at the next-best hits, there is always a more appropriate hit.

As a result, I then ran BlastX against just the embryophyte lineage of UniProt and TrEMBL (land plants and aquatic plants), and the results make much more sense; however, the % identity is low for some transcripts. What are people's takes on this? Is it better to use a more specific database in such a case, given that the evolutionary position of my non-model organism is clear to me and it sits on a much more distant branch of the higher plants, away from the early branches of the Viridiplantae? Or do I just use the entire Viridiplantae lineage?

I've additionally run BlastN to detect any contamination, with a 95% cut-off and a low e-value. The database for this was built from the unicellular microbial algae and eukaryotes that appear in the first set of BlastX hits. There were very few hits, and I removed the sequences that produced them.
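For anyone wanting to reproduce that kind of contamination screen, here is a minimal sketch of the filtering step. It assumes the BlastN results were written in the standard 12-column tabular format (`-outfmt 6`), that the 95% cut-off refers to percent identity, and that "low e-value" means something like 1e-10; the function name, the threshold default, and the example rows are all made up for illustration and are not taken from the post.

```python
import csv
import io


def flag_contaminants(blast_tsv, min_identity=95.0, max_evalue=1e-10):
    """Return query IDs with at least one hit passing both thresholds.

    Expects BLAST tabular output (-outfmt 6), where column 1 is the query ID,
    column 3 is percent identity and column 11 is the e-value.
    The 1e-10 default is an assumed stand-in for "low e-value".
    """
    flagged = set()
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, pident, evalue = row[0], float(row[2]), float(row[10])
        if pident >= min_identity and evalue <= max_evalue:
            flagged.add(qseqid)
    return flagged


# Made-up example rows: only tx1 passes both thresholds.
example = io.StringIO(
    "tx1\talgaA\t98.2\t300\t5\t0\t1\t300\t10\t309\t1e-150\t540\n"
    "tx2\talgaB\t82.0\t120\t20\t1\t1\t120\t5\t124\t1e-20\t95\n"
    "tx3\talgaC\t96.0\t40\t1\t0\t1\t40\t1\t40\t0.5\t30\n"
)
flagged = flag_contaminants(example)
print(sorted(flagged))
```

The returned set of IDs can then be used to drop those transcripts from the assembly FASTA before annotation.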

What are people's opinions? Thanks.

4.5 years ago by
Chris Fields2.1k
University of Illinois Urbana-Champaign
Chris Fields2.1k wrote:

I'm guessing this is from an assembly; is it a transcriptome or full genome?

It might be worth doing an overall non-biased analysis, maybe something like a blobplot against a larger database, just to see if there are any oddities in the data that might indicate problems (e.g. contaminating organisms, which are very common). We've done this using BLASTN and DIAMOND in place of BLASTX and have found this helps considerably (you can also use the results to help identify and filter the problematic sequences). You did say this was a pond plant and your hits are against algae...
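As a rough illustration of the "what do my hits look like overall" idea (not a blobplot, and not the workflow described above), the sketch below keeps the highest-bitscore hit per query and counts which taxonomic group each best hit falls in. The `subject_to_group` mapping is hypothetical; in practice it would have to come from whatever taxonomy annotation is available for the database, and the hit tuples here are invented.

```python
from collections import Counter


def best_hit_per_query(rows):
    """Keep the highest-bitscore hit for each query.

    rows: iterable of (qseqid, sseqid, bitscore) tuples.
    Returns {qseqid: (sseqid, bitscore)}.
    """
    best = {}
    for qseqid, sseqid, bitscore in rows:
        if qseqid not in best or bitscore > best[qseqid][1]:
            best[qseqid] = (sseqid, bitscore)
    return best


def tally_groups(best, subject_to_group):
    """Count best hits per taxonomic group; unmapped subjects are 'unknown'."""
    return Counter(
        subject_to_group.get(sseqid, "unknown") for sseqid, _ in best.values()
    )


# Made-up hits and a hypothetical subject-to-group mapping.
hits = [
    ("tx1", "P1", 250.0),
    ("tx1", "P2", 310.0),
    ("tx2", "P3", 120.0),
    ("tx3", "P9", 80.0),
]
groups = {"P1": "Chlorophyta", "P2": "Embryophyta", "P3": "Chlorophyta"}

best = best_hit_per_query(hits)
counts = tally_groups(best, groups)
print(counts)
```

A large share of best hits landing in an unexpected group is the kind of oddity worth a closer look, though it does not by itself prove contamination.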


Hey Chris, thanks for this. It's a de novo transcriptome assembly. Essentially what files are needed for this? Will a fasta of the assembly suffice? Thanks

written 4.5 years ago by Biogeek400

Not sure how well this would work with a transcriptome assembly, as the coverage varies so much (and coverage is one critical component of a blobplot). But you could probably look at variation in GC content and, overall, at which taxonomic groups are found via BLASTX.

EDIT: 'phylogenetic' -> 'taxonomic'
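A minimal sketch of the GC-content part, assuming the assembly is a plain FASTA file: it computes the GC fraction of each transcript so that outliers can be inspected. The parser and function names are illustrative only, and the two example sequences are invented.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Use the first whitespace-delimited token as the sequence name.
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


# Made-up two-transcript assembly.
example = io.StringIO(">tx1 len=8\nGGCC\nGGCC\n>tx2\nATATGC\n")
gc = {name: gc_fraction(seq) for name, seq in read_fasta(example)}
for name, value in gc.items():
    print(f"{name}\t{value:.3f}")
```

Transcripts whose GC fraction sits far from the bulk of the assembly are candidates for a closer look alongside their BLASTX hits.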

modified 4.5 years ago • written 4.5 years ago by Chris Fields2.1k

An area which I agree needs some improvement ;-). I might stick with the Embryophyta database and then BlastN. Would someone be criticised for this approach?

written 4.5 years ago by Biogeek400