Why are there so few unique hits in the blastx results of a transcriptome assembly?
8.8 years ago
seta ★ 1.9k

Dear all,

I'm working on an RNA-seq analysis of a non-model diploid with high heterozygosity.

The transcriptome was assembled with the CLC genomics software (k=64) after read trimming, and the contigs were searched with blastx against the UniProt database (Viridiplantae); only 30% of the hits were unique. I also made a second assembly with Trinity, merged it with the CLC assembly, and ran cd-hit-est to remove redundancy (threshold 1), which produced 182968 clusters from 204397 input sequences. For a quick evaluation, I ran blastx on this merged assembly against the Arabidopsis proteome only. Although 80% of the contigs got a hit, only 20% of the hits were unique. I also calculated the average collapse factor for this assembly; it was 12.66, which isn't too high.

These results are driving me crazy because I don't know whether they are normal. Which strategy is right? What is wrong, and how can I fix or at least improve it? Please share your opinion on this issue.
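
One way to make the "unique hit" percentage concrete is sketched below. It assumes that "unique" means the number of distinct subject proteins divided by the number of contigs with a hit, that the blastx output is in the default 12-column tabular format (`-outfmt 6`), and that only the highest-bitscore hit per contig is counted. The function names and the sample rows are made up for illustration; they are not taken from the actual analysis.

```python
import csv
import io


def best_hits(blast_tab):
    """Return {query_id: (subject_id, bitscore)}, keeping the top-bitscore hit per query.

    Assumes default BLAST tabular columns: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def unique_hit_summary(best):
    """Summarise how many distinct subjects the contigs with a hit map to."""
    subjects = {subject for subject, _ in best.values()}
    n_contigs = len(best)
    return {
        "contigs_with_hit": n_contigs,
        "distinct_subjects": len(subjects),
        # Assumed definition of the "unique" fraction, see the text above.
        "unique_fraction": len(subjects) / n_contigs if n_contigs else 0.0,
    }


# Made-up rows: three contigs, two of which share the same best subject.
EXAMPLE_ROWS = (
    "contig1\tAT1G01010.1\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-50\t200\n"
    "contig2\tAT1G01010.1\t85.0\t100\t15\t0\t1\t300\t1\t100\t1e-45\t180\n"
    "contig3\tAT1G01010.1\t60.0\t100\t40\t0\t1\t300\t1\t100\t1e-10\t90\n"
    "contig3\tAT2G02020.1\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-40\t150\n"
)

summary = unique_hit_summary(best_hits(io.StringIO(EXAMPLE_ROWS)))
print(summary)
```

With a real result file, `best_hits(open("blastx.tsv"))` would replace the in-memory example, where `blastx.tsv` is a placeholder file name. If the percentage was computed some other way (for example over all reported hits rather than best hits), the numbers will differ.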

Thank you very much for your participation.

alignment Assembly blast RNA-Seq • 1.8k views

I am not familiar with this, but if I were you, I would first extract the non-uniquely mapped contigs and see where they align. Then I would align the reference against itself, e.g. the Arabidopsis proteome against the Arabidopsis proteome. If you still get non-unique alignments, the cause might be low complexity in the reference. At least this should let you rule out one possible suspect.
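
The first step could look roughly like the sketch below. It assumes you already have a mapping from each contig to its best-hit subject (however you parse the blastx output), and it groups contigs by subject so that the proteins hit by several contigs can be inspected. The function names and the IDs in the example are hypothetical.

```python
from collections import defaultdict


def contigs_by_subject(best_hit):
    """Group contig IDs by the subject protein they hit best."""
    groups = defaultdict(list)
    for contig, subject in best_hit.items():
        groups[subject].append(contig)
    return groups


def shared_subjects(groups, min_contigs=2):
    """Return (subject, contigs) pairs hit by at least min_contigs contigs,
    ordered from most to fewest contigs, then by subject ID."""
    shared = {
        subject: sorted(contigs)
        for subject, contigs in groups.items()
        if len(contigs) >= min_contigs
    }
    return sorted(shared.items(), key=lambda item: (-len(item[1]), item[0]))


# Made-up best hits: contig ID -> subject ID.
example_best_hit = {
    "contig1": "AT1G01010.1",
    "contig2": "AT1G01010.1",
    "contig3": "AT2G02020.1",
    "contig4": "AT1G01010.1",
    "contig5": "AT3G03030.1",
    "contig6": "AT3G03030.1",
}

ranked = shared_subjects(contigs_by_subject(example_best_hit))
for subject, contigs in ranked:
    print(subject, len(contigs), ",".join(contigs))
```

Looking at the top few groups by eye (are the contigs fragments of one transcript, isoforms, or allelic variants?) may already tell you whether the redundancy comes from the assembly or from the reference.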

