Question: PlantTFDB transcription factor discovery.
jvire1 wrote, 9 weeks ago:

Hi all,

I have assembled three transcriptomes of a non-model plant and have been writing up a report. Initially, I queried the unigenes (blastx) and the proteins they code for (blastp) against the entire collection of PlantTFDB protein sequences, using an E-value cutoff of 1e-5.

On analyzing the blastx and blastp hits, I realized that I was getting far too many members of each of the 58 transcription factor families. For example, ~3000 unigenes were annotated as bHLH in one of my assemblies, whereas according to the PlantTFDB species summary for this family (http://planttfdb.cbi.pku.edu.cn/family.php?fam=bHLH), the highest number of bHLH genes identified in any one species is 559 (Panicum virgatum).

I have since filtered the blastx and blastp results at an E-value of 1e-50 (in hindsight, 1e-5 was far too permissive) and >35% identity. This reduced the number of bHLH-annotated unigenes to ~1000, but I suspect this is still too high an estimate.
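For anyone wanting to reproduce this kind of filter, here is a minimal sketch in Python. It assumes the BLAST results were saved in the default tabular format (`-outfmt 6`); the function name `filter_hits` and the example rows are made up for illustration and are not from my actual data.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, max_evalue=1e-50, min_pident=35.0):
    """Yield hits (as dicts) that pass both the E-value and identity cutoffs."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        if float(hit["evalue"]) <= max_evalue and float(hit["pident"]) > min_pident:
            yield hit


# Hypothetical rows, for illustration only.
example = (
    "unigene1\tTF_A\t62.5\t200\t75\t0\t1\t600\t10\t209\t1e-80\t250\n"
    "unigene2\tTF_A\t30.0\t180\t126\t0\t1\t540\t5\t184\t1e-60\t200\n"  # fails identity
    "unigene3\tTF_B\t45.0\t90\t49\t1\t1\t270\t20\t109\t1e-20\t90\n"    # fails E-value
    "unigene1\tTF_C\t55.0\t190\t85\t0\t4\t573\t3\t192\t1e-70\t230\n"
)

kept = list(filter_hits(io.StringIO(example)))
# Count unigenes, not hits: one unigene can hit several database proteins.
unigenes = {hit["qseqid"] for hit in kept}
print(len(kept), "hits from", len(unigenes), "unigene(s)")
```

Counting distinct query IDs rather than hit lines matters here, since a single unigene usually matches many members of the same family.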

I have also generated percent hit coverage statistics for the BLAST results and was thinking that I could additionally filter the results to keep only hits above some percent hit coverage threshold.
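A rough sketch of how such a coverage filter could look, in Python. It assumes the BLAST run used a custom tabular format that includes the subject length (`slen`), and it computes coverage per alignment line without merging multiple HSPs. The names `subject_coverage` and `filter_by_coverage`, the example rows, and the 50% cutoff are placeholders; I do not know what the right threshold is.

```python
import csv
import io

# Assumed custom column order, e.g. from:
#   -outfmt "6 qseqid sseqid pident length qstart qend sstart send evalue bitscore qlen slen"
COLUMNS = ["qseqid", "sseqid", "pident", "length", "qstart", "qend",
           "sstart", "send", "evalue", "bitscore", "qlen", "slen"]


def subject_coverage(sstart, send, slen):
    """Percent of the subject (database) sequence spanned by one alignment."""
    return 100.0 * (abs(send - sstart) + 1) / slen


def filter_by_coverage(handle, min_coverage):
    """Yield hits whose single-alignment subject coverage is >= min_coverage."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        cov = subject_coverage(int(hit["sstart"]), int(hit["send"]), int(hit["slen"]))
        if cov >= min_coverage:
            hit["scov"] = cov
            yield hit


# Hypothetical rows, for illustration only.
example = (
    "unigene1\tTF_A\t62.5\t200\t1\t600\t10\t209\t1e-80\t250\t900\t400\n"
    "unigene2\tTF_A\t58.0\t60\t1\t180\t1\t60\t1e-55\t190\t2000\t400\n"
    "unigene3\tTF_B\t70.0\t300\t1\t900\t1\t300\t1e-120\t400\t950\t300\n"
)

# 50.0 is an arbitrary placeholder cutoff, not a recommendation.
covered = list(filter_by_coverage(io.StringIO(example), min_coverage=50.0))
for hit in covered:
    print(hit["qseqid"], hit["sseqid"], hit["scov"])
```

In this toy input, the second row is a short local match covering only a small part of the database protein, which is the kind of hit a coverage cutoff is meant to remove.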

Any suggestions on an alternative approach or a percent hit coverage threshold to filter with would be much appreciated.

jvire1 wrote, 9 weeks ago:

Having slept on it, I realized that a solution would be to take the transcripts with blastp and blastx hits and analyze them with the PlantTFDB prediction server, which uses a much more robust method to determine homology. The server limits the size of uploaded sequences, so restricting the submission to transcripts that already have BLAST hits should help keep it within those limits.
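In case it helps anyone doing the same, here is a minimal Python sketch for pulling the transcripts with BLAST hits out of an assembly FASTA before uploading. It assumes the sequence ID is the first whitespace-delimited token of each header and that the server takes FASTA input; the function name `subset_fasta` and the toy sequences are made up for illustration.

```python
import io


def subset_fasta(fasta_handle, wanted_ids, out_handle):
    """Copy only the records whose ID is in wanted_ids; return how many were written."""
    written = 0
    keep = False
    for line in fasta_handle:
        if line.startswith(">"):
            parts = line[1:].split()
            # The record ID is taken to be the first token after '>'.
            keep = bool(parts) and parts[0] in wanted_ids
            if keep:
                written += 1
        if keep:
            out_handle.write(line)
    return written


# Toy assembly, for illustration only.
assembly = io.StringIO(
    ">unigene1 len=6\nATGAAA\n"
    ">unigene2\nATGCCC\nTTT\n"
    ">unigene3\nATGGGG\n"
)
ids_with_hits = {"unigene1", "unigene3"}  # e.g. query IDs collected from the BLAST table

subset = io.StringIO()
n_written = subset_fasta(assembly, ids_with_hits, subset)
print(n_written, "records written")
```

With real files, the `io.StringIO` objects would be replaced by handles from `open(...)`, and the ID set would come from the query column of the filtered BLAST table.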
