Questions on GimmeMotifs statistics and de novo regulatory search methodology
6 weeks ago
kve • 0

My lab has some RNA-seq data from cyanobacteria, and they have asked me to look for motifs within the promoters of de novo gene clusters. The goal is to identify potential regulatory sequences that we could then use in DNA-protein affinity chromatography to identify potential regulatory proteins.

I was able to easily create clusters, extract promoter regions, and run GimmeMotifs, but now I am at an impasse. There are hundreds of identified motifs for each of my cluster depths. Making the problem more difficult, I have struggled to find sufficient documentation online for GimmeMotifs' output statistics.

From my basic research, it seems that wholesale computational de novo motif scanning is generally frowned upon, but this approach was suggested by a collaborator who found a motif in a manually curated cluster.

My questions to you all:

Does anyone know what the statistics mean, or how I should threshold them to get reliable motifs? (The column names are listed below.)

Is this methodology misguided / is there a better way to do this?

Values:

Motif, best_match, best_match_pvalue, enr_at_fpr, fraction_fpr, ks_pvalue, ks_significance, max_enrichment, max_fmeasure, mncp, num_cluster, phyper_at_fpr, pr_auc, recall_at_fdr, roc_auc, roc_auc_xlim, score_at_fpr, stars
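For anyone triaging a table with these columns, here is a minimal sketch of ranking motifs by one numeric column and dropping rows below a cutoff. It assumes the statistics are available as tab-separated text with a header row, which may not match the actual output file. The function name `rank_motifs`, the toy values, and the cutoff of 0.6 are hypothetical placeholders for illustration, not recommended thresholds.

```python
import csv
import io


def rank_motifs(tsv_text, sort_key="roc_auc", min_value=None):
    """Return rows of a tab-separated stats table sorted by one numeric
    column, highest first. Rows where the column is missing or not a
    number are skipped. `min_value` is an optional, user-chosen cutoff."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    scored = []
    for row in reader:
        try:
            value = float(row[sort_key])
        except (KeyError, ValueError, TypeError):
            continue  # column absent or value not numeric
        if min_value is not None and value < min_value:
            continue  # below the placeholder cutoff
        scored.append((value, row))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored]


# Toy table with made-up values, only to show the mechanics.
EXAMPLE = (
    "Motif\troc_auc\tmax_enrichment\n"
    "motif_A\t0.55\t1.2\n"
    "motif_B\t0.81\t4.0\n"
    "motif_C\t0.67\t2.5\n"
)

ranked = rank_motifs(EXAMPLE, sort_key="roc_auc", min_value=0.6)
print([row["Motif"] for row in ranked])
```

Keeping the column and the cutoff as parameters makes it easy to compare how the shortlist changes across the different statistics before settling on any one of them.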

motif rna-seq clustering discovery • 81 views
