Question: Homer error (findMotifs.pl) : are you sure you chose the right length for motif finding?
salamandra wrote (20 months ago):

I did a ChIP-seq analysis with the GRCh38.92 genome from Ensembl, together with the corresponding gene annotation file (.gtf), also from Ensembl. The genome, annotation and promoter sets that HOMER provides directly (for hg38) are retrieved from UCSC, so I should not work with those.

For peak annotation and motif finding using genomic regions, I used the .fa and .gtf files directly and it worked fine:

annotatePeaks.pl $PEAK $GENOME -gtf $GTF > $OUT
findMotifsGenome.pl $PEAK $GENOME $OUT -p 4 -size 400

To run findMotifs.pl, I first had to load the custom genome:

loadGenome.pl -name GRCh38.92 -org human -fasta $GENOME -gtf $GTF

and then generate a promoter set:

loadPromoters.pl -name  GRCh38.92.promoter  -org human -id ensembl -genome GRCh38.92   -tss /Applications/homer/data/genomes/GRCh38.92/GRCh38.92.tss

When running findMotifs.pl:

findMotifs.pl $GENE GRCh38.92.promoter $OUT -p 4

I got this error:

!!! Something is wrong... are you sure you chose the right length for motif finding?
!!! i.e. also check your sequence file!!!
Use of uninitialized value in numeric gt (>) at /Applications/homer//bin/compareMotifs.pl line 1389.
    !!! Filtered out all motifs!!!
    Job finished

I changed the chromosome names in the .gtf and .fa files from 'Number' to 'chrNumber' and still got the same error.
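A rename of this kind can be sketched in shell as follows. The two function names are hypothetical, and the sketch assumes an Ensembl-style FASTA (headers start with `>` followed by the sequence name) and a tab-separated GTF whose comment lines start with `#`. Note that naive prefixing turns Ensembl's MT into chrMT, whereas UCSC uses chrM.

```shell
# Hypothetical helpers: prefix sequence names with "chr".

# FASTA: only header lines start with ">", so prefix the name right after it.
add_chr_fasta() {
    sed 's/^>/>chr/' "$1"
}

# GTF: pass comment lines through unchanged, prefix column 1 on all others.
add_chr_gtf() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { print; next }
         { $1 = "chr" $1; print }' "$1"
}
```

Usage would look like `add_chr_fasta genome.fa > genome.chr.fa` and `add_chr_gtf genes.gtf > genes.chr.gtf`, where the file names are placeholders.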

Could you please tell me what I am doing wrong?

The input files are here:
ftp://ftp.ensembl.org/pub/release-92/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
ftp://ftp.ensembl.org/pub/release-92/gtf/homo_sapiens/Homo_sapiens.GRCh38.92.gtf.gz

Content of $GENE follows:

ENSG00000005486
ENSG00000010818
ENSG00000033122
ENSG00000115902
ENSG00000081277
ENSG00000075213
ENSG00000141429
ENSG00000149256
ENSG00000157077
ENSG00000163681
ENSG00000105662
ENSG00000107731
ENSG00000014257
ENSG00000151388
ENSG00000197386
ENSG00000197442
ENSG00000138119
ENSG00000255282
ENSG00000186094
ENSG00000148219
ENSG00000124802
ENSG00000070476
ENSG00000138311
ENSG00000159261
ENSG00000214900
ENSG00000177483
ENSG00000237128
ENSG00000112378
ENSG00000115648
ENSG00000234323
ENSG00000128512
ENSG00000196950
ENSG00000143294
ENSG00000163449
ENSG00000006652
ENSG00000115415
ENSG00000082781
ENSG00000105993
ENSG00000116285
ENSG00000188257
ENSG00000182185
ENSG00000121879
ENSG00000240925
ENSG00000078269
ENSG00000065809
ENSG00000057935
ENSG00000175110
ENSG00000136147
ENSG00000169567
ENSG00000251432
ENSG00000138759
ENSG00000138639
ENSG00000249738
ENSG00000116698
ENSG00000110801
ENSG00000066084
ENSG00000257322
ENSG00000258386
ENSG00000258902
ENSG00000187446
ENSG00000197943
ENSG00000259986
ENSG00000182149
ENSG00000008283
ENSG00000153944
ENSG00000132141
ENSG00000267551
ENSG00000168675
ENSG00000146963
ENSG00000280551
ENSG00000229140
ENSG00000231426

Kevin Blighe wrote (20 months ago):

I have just successfully completed the series of commands using just the first 500,000 lines of chr22 sequence from the same data that you're using.

grep -e ">22" -A 500000 Homo_sapiens.GRCh38.dna.primary_assembly.fa > chr22.fa
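The grep call above keeps the header line plus the next 500,000 lines. If a complete chromosome is wanted instead, a small awk filter can select one FASTA record by name. A minimal sketch, assuming Ensembl-style headers where the name is the first word after `>`; extract_chrom is a hypothetical helper, not part of HOMER:

```shell
# Hypothetical helper: print the full FASTA record whose name equals $1.
# $1: sequence name (e.g. 22), $2: FASTA file
extract_chrom() {
    # On each header line, decide whether the record is the wanted one;
    # the bare pattern "keep" then prints every line while it is true.
    awk -v name="$1" '/^>/ { keep = ($1 == ">" name) } keep' "$2"
}
```

For example, `extract_chrom 22 Homo_sapiens.GRCh38.dna.primary_assembly.fa > chr22.fa` would write the whole chr22 record rather than a fixed number of lines.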

Homer/bin/loadGenome.pl -name GRCh38.92 -org human -fasta chr22.fa -gtf Homo_sapiens.GRCh38.92.gtf -gid

Homer/bin/loadPromoters.pl -name  GRCh38.92.promoter -org human -id custom -genome GRCh38.92 -tss Homer/data/genomes/GRCh38.92/GRCh38.92.tss


cat genelookup.list
ENSG00000276871
ENSG00000283023
ENSG00000276138
ENSG00000280341

head Homer/data/promoters/GRCh38.92.promoter.base
ENSG00000277248
ENSG00000283047
ENSG00000279973
ENSG00000226444
ENSG00000276871
ENSG00000283023
ENSG00000276138
ENSG00000280341
ENSG00000236235
ENSG00000279442

Homer/bin/findMotifs.pl genelookup.list GRCh38.92.promoter . -p 4

The genes in your gene list ($GENE) have to exactly match those in your GRCh38.92.promoter.base file. You appear to have a mixture of ENSG (Ensembl Gene) and ENST (Ensembl Transcript).
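One way to check this before running findMotifs.pl is to list the IDs in the gene list that are absent from the promoter set. A minimal sketch, assuming both files carry the ID in the first tab-separated column (as in the head output above); the function name missing_ids is hypothetical:

```shell
# Hypothetical helper: print gene-list IDs that are not in the promoter set.
# $1: gene list, $2: promoter .base file
missing_ids() {
    awk -F'\t' '
        # First file given to awk (the promoter set): remember every ID.
        FILENAME == ARGV[1] { known[$1] = 1; next }
        # Second file (the gene list): print non-empty IDs never seen above.
        NF && !($1 in known) { print $1 }
    ' "$2" "$1"
}
```

Running `missing_ids genelookup.list Homer/data/promoters/GRCh38.92.promoter.base` should print nothing when every ID is present; any ID it prints has no entry in the promoter set.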


When running this:

Homer/bin/loadGenome.pl -name GRCh38.92 -org human -fasta $GENOME -gtf $GTF -gid

...you can select -gid or -tid to instruct HOMER to pull the ENSG (gene_id) or ENST (transcript_id) identifiers from the input GTF.
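To see which identifier type a gene list or .base file actually contains, a quick count by prefix can help. A minimal sketch (the function name count_id_types is hypothetical; it only looks at the ENSG/ENST prefix of each line, and blank lines are counted as "other"):

```shell
# Hypothetical helper: count lines starting with ENSG, ENST, or anything else.
# $1: file with one identifier per line
count_id_types() {
    awk '/^ENSG/ { g++ }
         /^ENST/ { t++ }
         END { printf "ENSG=%d ENST=%d other=%d\n", g, t, NR - g - t }' "$1"
}
```

If the gene list reports only ENSG while the promoter .base file reports only ENST (or the reverse), the genome was probably loaded with the other of -gid and -tid.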


When running:

Homer/bin/loadPromoters.pl -name GRCh38.92.promoter -org human -id custom -genome GRCh38.92 -tss Homer/data/genomes/GRCh38.92/GRCh38.92.tss

...be sure to select -id custom because you have a custom genome.

Kevin

salamandra replied (20 months ago):

It was missing -id custom and -gid. It's running now! Thank you!