Question

MEME calling motifs from eCLIP-seq file generate no information

0

Entering edit mode

5.2 years ago

Kai_Qi ▴ 130

Hi:

I have used PureClip to analyze the an eClip-seq dataset and get the bedfile. Now I want to get the binding motif of the RBP, so I installed MEME to call motif from the bedfile.

The head of my bed file is like this:

head file.bed

chr1    633991  633992  3   14.891  +   [score_CL=14.891;score_E=55.5259;score_B=56.5133;score_UC=14.891]
chr1    633992  633993  3   13.9207 +   [score_CL=13.9207;score_E=55.7616;score_B=56.7817;score_UC=13.9207]
chr1    633993  633994  3   8.62409 +   [score_CL=8.62409;score_E=55.9413;score_B=44.2988;score_UC=8.62409]
chr1    633997  633998  3   2.33843 +   [score_CL=2.33843;score_E=52.7415;score_B=28.4546;score_UC=2.33843]
chr1    633998  633999  3   5.65288 +   [score_CL=5.65288;score_E=50.1933;score_B=32.0494;score_UC=5.65288]
chr1    633999  634000  3   5.54813 +   [score_CL=5.54813;score_E=48.5419;score_B=31.9227;score_UC=5.54813]
chr1    634009  634010  3   0.880815    +   [score_CL=0.880815;score_E=44.1986;score_B=22.3018;score_UC=0.880815]
chr1    634010  634011  3   0.782717    +   [score_CL=0.782717;score_E=45.8173;score_B=22.1729;score_UC=0.782717]
chr1    1037335 1037336 3   0.189687    +   [score_CL=0.189687;score_E=57.7691;score_B=25.1448;score_UC=0.189687]
chr1    1037336 1037337 3   0.277321    +   [score_CL=0.277321;score_E=55.6412;score_B=25.2031;score_UC=0.277321]

So I get fasta file from the bed file using bedtools:

bedtools getfasta -fi GRCh38.primary_assembly.genome.fa -bed file.bed -s -fo file.fasta

and run MEME like this:

meme file.fasta -mod anr -o RBP_MEME_anr_motif6 -rna -nmotifs 6 >log015 2>error015

But I get nothing except the following feedback:

Writing results to output directory 'PureCLIP_HNRNPK_MEME_anr_motif6'.
BACKGROUND: using background model of order 0 PRIMARY (classic): n
13474 p0 13474 p1 0 p2 0 SEQUENCE GROUP USAGE-- Starts/EM: p0; Trim:
p0; pvalue: p0; nsites: p0,p1,p2 All sequences must be at least 8
characters long.  Set '-w' or '-minw' or remove shorter sequences and
rerun.

From the bedfile I see the sequence is about 3bp long, how should I fill the condition of at least 8 characters long?

Can anyone help me on this? Thank you very much.

software-error RNA-Seq next-gen ChIP-Seq • 2.9k views

ADD COMMENT • link updated 2.1 years ago by Ram 45k • written 5.2 years ago by Kai_Qi ▴ 130

i.sudbery · Answer 1 · 2020-04-26

0

Entering edit mode

5.2 years ago

i.sudbery 21k

1) You can get rid of the restriction for having 8bp sequences by setting -minw to a smaller value (this is the minimum motif size searched for).

2) You might look what fraction of your sequences are less than the minimum size (8bp or what ever you set -minw to). If its only a minor fraction, you could just filter those ones out.

3) You might find the summit of the peak (the position with the highest tag depth), and then set a window a fixed size either side of that summit. I've just 15nt either way in iCLIP analysis before, but that was for each tag, not for peaks as a whole.

ADD COMMENT • link 5.2 years ago by i.sudbery 21k

0

Entering edit mode

Thank you for the reply. I tried it but still have no results:

 meme PureCLIP.crosslink_sites.cov_inputSignal.fasta -mod anr  -minw 3 -o PureCLIP_HNRNPK_MEME_anr_motif6 -rna -nmotif 6 >log016 2>error016

error is like:

meme: error while loading shared libraries: libmpi.so.40: cannot open shared object file: No such file or directory

Very much appreciated!

ADD REPLY • link 5.2 years ago by Kai_Qi ▴ 130

1

Entering edit mode

This means that your version of MEME linked to a system installed MPI library when it was compiled, but that library is no longer available on the system you are running it on.

If this is on your local machine, I recommend just re-installing/re-compiling MEME. If its on a shared system, then you will probably need to talk to the system admin.

ADD REPLY • link 5.2 years ago by i.sudbery 21k

0

Entering edit mode

Again, thank you for your time and patience. I just re-intalled it using :

conda install -c bioconda/label/cf201901 meme

and re-ran it gain:meme PureCLIP.crosslink_sites.cov_inputSignal.fasta -mod anr -minw 3 -o PureCLIP_HNRNPK_MEME_anr_motif6 -rna -nmotif 6 >log016 2>error016

This time the output file is still empty, I checked the error it reads:

error at: -nmotif

I changed the parameter to -nmotif 1, still has the same error.

ADD REPLY • link 5.2 years ago by Kai_Qi ▴ 130

1

Entering edit mode

the option is -nmotifs not -nmotif

ADD REPLY • link 5.2 years ago by i.sudbery 21k

0

Entering edit mode

en, it solves the problem. But it brought me back to the old problem:

BACKGROUND: using background model of order 0 PRIMARY (classic): n 13474 p0 13474 p1 0 p2 0 SEQUENCE GROUP USAGE-- Starts/EM: p0; Trim: p0; pvalue: p0; nsites: p0,p1,p2 All sequences must be at least 2 characters long. Set '-w' or '-minw' or remove shorter sequences and rerun.

Could it be the problems of my bed file? I see most of the coordinates has nucleotide length of 3. i set it to --minw 2, still echos the same error.

ADD REPLY • link 5.2 years ago by Kai_Qi ▴ 130

1

Entering edit mode

Its not quite the same error - note that it now says All sequences must be at least 2 characters long. That means there are some peaks that are only 1 characters long. If most of your peaks are short, you are never going to find any motifs.

I recommend extending the peaks as I suggested in 3 in my orignal answer. If you don't have a summit, just use the middle of the peak.

ADD REPLY • link 5.2 years ago by i.sudbery 21k

0

Entering edit mode

Thanks a lot for your teaching. I don't know how to do #3. But I will search and learn. I will get back to this question as soon as I understand what it means. Thanks again!

ADD REPLY • link 5.2 years ago by Kai_Qi ▴ 130

0

Entering edit mode

Hi:

I read the manual again and go to see the files I used. I guess the main problem might be my input files: After analysis using Pureclip, I got a bed file containing the I showed in the beginning: the first column is the chromosome number, the second column is the position of the and the third column is the position behind the crosslink site (usually it is the 2nd column +1). Then I transformed this bed file into a fasta file: bedtools getfasta -fi file.fa -bed file.bed -s -fo file.fasta & and use this fastfile to do the MEME call motif. The problem is that every sequence in the fasta file is a mononucleotide because the coordinates of the above mentioned bed file:

$ head filtered.fasta 

>chr1:633991-633992(+) 
T
>chr1:633992-633993(+) 
A
>chr1:633993-633994(+) 
T
>chr1:633997-633998(+) 
C
>chr1:633998-633999(+) 
C

. So I guess it is the problem of my input? Do I think in the right way?

ADD REPLY • link updated 5.2 years ago by i.sudbery 21k • written 5.2 years ago by Kai_Qi ▴ 130

1

Entering edit mode

Okay, so Pureclip is outputting signficant crosslinks, rather than peaks.

So I recommend you first turn these into regions.

Easist way to do this would be to extend each one 15nt in each direction (use bedtools slop to do this). Then merge overlapping regions (use bedtools merge).

ADD REPLY • link 5.2 years ago by i.sudbery 21k

0

Entering edit mode

Very much appreciated. I am going to do read the manual of these command and see what I can get.

ADD REPLY • link 5.2 years ago by Kai_Qi ▴ 130

0

Entering edit mode

Hi:

I want to ask a naive question on the step of analyzing CLIP-seq data. Usually, the beginning is to cut adaptor. My questions is after cutting the adaptor, the sequence for labeling each individual cDNA will also be removed because it is included in the adaptor, how will it be able to remove PCR duplicates later on?

Thank you very much,

ADD REPLY • link 5.2 years ago by Kai_Qi ▴ 130

0

Entering edit mode

Usually rather than discarding the UMI sequence, it is moved to the read name so that it can be used for deduplcating afterwards. Or at least, thats how I do it for iCLIP data.

ADD REPLY • link 5.2 years ago by i.sudbery 21k

0

Entering edit mode

I see. Thank you!

May I ask what package you used for iClip-seq analysis. I want to learn to use one whose output can be used for motif calling directly.

ADD REPLY • link 5.2 years ago by Kai_Qi ▴ 130

1

Entering edit mode

I wrote my own. Sorry.

I used UMI-tools (which was actaully written orignally for iCLIP) to process the barcodes onto the reads and de-multiplexed with reaper. I then mapped with STAR and deduplicated with UMI-tools. We then called peaks with my own peak caller. But there are other peak callers out there to call peaks from CLIP data.

ADD REPLY • link 5.2 years ago by i.sudbery 21k

0

Entering edit mode

That is so cool! I will read some reviews~

Thanks,

Cai

ADD REPLY • link 5.2 years ago by Kai_Qi ▴ 130