Question: MEME calling motifs from eCLIP-seq file generate no information
0
gravatar for Kai_Qi
10 months ago by
Kai_Qi100
Chicago, IL
Kai_Qi100 wrote:

Hi:

I have used PureClip to analyze the an eClip-seq dataset and get the bedfile. Now I want to get the binding motif of the RBP, so I installed MEME to call motif from the bedfile.

The head of my bed file is like this:

head file.bed

chr1 633991 633992 3 14.891 + [score_CL=14.891;score_E=55.5259;score_B=56.5133;score_UC=14.891] chr1 633992 633993 3 13.9207 + [score_CL=13.9207;score_E=55.7616;score_B=56.7817;score_UC=13.9207] chr1 633993 633994 3 8.62409 + [score_CL=8.62409;score_E=55.9413;score_B=44.2988;score_UC=8.62409] chr1 633997 633998 3 2.33843 + [score_CL=2.33843;score_E=52.7415;score_B=28.4546;score_UC=2.33843] chr1 633998 633999 3 5.65288 + [score_CL=5.65288;score_E=50.1933;score_B=32.0494;score_UC=5.65288] chr1 633999 634000 3 5.54813 + [score_CL=5.54813;score_E=48.5419;score_B=31.9227;score_UC=5.54813] chr1 634009 634010 3 0.880815 + [score_CL=0.880815;score_E=44.1986;score_B=22.3018;score_UC=0.880815] chr1 634010 634011 3 0.782717 + [score_CL=0.782717;score_E=45.8173;score_B=22.1729;score_UC=0.782717] chr1 1037335 1037336 3 0.189687 + [score_CL=0.189687;score_E=57.7691;score_B=25.1448;score_UC=0.189687] chr1 1037336 1037337 3 0.277321 + [score_CL=0.277321;score_E=55.6412;score_B=25.2031;score_UC=0.277321]

So I get fasta file from the bed file using bedtools:

bedtools getfasta -fi GRCh38.primary_assembly.genome.fa -bed file.bed -s -fo file.fasta

and run MEME like this:

meme file.fasta -mod anr -o RBP_MEME_anr_motif6 -rna -nmotifs 6 >log015 2>error015

But I get nothing except the following feedback:

Writing results to output directory 'PureCLIP_HNRNPK_MEME_anr_motif6'. BACKGROUND: using background model of order 0 PRIMARY (classic): n 13474 p0 13474 p1 0 p2 0 SEQUENCE GROUP USAGE-- Starts/EM: p0; Trim: p0; pvalue: p0; nsites: p0,p1,p2 All sequences must be at least 8 characters long. Set '-w' or '-minw' or remove shorter sequences and rerun.

From the bedfile I see the sequence is about 3bp long, how should I fill the condition of at least 8 characters long?

Can anyone help me on this? Thank you very much.

ADD COMMENTlink modified 10 months ago by ATpoint46k • written 10 months ago by Kai_Qi100
0
gravatar for i.sudbery
10 months ago by
i.sudbery11k
Sheffield, UK
i.sudbery11k wrote:

1) You can get rid of the restriction for having 8bp sequences by setting -minw to a smaller value (this is the minimum motif size searched for).

2) You might look what fraction of your sequences are less than the minimum size (8bp or what ever you set -minw to). If its only a minor fraction, you could just filter those ones out.

3) You might find the summit of the peak (the position with the highest tag depth), and then set a window a fixed size either side of that summit. I've just 15nt either way in iCLIP analysis before, but that was for each tag, not for peaks as a whole.

ADD COMMENTlink written 10 months ago by i.sudbery11k

Thank you for the reply. I tried it but still have no results:

 meme PureCLIP.crosslink_sites.cov_inputSignal.fasta -mod anr  -minw 3 -o PureCLIP_HNRNPK_MEME_anr_motif6 -rna -nmotif 6 >log016 2>error016

error is like:

meme: error while loading shared libraries: libmpi.so.40: cannot open shared object file: No such file or directory

Very much appreciated!

ADD REPLYlink written 10 months ago by Kai_Qi100
1

This means that your version of MEME linked to a system installed MPI library when it was compiled, but that library is no longer available on the system you are running it on.

If this is on your local machine, I recommend just re-installing/re-compiling MEME. If its on a shared system, then you will probably need to talk to the system admin.

ADD REPLYlink written 10 months ago by i.sudbery11k

Again, thank you for your time and patience. I just re-intalled it using :

conda install -c bioconda/label/cf201901 meme

and re-ran it gain:meme PureCLIP.crosslink_sites.cov_inputSignal.fasta -mod anr -minw 3 -o PureCLIP_HNRNPK_MEME_anr_motif6 -rna -nmotif 6 >log016 2>error016

This time the output file is still empty, I checked the error it reads:

error at: -nmotif

I changed the parameter to -nmotif 1, still has the same error.

ADD REPLYlink written 10 months ago by Kai_Qi100
1

the option is -nmotifs not -nmotif

ADD REPLYlink written 10 months ago by i.sudbery11k

en, it solves the problem. But it brought me back to the old problem:

BACKGROUND: using background model of order 0 PRIMARY (classic): n 13474 p0 13474 p1 0 p2 0 SEQUENCE GROUP USAGE-- Starts/EM: p0; Trim: p0; pvalue: p0; nsites: p0,p1,p2 All sequences must be at least 2 characters long. Set '-w' or '-minw' or remove shorter sequences and rerun.

Could it be the problems of my bed file? I see most of the coordinates has nucleotide length of 3. i set it to --minw 2, still echos the same error.

ADD REPLYlink written 10 months ago by Kai_Qi100
1

Its not quite the same error - note that it now says All sequences must be at least 2 characters long. That means there are some peaks that are only 1 characters long. If most of your peaks are short, you are never going to find any motifs.

I recommend extending the peaks as I suggested in 3 in my orignal answer. If you don't have a summit, just use the middle of the peak.

ADD REPLYlink written 10 months ago by i.sudbery11k

Thanks a lot for your teaching. I don't know how to do #3. But I will search and learn. I will get back to this question as soon as I understand what it means. Thanks again!

ADD REPLYlink written 10 months ago by Kai_Qi100

Hi:

I read the manual again and go to see the files I used. I guess the main problem might be my input files: After analysis using Pureclip, I got a bed file containing the I showed in the beginning: the first column is the chromosome number, the second column is the position of the and the third column is the position behind the crosslink site (usually it is the 2nd column +1). Then I transformed this bed file into a fasta file: bedtools getfasta -fi file.fa -bed file.bed -s -fo file.fasta & and use this fastfile to do the MEME call motif. The problem is that every sequence in the fasta file is a mononucleotide because the coordinates of the above mentioned bed file:

$ head filtered.fasta 

>chr1:633991-633992(+) 
T
>chr1:633992-633993(+) 
A
>chr1:633993-633994(+) 
T
>chr1:633997-633998(+) 
C
>chr1:633998-633999(+) 
C

. So I guess it is the problem of my input? Do I think in the right way?

ADD REPLYlink modified 10 months ago by i.sudbery11k • written 10 months ago by Kai_Qi100
1

Okay, so Pureclip is outputting signficant crosslinks, rather than peaks.

So I recommend you first turn these into regions.

Easist way to do this would be to extend each one 15nt in each direction (use bedtools slop to do this). Then merge overlapping regions (use bedtools merge).

ADD REPLYlink written 10 months ago by i.sudbery11k

Very much appreciated. I am going to do read the manual of these command and see what I can get.

ADD REPLYlink written 10 months ago by Kai_Qi100

Hi:

I want to ask a naive question on the step of analyzing CLIP-seq data. Usually, the beginning is to cut adaptor. My questions is after cutting the adaptor, the sequence for labeling each individual cDNA will also be removed because it is included in the adaptor, how will it be able to remove PCR duplicates later on?

Thank you very much,

ADD REPLYlink written 10 months ago by Kai_Qi100

Usually rather than discarding the UMI sequence, it is moved to the read name so that it can be used for deduplcating afterwards. Or at least, thats how I do it for iCLIP data.

ADD REPLYlink written 10 months ago by i.sudbery11k

I see. Thank you!

May I ask what package you used for iClip-seq analysis. I want to learn to use one whose output can be used for motif calling directly.

ADD REPLYlink written 10 months ago by Kai_Qi100
1

I wrote my own. Sorry.

I used UMI-tools (which was actaully written orignally for iCLIP) to process the barcodes onto the reads and de-multiplexed with reaper. I then mapped with STAR and deduplicated with UMI-tools. We then called peaks with my own peak caller. But there are other peak callers out there to call peaks from CLIP data.

ADD REPLYlink written 10 months ago by i.sudbery11k

That is so cool! I will read some reviews~

Thanks,

Cai

ADD REPLYlink written 10 months ago by Kai_Qi100
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1060 users visited in the last hour
_