Manorm For Differential Histone Binding In Chip-Seq Samples
1
0
Entering edit mode
9.6 years ago
Diana ▴ 900

Hi everyone,

I'm using the program MAnorm to find differential histone binding sites between 2 of my chip-seq samples. If anyone has used it before can they sort my problem? MAnorm can be obtained from here:

http://bcb.dfci.harvard.edu/~gcyuan/MAnorm/MAnorm.htm

It takes as input 4 files:

sample1_peaks, sample2_peaks, sample1_alignment, sample2_alignment where the files look like this:

Peak files:

Chromosome      Start              End
1                      194749677      194750748
1                      13570102      13576846


and alignment files:

1 35 99 HWI-ST1014:143:D1PYRACXX:1:2107:10201:100917/2   25 +
1 39 91 HWI-ST1014:143:D1PYRACXX:1:1308:15197:167335/2  25 +


Now when I run the program on my files, it gives me the following error while reading the alignment files: ??? Error using ==> textscan Invalid file identifier. Use fopen to generate a valid file identifier.

Error in ==> step3_split_sequencing_data_2chr at 61 Error in ==> MAnorm_run_through at 63

Ive checked these lines multiple times and I don't see anything wrong with these. My files are in the right folder which is on the path but still it gives me this error. I have another pair of samples for which this program runs perfectly and gives me no error. I just don't understand why it doesn't work with these samples and why it's not reading the alignment files. What could be going wrong?

Thanks!

Edit: commands to run the program:

% Note1: Change the parameter values inside the parameter region to run MAnorm on your samples;
% Note2: All the input files should be of tab-delimited text format.
% Note3: To know the usage of a MATLAB function, type "help function_name", e.g. help addpath, in MATLAB command line.
clear;

%%%%%% start of parameter region %%%%%%
nchr = 30;              % number of chromosomes; 24 for human, 21 for mouse

% the bed file name of peak coordinate of sample 1
peakfile_1 = 'C:\MATLAB\MAnorm_MATLAB_Package\MAnorm_MATLAB_Package\Input\ss89_k27ac_peaks-Copy-test.bed';
% the bed file name of read coordinate of sample 1: Yale ENCODE MYC ChIP-seq data replicate 1 in HelaS3 cells
rawdatafile_1 = 'C:\MATLAB\MAnorm_MATLAB_Package\MAnorm_MATLAB_Package\Input\ss89_k27ac_final.bed';
% shift size of reads for sample 1; which equal to the half of the DNA fragment after size selection
tag_shift_1 = 123;
% length of reads for sample 1
tag_length_1 = 69;

% the bed file name of peak coordinate of sample 2
peakfile_2 = 'C:\MATLAB\MAnorm_MATLAB_Package\MAnorm_MATLAB_Package\Input\ss12_k27ac_peaks-Copy-test.bed';
% the bed file name of read coordinate of sample 2: Yale ENCODE MYC ChIP-seq data replicate 1 in K562 cells
rawdatafile_2 = 'C:\MATLAB\MAnorm_MATLAB_Package\MAnorm_MATLAB_Package\Input\ss12_k27ac_final.bed';
% shift size of reads for sample 1; which equal to the half of the DNA fragment after size selection
tag_shift_2 = 104;
% length of reads for sample 1
tag_length_2 = 50;

% directory of work folder to store temperary files; should be made before analysis
workdir = 'C:\MATLAB\MAnorm_MATLAB_Package\MAnorm_MATLAB_Package\';
% directory of output folder to store output files; should be made before analysis
outputdir = 'C:\MATLAB\MAnorm_MATLAB_Package\MAnorm_MATLAB_Package\output_89_12\';
% name of comparison
MApair = 'ss89_ss12';

extention = 10000;            % 2*extention = the window size to calculate read density; we recommend extention=1000 for histone modification, extention=500 for transcription factor
summit2summit_dist = 5000;      % the selected common peaks much have summit-to-summit distance smaller than this value to as matched common peaks; suggest to be half of extention

Mcut_unbiased = 1;          % Curoff of |M| value to define unbiased peaks between 2 samples; here M value is the log2 fold change at peak region between two samples, value 1 means fold change 2
% Non-cell type-specific binding regions: all peaks associated with |M|<Mcut_unbiaseda.

Mcut_biased = 1;            % Cutoff of M value cutoff to define biased peaks between 2 samples:
Pcut_biased = 1e-2;         % Cutoff of P value (of differential binding) cutoff to define biased peaks between 2 samples;
% Sample 1-specific binding regions: all peaks associated with M>=Mcut_biased and P<Pcut_biased;
% Sample 2-specific binding regions: all peaks associated with M<=-Mcut_biased and P<Pcut_biased;
%%%%%% end of parameter region %%%%%%

disp(' ');

% classify common or unique peaks
disp(' ');
disp('Step2: classify common or unique peaks');
Nsamp = 100;                 % number of random permutation to calculate fold enrichment of overlap
step2_classify_2peak_sets_by_overlap(peak_name_1, peak_name_2, workdir, Nsamp);

% split raw data by chromosome
disp(' ');
disp('Step3: read and split raw data by chromosome');
[pathstr1, marker_name_1, ext1, versn1] = fileparts(rawdatafile_1);
step3_split_sequencing_data_2chr(rawdatafile_1, nchr, workdir);

[pathstr2, marker_name_2, ext2, versn2] = fileparts(rawdatafile_2);
step3_split_sequencing_data_2chr(rawdatafile_2, nchr, workdir);

% calculate read density in peaks
disp(' ');
disp('Step4: calculate read density in peaks');
step4_calculate_peak_read_density(peak_name_1, peak_name_2, marker_name_1, marker_name_2, extention, workdir, nchr, tag_shift_1, tag_shift_2, tag_length_1, tag_length_2);

% calculate normalized M-A values
disp(' ');
disp('Final Step: normalize and output the M-A values');
step5_rescale_peak_strength_by_common_peaks(peak_name_1, peak_name_2, marker_name_1, marker_name_2, summit2summit_dist, workdir, MApair, outputdir);

% output the normalized M-value to wig file (in order to upload to genome browsers)
disp(' ');
disp('Summarize data: output the M value of 2 peak sets to wig files');
step6_output_peak_Mvalue_2wig_file(peak_name_1, workdir, MApair, outputdir);
step6_output_peak_Mvalue_2wig_file(peak_name_2, workdir, MApair, outputdir);

% merge common peaks and define unbiased and biased peaks between two samples based on given M-value and P-value cutoff
disp(' ');
disp('Summarize data: merge common peaks and extract differential and non-differential binding regions');
step7_define_biased_unbiased_peaks(peak_name_1, peak_name_2, workdir, MApair, outputdir, Mcut_unbiased, Mcut_biased, Pcut_biased)

% clean up
rmpath('C:\MATLAB\MAnorm_MATLAB_Package\MAnorm_MATLAB_Package\CodeLib');
clear all;


These are the commands that came with the package to run the program. I just changed the file names. The rest is as it is in the package.

matlab chipseq chip-seq • 5.2k views
0
Entering edit mode

This might be a better question to email the authors since it sounds like either a bug or a command input error. What the error message is saying is that it cannot open the file that is being asked for, would you mind pasting the command you used to run?

0
Entering edit mode

Thanks for your reply, I've pasted the commands used to run the program. This is the file that came with the program.

0
Entering edit mode
9.6 years ago
vj ▴ 520

The problem may be with the alignment files where it may be looking for the strand information in the fourth column not the tag id.

You can try the commands below to filter out the necessary fields.

awk ' OFS="\t"  { print $1,$2,$3,$4 } ' bed1 > peakfile1.bed
awk ' OFS="\t"  { print $1,$2,$3,$4 } ' bed2 > peakfile2.bed
awk ' OFS="\t"  { print $1,$2,$3,$6 } ' read1 > readfile1.bed
awk ' OFS="\t"  { print $1,$2,$3,$6 } ' read2 > readfile2.bed

0
Entering edit mode

This is how its reading the file:

fs = fopen(rawdatafile, 'r');
while end_of_file
clear CC;
CC = textscan(fs, '%s%s%s%s%s%s%*[^\n]', linenum);

0
Entering edit mode

Sorry, I thought you were using the R version of MAnorm.