I am new to ChIP-seq analysis and have run into some problems.
To make things clearer, here are the data and tools I used:
- Original ChIP-seq data on histone marks, downloaded from GEO (SRA format, read length 101 bp, H3K4me3, which gives a sharp signal);
- Reference human genome and GTF file, downloaded from GENCODE (release 22, which UCSC currently uses for GRCh38);
First, I converted the SRA files into FASTQ files. Then I used STAR to align the reads to the reference genome with splicing turned off (--alignIntronMax 1). Next, I converted the sorted BAM file to BED to use as input for MACS2. I mostly used the default options with a p-value cutoff of 1e-5 and generated a bedGraph file. Finally, I tried to convert the bedGraph file to bigWig so that I could visualize it in IGV.
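For reference, the steps above can be written out as a dry-run sketch that only builds and prints the command lines. The accession, genome directory, output names and thread count are hypothetical placeholders, and the flags should be checked against the installed versions of each tool. Note that MACS2 only writes bedGraph files when -B (--bdg) is given.

```python
import shlex


def build_pipeline(accession, genome_dir, name, pvalue="1e-5", threads=4):
    """Return a list of (argv, stdout_path) steps.

    stdout_path is None when the tool writes its own output files.
    All names here are placeholders for illustration.
    """
    fastq = f"{accession}.fastq"                   # fastq-dump names output after the accession
    bam = f"{name}_Aligned.sortedByCoord.out.bam"  # STAR derives this from --outFileNamePrefix
    bed = f"{name}.bed"
    return [
        # 1. SRA -> FASTQ (SRA Toolkit)
        (["fastq-dump", accession], None),
        # 2. Alignment; --alignIntronMax 1 effectively disables spliced alignments
        (["STAR", "--runThreadN", str(threads), "--genomeDir", genome_dir,
          "--readFilesIn", fastq, "--alignIntronMax", "1",
          "--outSAMtype", "BAM", "SortedByCoordinate",
          "--outFileNamePrefix", f"{name}_"], None),
        # 3. BAM -> BED (MACS2 can also read the BAM directly with -f BAM)
        (["bedtools", "bamtobed", "-i", bam], bed),
        # 4. Peak calling; -B writes NAME_treat_pileup.bdg and NAME_control_lambda.bdg
        (["macs2", "callpeak", "-t", bed, "-f", "BED", "-g", "hs",
          "-n", name, "-p", pvalue, "-B"], None),
    ]


def show(steps):
    """Print each step as a shell command, adding a redirect where needed."""
    for argv, out in steps:
        line = shlex.join(argv)
        print(line if out is None else f"{line} > {shlex.quote(out)}")


if __name__ == "__main__":
    # Hypothetical accession and index directory; replace with your own.
    show(build_pipeline("SRR0000000", "GRCh38_star_index", "H3K4me3"))
```

Printing instead of executing keeps the sketch safe to run anywhere; the commands can be pasted into a shell once the paths are real.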
However, when I ran the UCSC tool bedGraphToBigWig, which requires a file listing all chromosome sizes (generated with the code recommended by Tao Liu, https://github.com/taoliu/MACS/wiki/Build-Signal-Track ), it failed because some chromosome names in my bedGraph file could not be found (e.g. "KT270757.1" and "GL000216.2"), so the conversion stopped.
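bedGraphToBigWig stops on any contig that is missing from the chrom.sizes file. A likely cause is a naming mismatch: the GENCODE FASTA names unplaced scaffolds like GL000216.2, while UCSC-derived hg38 chrom.sizes files use names like chrUn_GL000216v2. One option is to build chrom.sizes from the same FASTA used for alignment (for example, the first two columns of the samtools faidx .fai index, or the chrNameLength.txt file STAR writes into its genome directory). Another is to drop the extra contigs. The sketch below does the latter. It also clips intervals that run past a contig end and sorts the records, because bedGraphToBigWig expects sorted, in-bounds input. The function names are made up for this example.

```python
def read_chrom_sizes(lines):
    """Parse tab-separated 'name<TAB>length...' lines into {name: length}.

    Accepts a chrom.sizes file, a samtools .fai index (extra columns are
    ignored) or STAR's chrNameLength.txt.
    """
    sizes = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1].isdigit():
            sizes[fields[0]] = int(fields[1])
    return sizes


def clean_bedgraph(lines, sizes):
    """Keep only records on contigs listed in `sizes`, clip them to the
    contig end, and sort by chrom then start (like `sort -k1,1 -k2,2n`
    in the C locale)."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip blank and header lines
        chrom, start, end, value = line.split()[:4]
        if chrom not in sizes:
            continue  # e.g. a scaffold name absent from chrom.sizes
        start, end = int(start), min(int(end), sizes[chrom])
        if start >= end:
            continue  # interval lies entirely past the contig end
        kept.append((chrom, start, end, value))
    kept.sort(key=lambda rec: (rec[0], rec[1]))
    return [f"{c}\t{s}\t{e}\t{v}" for c, s, e, v in kept]
```

The cleaned lines can then be written to a file and passed to `bedGraphToBigWig cleaned.bdg chrom.sizes out.bw`.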
Does anyone know how to generate a read pileup track so that I can compare it with the peaks called by MACS2?
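With -B, MACS2 callpeak writes a fragment pileup track (NAME_treat_pileup.bdg) that can be compared with the peaks. To illustrate what such a track contains, here is a minimal sketch for one chromosome, assuming reads given as BED-style (start, end, strand) tuples. Each read is extended to d bp from its 5' end in the direction of its strand, and overlapping fragments are counted into bedGraph runs. This is a simplified illustration and not MACS2's actual implementation (no duplicate filtering or scaling), and the function names are hypothetical.

```python
def fragment_pileup(reads, d, chrom_size):
    """Return (start, end, depth) runs of extended-fragment coverage.

    reads: (start, end, strand) on one chromosome, 0-based half-open.
    Zero-coverage regions are omitted, as bedGraph allows.
    """
    events = {}
    for start, end, strand in reads:
        if strand == "+":
            fs, fe = start, start + d   # extend rightwards from the 5' end
        else:
            fs, fe = end - d, end       # minus strand: 5' end is `end`
        fs, fe = max(fs, 0), min(fe, chrom_size)
        if fs >= fe:
            continue
        events[fs] = events.get(fs, 0) + 1
        events[fe] = events.get(fe, 0) - 1

    runs = []
    depth, prev = 0, None
    for pos in sorted(events):
        if prev is not None and depth > 0:
            if runs and runs[-1][1] == prev and runs[-1][2] == depth:
                runs[-1] = (runs[-1][0], pos, depth)  # merge equal-depth neighbours
            else:
                runs.append((prev, pos, depth))
        depth += events[pos]
        prev = pos
    return runs


def to_bedgraph(chrom, runs):
    """Format runs as bedGraph lines."""
    return [f"{chrom}\t{s}\t{e}\t{v}" for s, e, v in runs]
```

For example, `to_bedgraph("chr1", fragment_pileup(reads, 161, 248956422))` would extend the reads to the 161 bp that MACS2 estimated (the reads list is assumed to hold one chromosome's alignments).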
Since I was stuck at the bedGraphToBigWig step, I tried MACS14 with -w -S (to output wiggle files, which can be converted to TDF format with IGVtools). With that, I was able to visualize the peak-calling results together with the read pileup track.
Also, I am not sure what fragment extension size I should use to get better peak-calling results. In my tests with default settings, MACS2 estimated 161 bp, while MACS14 estimated 137 bp. MACS2 also gave me a warning saying something like "the tag size(?) is smaller than 2 * length, you might want to assign another value...". Does the extension size have to be larger than 2 × read length?
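An independent estimate of fragment length can come from strand cross-correlation. This is a different technique from MACS's paired-peak model, and it is the approach used by tools such as SPP and phantompeakqualtools. Counts of + strand and - strand read 5' ends are correlated at increasing shifts, and the shift with the highest correlation approximates the fragment length. In real data there is usually a second "phantom" peak near the read length (101 bp here), so the search window below starts above it. The window bounds, the simulated data and the function names are all assumptions for illustration, and the pure-Python loop is only practical on small regions. If a value is chosen, MACS2 can be told to use it with --nomodel --extsize N instead of its own model.

```python
import random


def strand_cross_correlation(plus, minus, min_shift, max_shift):
    """Pearson correlation between plus[i] and minus[i + s] for each shift s."""
    n = len(plus)
    scores = {}
    for s in range(min_shift, max_shift + 1):
        x, y = plus[: n - s], minus[s:]
        m = len(x)
        mx, my = sum(x) / m, sum(y) / m
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        vx = sum((a - mx) ** 2 for a in x)
        vy = sum((b - my) ** 2 for b in y)
        scores[s] = cov / (vx * vy) ** 0.5 if vx and vy else 0.0
    return scores


def estimate_fragment_length(plus, minus, min_shift, max_shift):
    """Shift with the highest strand cross-correlation."""
    scores = strand_cross_correlation(plus, minus, min_shift, max_shift)
    return max(scores, key=scores.get)


def simulate_strand_counts(n_fragments, frag_len, region_len, seed=0):
    """Toy data: each fragment yields a + read 5' end at its start and a
    - read 5' end at its last base (for BED reads: start for '+',
    end - 1 for '-')."""
    rng = random.Random(seed)
    plus, minus = [0] * region_len, [0] * region_len
    for _ in range(n_fragments):
        start = rng.randrange(0, region_len - frag_len)
        plus[start] += 1
        minus[start + frag_len - 1] += 1
    return plus, minus


if __name__ == "__main__":
    plus, minus = simulate_strand_counts(150, 150, 5000)
    print(estimate_fragment_length(plus, minus, 110, 300))
```

With this toy data, the best shift lands at the fragment length minus one, because the two 5' ends sit at the first and last base of each fragment.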
Lastly, how should I evaluate and interpret the MACS results? How can I tell whether the peak calls are good?
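One commonly used quantitative check is FRiP (fraction of reads in peaks), which the ENCODE ChIP-seq guidelines use as a QC metric. It is the share of mapped reads that overlap at least one called peak. Below is a minimal sketch, assuming reads and peaks are given as BED-style (chrom, start, end) tuples, for example parsed from the BED file and the MACS2 narrowPeak file. The function name is hypothetical.

```python
import bisect


def frip(reads, peaks):
    """Fraction of reads overlapping at least one peak.

    reads, peaks: iterables of (chrom, start, end), 0-based half-open.
    """
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))

    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        merged = []  # merge overlapping peaks so starts and ends are both sorted
        for s, e in ivs:
            if merged and s <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], e)
            else:
                merged.append([s, e])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])

    total = hits = 0
    for chrom, start, end in reads:
        total += 1
        if chrom not in index:
            continue
        starts, ends = index[chrom]
        i = bisect.bisect_left(starts, end) - 1  # last peak starting before read end
        if i >= 0 and ends[i] > start:
            hits += 1
    return hits / total if total else 0.0
```

Merging the peaks first keeps the lookup to a single binary search per read: among disjoint sorted peaks, only the last one starting before the read's end can overlap it.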
Thanks a lot for your time and help!