ChIP-seq peak calling & visualized in IGV
4
0
Entering edit mode
6.1 years ago

Hi all!

I am a newbie with ChIP-seq data and I ran into some problems.

To make it more clear, I'll list the data and tools that I used:

1. Original ChIP-seq data on histone marks download from GEO(SRA format, read length 101, H3K4me3 which is sharp signal);
2. Reference human genome and GTF file download from GENCODE(release version 22, which is currently used by UCSC as GRCh38);
3. STAR(2.4.2);
4. MACS2/MACS14;
5. IGV;

Firstly, I convert the SRA files into fastq files. Then I use STAR to align the reads to the reference genome with splicing option turned off(-alignIntronMax 1). Then I convert the sorted BAM file to BED to be used as the input of MACS2. I basically use default option with P-value=1e-5 and generate bedGraph file. Then I tried to convert the bedGraph file to bigwig so that I can visualize in IGV.

However, when I use the tools from UCSC bedGraphToBigWig, which required a file indicating all the chromosome sizes(Using the code recommended by TaoLiu https://github.com/taoliu/MACS/wiki/Build-Signal-Track ), I came into problem that some chromosome names in my bed file can't be found(like "KT270757.1" and "GL000216.2"). The process of converting thus ended.

Do you guys know how to generate reads pileup track so that I can compared with the peaks called by MACS2?

Stuck in the bedGraphToBigWig step, I tried MACS14 with -w -S(to output wiggle format that can be converted to tdf format by IGVtools). Then I am able to visualized the peak calling results with the reads pile-up track together.

Also, I am not sure what fragment extension size should I used to get better peaks calling results. In my test using MACS2, I was using default and MACS2 assign 161 for me, while MACS14 assign 137 for me. I did received warning message using MACS2 say something like "the tag size(?) is smaller than 2length, you might want to assign another value...". Is it a must that the extended size bigger the 2readlength?

Lastly, how to evaluate and interpret the results of MACS? How to say that it is a good peak-calling results?

Thanks a lot for your time and help!

Cheer!

ChIP-Seq alignment genome • 9.5k views
2
Entering edit mode
6.1 years ago

When creating the bigWig file, make sure to use the reference genome that you used with STAR, it will contain all of the chromosomes.

Regarding the extension size, if you have single-end reads then ask whoever did the sequencing/library prep. what the fragment size distribution looks like.

For how to interpret results, generally you should have a look at a few peaks and see if they appear reasonable. I would also recommend running plotFingerprint from deepTools on the BAM files, since that'll help you do some basic QC on the data.

1
Entering edit mode
6.1 years ago
ivivek_ngs ★ 5.1k

There are few thing I would like to point out. Yes the fragment size can be estimated with the pantompeakqualtools. Then you can also you deeptools for QC reports. As for the tracks I would suggest to make normalized tracks using the MACS2.1 -SPMR parameter. This will give you the normalized pileups. Once you convert them to bigwig you can charge them with UCSC browser. Or alternatively you can use HOMER to do that, they have a pretty straight forward approach to do that. And once you have the bedgrapgh or bigwig which are normalized you can directly charge them up to the browser.

I would not suggest to use MACS1.4 anymore. I have made some comparisions between MACS1.4, MACS2 and MACS2.1 and found out quite a few discrepancies. There is a major upgrade from versioning of 1.4 to 2. It is good to use the 2.1 version although you can do the compaison between all the 3 tools and try to find a comparison of the top peaks and how the enrichment plots look and also the histogram to see the peak length distribution. To me MACS2.1 is much upgraded (although the paper is not out yet) but people are using it and citing it in different publications.

In order to have the better comparison I would recommend to first find the fragment size (since these are histone marks) provide them in your command line if using MACS2.1 and then use --nomodel parameter along with -p or -q value cutoff and then try to select the top peaks and see how these peaks are overlapping from the different versions of MACS and what is distribution of the enrichment and the peak length. Finally you can also do some downstream analysis to find which peaks gives better biological meaning to your hypothesis.

There are different other peak callers like SICER, Pepr, or even HOMER that are more often used for histone marks. Since you are new to this better to do some exploration with different tools and make some comparisons to get a better hung of the tools. Since this is a public data you can always compare you peaks with that of the publication data taking it as gold standard and see the number of fasle-positives that your calls are giving and the one which overlap you can try to see the distribution of the enrichment scores and the peak length. Good Luck!!

0
Entering edit mode
6.1 years ago

If you want to assign the fragment extension size yourself, you could also use phantompeakqualtools which will estimate the predominant fragment length that you just have to multiply by 2. this tools also give you some metrics about the quality of your data (NSC and RSC)

0
Entering edit mode
5.7 years ago
ariel.balter ▴ 170

I use samtools mpileup to make pileups. But these appear to be different from what macs2 pileup does. Struggling to understand that myself. I'm in the process of writing code to create wigs from my pileup files. You can see my recent question here.

EDIT

Just found out from another post that macs2 makes wiggle files, either split by chromosome (-w/--wig) or for entire genome (add --single-profile).