Question: Normalising ChIP-seq data to RPM using a 1 or 2 count for paired-end data?
1
gravatar for James Ashmore
4.1 years ago by
James Ashmore2.6k
UK/Edinburgh/MRC Centre for Regenerative Medicine
James Ashmore2.6k wrote:

I want to normalise my paired-end ChIP-seq data (4 samples treated, 4 samples control) to reads per million (RPM). So far my calculation takes the number of reads mapped to a position, divides it by the total number of mapped reads in the sample, and then multiples this by one million. My question is, should the number of mapped paired-end reads be counted as 1 or 2 in the calculation? I believe Samtools reports the total number of mapped reads as half the actual number of paired-end reads which map?

My current pseudo script for normalising to RPM:

  • mappedReads = samtools view -c -F 4 input.bam
  • scalingFactor = 1000000 / mappedReads
  • bedtools genomecov -ibam input.bam -bg - scale scalingFactor -g chrom.sizes > output.bedGraph
  • bedGraphToBigWig input.bedGraph chrom.sizes output.bigWig
ADD COMMENTlink modified 4 weeks ago by Biostar ♦♦ 20 • written 4.1 years ago by James Ashmore2.6k

For paired end data, rpm=fpm (fragments per million). So, it should be counted as one. 

ADD REPLYlink written 4.1 years ago by poisonAlien2.7k

Just to clarify, you are saying that each pair of reads should be counted as just one? So in a perfect world where 100% of my reads align, the number of total alignments would be half the total number of reads in the data?

ADD REPLYlink written 4.1 years ago by James Ashmore2.6k

Yup. Count a pair of reads as one. For single end reads each read represents a fragment it is sequenced from whereas for paired-end, a pair of sequence represents the fragment it is sequenced from. As simple as that.

ADD REPLYlink written 4.1 years ago by poisonAlien2.7k
7
gravatar for Fidel
4.1 years ago by
Fidel1.9k
Germany
Fidel1.9k wrote:

You can use deepTools to do the normalisation that you want. The output is a bigwig file. It is quite fast if you have multiple cores.

bamCoverage -b file.bam --normalizeUsingRPKM -o file.bw 

The program first extends the read to match the paired-end length before computing the coverage. I opted to count paired end reads as 2 and not as 1 to avoid a bias when a read is not properly paired which could be a significant fraction.

 

ADD COMMENTlink written 4.1 years ago by Fidel1.9k

Thank you for the suggestion, I used Bowtie2 to map my reads but indicated to remove discordant and mixed alignments, in this case counting each pair of reads as 1 would be preferred

ADD REPLYlink written 4.1 years ago by James Ashmore2.6k

One option is to filter your bam file to include only the first mate, then use it with bamCompare.  If all your reads are properly paired this should work.

ADD REPLYlink written 4.1 years ago by Fidel1.9k

I just pushed a commit to github enabling bamCoverage to select reads based on the SAM flag. Now you can do something like this to get what you want:

bamCoverage -b file.bam --normalizeUsingRPKM -o file.bw --samFlag 64

--samFlag 64 means to select only the first mate. 

 

ADD REPLYlink written 4.1 years ago by Fidel1.9k
0
gravatar for Manvendra Singh
4.1 years ago by
Manvendra Singh2.0k
Berlin, Germany
Manvendra Singh2.0k wrote:

I think each read should be counted as 1

or

you can try MAnorm for normalization if it suits for your analysis or comparisons

ADD COMMENTlink written 4.1 years ago by Manvendra Singh2.0k

Initially I just want to create a coverage track for the normalised data.

ADD REPLYlink written 4.1 years ago by James Ashmore2.6k

Okay, I agree with Fidel

ADD REPLYlink written 4.1 years ago by Manvendra Singh2.0k
0
gravatar for Ian
4.1 years ago by
Ian5.3k
University of Manchester, UK
Ian5.3k wrote:

In case you do not already know MACS2 will generate coverage plots for the non-redundant fragments used in the analysis (by default 1 unique fragment per strand) normalised to millions of reads using the '--SPMR' flag.

ADD COMMENTlink written 4.1 years ago by Ian5.3k

I think MACS does not use the paired-end data to extend the reads. Does it extend the reads?

ADD REPLYlink written 4.1 years ago by Fidel1.9k

MACS2 does not need to extend reads as it knows the start and end of the fragment by using the paired end reads information in the BAM file.

ADD REPLYlink written 4.1 years ago by Ian5.3k

For this you need to use the BAMPE mode of MACS2 which I am not sure it properly works. We tried it and didn't work, also I see numerous complains in the mailing list. Moreover, the bam file has to be sorted by name. I always use MACS2 for peak calling but not coverage tracks.

ADD REPLYlink written 4.1 years ago by Fidel1.9k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1291 users visited in the last hour