Question: FRIP score ATAC-seq
gravatar for Lucy
8 weeks ago by
Lucy0 wrote:


I am trying to calculate the fraction of reads in peaks (FRIP score) for my ATAC-seq samples. What I have so far is written below, however I think it is wrong somehow as I am getting a very low score (when I calculate the FRIP using DiffBind, it comes out much better/more like the expected value e.g. 0.37).

val1=$(bedtools intersect -a <sample_BAM> -b <BED_peakfile> -wa -u | wc -l)
val2=$(samtools view <sample_BAM> | wc -l)
FRIP=$(awk "BEGIN {print "${val1}"/"${val2}"}")

For the peak file, I am using my optimal peak list for the particular cell type (generated using IDR).

Can anyone see what I am doing wrong?



atac-seq frip • 342 views
ADD COMMENTlink modified 7 weeks ago by chaton0 • written 8 weeks ago by Lucy0

Hi Lucy,

for the future, please use the formatting bar to indicate code. This improves readability. I did it for you this time.


ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by ATpoint9.1k
gravatar for ATpoint
8 weeks ago by
ATpoint9.1k wrote:

The problem with your solution is that bedtools intersect uses only the read length, probably 50 or 75bp, to do the intersection, rather than the actual fragment length, which for ATAC-seq should be between 80 and a few hundred bp. The same goes for all bedtools commands as far as I know.

I would use featureCounts from the subread package. It is actually made to count reads per reference regions in order to make count matrices but it also outputs the percentage of reads/fragments assigned to it, which is what you want:

## Make a custom "SAF" file which featureCounts needs:
awk 'OFS="\t" {print $1"-"$2+1"-"$3, $2+1, $3, "+"}' foo.narrowPeak > foo.saf

## run featureCounts (add -T for multithreading)
featureCounts -p -a peaks.saf -F SAF -o out.txt atacseq.bam

This will produce an output to screen with the % of assigned reads (=FRiP)and also a file out.txt.summary, from which you can programatically calculate it for a lot of samples.


ADD COMMENTlink modified 8 weeks ago • written 8 weeks ago by ATpoint9.1k
gravatar for chaton
7 weeks ago by
chaton0 wrote:

The problem is mainly that you used bedtools intersect the wrong way. If you want to calculate the number of reads covering you bed file, then your reference (file after -a) is the bed file and the query (file after -b) is your ordered bam file. You can also use the -c option and specify a sorted order via a genome reference file (see the help page of the bedtools suite). To count your total number of reads, there is also a -c option in samtools view... I suggest you to proceed like the following:

1) calculate your bed file coverage:

$ sort -k1V -k2,2n bedfile.bed
$ bedtools intersect -a bedfile.bed -b bamfile.bam -c -sorted -g ref.genome | awk '{i+=$n}END{print i}'

where $n is the column where you have your number of reads per bed region (column added by bedtools intersect, depends on the number of columns in your starting bed file), and i will give you the total number of reads covering your enriched regions. Then you just have to divide this value by the total number of reads of your bam file.

2) get your bam file size (nb of reads):

$ samtools view -c bamfile.bam

A little less automatic than with subread maybe but at least you don't need to install anything new... Good luck!

ADD COMMENTlink written 7 weeks ago by chaton0

Good catch with the switched -a/b, didn't even notice that :)

ADD REPLYlink modified 7 weeks ago • written 7 weeks ago by ATpoint9.1k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1862 users visited in the last hour