Question: FRIP score ATAC-seq
0
gravatar for Lucy
10 days ago by
Lucy0
Lucy0 wrote:

Hi,

I am trying to calculate the fraction of reads in peaks (FRIP score) for my ATAC-seq samples. What I have so far is written below, however I think it is wrong somehow as I am getting a very low score (when I calculate the FRIP using DiffBind, it comes out much better/more like the expected value e.g. 0.37).

val1=$(bedtools intersect -a <sample_BAM> -b <BED_peakfile> -wa -u | wc -l)
val2=$(samtools view <sample_BAM> | wc -l)
FRIP=$(awk "BEGIN {print "${val1}"/"${val2}"}")

For the peak file, I am using my optimal peak list for the particular cell type (generated using IDR).

Can anyone see what I am doing wrong?

Thanks,

Lucy

atac-seq frip • 178 views
ADD COMMENTlink modified 5 days ago by chaton0 • written 10 days ago by Lucy0

Hi Lucy,

for the future, please use the formatting bar to indicate code. This improves readability. I did it for you this time.

Code

ADD REPLYlink modified 10 days ago • written 10 days ago by ATpoint7.5k
1
gravatar for ATpoint
10 days ago by
ATpoint7.5k
Germany
ATpoint7.5k wrote:

The problem with your solution is that bedtools intersect uses only the read length, probably 50 or 75bp, to do the intersection, rather than the actual fragment length, which for ATAC-seq should be between 80 and a few hundred bp. The same goes for all bedtools commands as far as I know.

I would use featureCounts from the subread package. It is actually made to count reads per reference regions in order to make count matrices but it also outputs the percentage of reads/fragments assigned to it, which is what you want:

## Make a custom "SAF" file which featureCounts needs:
awk 'OFS="\t" {print $1"-"$2+1"-"$3, $2+1, $3, "+"}' foo.narrowPeak > foo.saf

## run featureCounts (add -T for multithreading)
featureCounts -p -a peaks.saf -F SAF -o out.txt atacseq.bam

This will produce an output to screen with the % of assigned reads (=FRiP)and also a file out.txt.summary, from which you can programatically calculate it for a lot of samples.

FRiP

ADD COMMENTlink modified 10 days ago • written 10 days ago by ATpoint7.5k
0
gravatar for chaton
5 days ago by
chaton0
chaton0 wrote:

The problem is mainly that you used bedtools intersect the wrong way. If you want to calculate the number of reads covering you bed file, then your reference (file after -a) is the bed file and the query (file after -b) is your ordered bam file. You can also use the -c option and specify a sorted order via a genome reference file (see the help page of the bedtools suite). To count your total number of reads, there is also a -c option in samtools view... I suggest you to proceed like the following:

1) calculate your bed file coverage:

$ sort -k1V -k2,2n bedfile.bed
$ bedtools intersect -a bedfile.bed -b bamfile.bam -c -sorted -g ref.genome | awk '{i+=$n}END{print i}'

where $n is the column where you have your number of reads per bed region (column added by bedtools intersect, depends on the number of columns in your starting bed file), and i will give you the total number of reads covering your enriched regions. Then you just have to divide this value by the total number of reads of your bam file.

2) get your bam file size (nb of reads):

$ samtools view -c bamfile.bam

A little less automatic than with subread maybe but at least you don't need to install anything new... Good luck!

ADD COMMENTlink written 5 days ago by chaton0

Good catch with the switched -a/b, didn't even notice that :)

ADD REPLYlink modified 5 days ago • written 5 days ago by ATpoint7.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 947 users visited in the last hour