How Do You Get Tag Counts From Paired-End Reads?
1
0
Entering edit mode
10.8 years ago
KCC ★ 4.1k

I was wondering what approaches are commonly used to get tag counts from paired-end reads in ChIP-seq data. Naively, I wrote a script that takes each pair of paired-end reads and stitches them together to make a fragment. So my pipeline is:

  • Convert BAM to BED:

    bamToBed -i reads.bam > reads.bed

  • Merge reads into fragments (I wrote a custom script here):

    python stitch.py reads.bed > fragments.bed

EXAMPLE:

read1: chr1 5 10 +

read2: chr1 41 50 -

fragment: chr1 5 50 +

  • BED to BedGraph:

    genomeCoverageBed -bga -i fragment.bed -g chrom.sizes > fragment.bedGraph

Basically, I am creating counts between the two reads to fill out the whole fragment. I could see someone arguing that I am creating data. So, I was wondering if my approach is similar to what others do when trying to visualize paired-end data.

paired genome coverage • 2.8k views
ADD COMMENT
0
Entering edit mode

I would vote that this is kind of "creating" data. You're not sequencing the insert, so why use it as evidence for anything? If you need tag counts, why not count only reads mapping in a proper pair (by SAM flag)?

ADD REPLY
1
Entering edit mode
10.8 years ago

In the end this all boils down to what are you using this data for. This extension method could be useful in finding out which bases are most often covered by the fragments.

On the other hand it should not be passed to methods that make use of statistical methods to infer various attributes as it would mix data that was measured with data that was 'created'.

ADD COMMENT
0
Entering edit mode

Yes, ultimately I am using it to provide a more refined version of tag extension which is common in ChIP-seq data.

ADD REPLY

Login before adding your answer.

Traffic: 1797 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6