Is removing strand information when obtaining intergenic regions from GRanges necessary?
8 weeks ago
Pratik Mehta

Hello Biostars Community,

I am trying to obtain a BED file of intergenic regions from a custom GenomicRanges object.

I am following this tutorial:

https://research.stowers.org/cws/CompGenomics/Tutorial/peak_assignment.html#orgb42e0a8

The tutorial instructs me to remove strand information:

# remove strand information
strand(genic) <- '*'


Could someone "explain like I'm five" why this makes sense?

My guess is that both strands should carry the same genomic information for the most part, right? Perhaps one strand is sequenced better than the other in some locations, so removing strand information sums together the information from both strands and fills in what either one may lack?

Pratik

8 weeks ago

Strand information is super important for RNA-seq data analysis, but is usually irrelevant for ChIP-seq data analysis once peaks have been called. ChIP-seq is about sequencing double-stranded DNA fragments associated with a particular protein, so from the start, the biological information is "unstranded".
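Beyond the biology, there is also a GenomicRanges-specific reason for the tutorial step. Below is a minimal sketch, assuming the tutorial then calls `gaps()` on `genic` to get the intergenic regions (the toy coordinates and `seqlengths` are made up). `gaps()` respects strand, so on a stranded object it finds gaps separately on each strand, and a region covered by a gene on one strand can still be reported as a gap on the other. Setting the strand to `*` first pools genes from both strands.

```r
library(GenomicRanges)

# Toy gene models (made-up coordinates): one gene per strand on a 1,000 bp chromosome
genic <- GRanges("chr1",
                 IRanges(start = c(100, 500), end = c(200, 600)),
                 strand = c("+", "-"),
                 seqlengths = c(chr1 = 1000))

# With strand kept, gaps() works strand by strand: the '+' strand gaps
# (1-99 and 201-1000) run straight through the '-' strand gene at 500-600
stranded_gaps <- gaps(genic)
plus_gaps <- stranded_gaps[strand(stranded_gaps) == "+"]
plus_gaps

# Tutorial step: drop strand so genes from both strands are pooled
strand(genic) <- "*"
intergenic <- gaps(genic)

# Keep only the '*' rows; gaps() may also report '+'/'-' ranges that
# span the whole chromosome, which are not intergenic regions
intergenic <- intergenic[strand(intergenic) == "*"]
intergenic  # 1-99, 201-499 and 601-1000
```

The final filter on `*` is a defensive choice: it keeps the sketch correct whether or not `gaps()` also emits whole-chromosome ranges on the `+` and `-` strands.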

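That being said, strand information can be used during peak calling to refine peak positions because of sequencing asymmetry, as illustrated by a figure in the MACS2 manual: only the left and right extremities of ChIPed DNA fragments are sequenced, so reads mapping to the + strand come from the left end of each fragment and reads mapping to the - strand come from the right end.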

Perhaps one strand is sequenced better than the other in some locations, so removing strand information sums together the information from both strands and fills in what either one may lack?

As that figure shows, both strands usually carry the same level of ChIP signal. They are indeed summed together during peak calling, so peaks are unstranded, reflecting the unstranded nature of ChIP data.
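To make the summing concrete, here is a hedged GenomicRanges sketch. The read positions are made up, and the 200 bp fragment length is an assumption (MACS2 estimates it from the data). Each read is extended to the fragment length in its 3' direction with `resize(fix = "start")`, which anchors on the 5' end of each read according to its strand, and `coverage()` then piles up both strands together. This is similar in spirit to MACS2's pileup, not a reimplementation of it.

```r
library(GenomicRanges)

# Made-up 36 bp reads: '+' reads mark left ends of fragments, '-' reads right ends
reads <- GRanges("chr1",
                 IRanges(start = c(100, 110, 290, 300), width = 36),
                 strand = c("+", "+", "-", "-"))

# Raw reads from the two strands sit on opposite sides of the binding site
raw_cov <- coverage(reads)$chr1

# Extend every read to an assumed 200 bp fragment in its 3' direction;
# fix = "start" keeps the 5' end, which is end() for '-' strand reads
frags <- resize(reads, width = 200, fix = "start")

# coverage() ignores strand, so both strands are summed into one pileup
frag_cov <- coverage(frags)$chr1

max(raw_cov)   # 2: at most two reads overlap, one strand at a time
max(frag_cov)  # 4: all four fragments overlap once strands are combined
```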

Thank you very much, Carlo Yague!

Does the same apply to methylation? I am using a methylation array for this study (I only used the ChIP tutorial to get the intergenic regions to intersect with the CpG loci). Each probe on the methylation array corresponds to a CpG locus. I think strand information might matter in my case, perhaps because each probe is strand-specific? I think I have my answer: it does matter in this case.

If the method is based on bisulfite treatment, then it is indeed strand-specific.
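For the intersection step itself, a hedged sketch (probe IDs and coordinates are hypothetical). In GenomicRanges a `*` range is compatible with both `+` and `-`, so unstranded intergenic regions still match strand-specific probes on either strand, while the probes keep their own strand for any downstream strand-aware analysis.

```r
library(GenomicRanges)

# Unstranded intergenic regions (e.g. gaps() of genes after strand was set to '*')
intergenic <- GRanges("chr1",
                      IRanges(start = c(1, 201, 601), end = c(99, 499, 1000)),
                      strand = "*")

# Hypothetical strand-specific CpG probes; cg_B falls inside a gene (100-200)
probes <- GRanges("chr1",
                  IRanges(start = c(50, 150, 300, 700), width = 2),
                  strand = c("+", "-", "-", "+"))
names(probes) <- c("cg_A", "cg_B", "cg_C", "cg_D")

# '*' overlaps both strands: every intergenic probe is kept
in_intergenic <- overlapsAny(probes, intergenic)

# For contrast: '+'-only regions would silently miss '-' strand probes
plus_only <- intergenic
strand(plus_only) <- "+"
in_plus_only <- overlapsAny(probes, plus_only)

probes[in_intergenic]  # cg_A, cg_C, cg_D, each with its original strand
```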