Question: Calculate strand ratio for repeat (ALU) elements
A. Domingues (Dresden, Germany) wrote 10 weeks ago:

I am trying to estimate the amount of double-stranded ALU elements in order to compare two conditions. My very coarse approach was to align the reads to repeat element sequences and then count how many reads map sense or antisense to each element. Summed over all repeat elements, there is a bias towards sense-mapping reads (sense/antisense ratio of ~1.3). For the ALU elements, however, the ratio is nearly 1, with a shift in one of the conditions, which matches the experimental hypothesis.

The issue is that, given the repetitive nature of these elements, I am having second thoughts about whether this approach is valid at all.

Briefly, my approach to calculating strand bias in repeat elements:

  • rRNA depleted RNA-seq, stranded library
  • reads were mapped to transposable element sequences (derived from RepeatMasker, one contig = one element, example below) with STAR, keeping one random alignment for any read that maps to up to 100 locations
  • alignments in each repeat element sequence were then counted with the Bioconductor package Rsamtools, with the following settings:
    • repeat elements were considered those whose name doesn't match "^5S|^7S|_n$|rRNA|^tRNA|^U[0-9]|^RNA"
    • only proper read pairs were counted
    • alignments on the forward strand: isFirstMateRead = TRUE, isMinusStrand = TRUE
    • alignments on the reverse strand: isFirstMateRead = TRUE, isMinusStrand = FALSE
  • a ratio of sense/antisense reads was then calculated for each repeat element sequence.
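
The counting rules above can be sketched in a few lines of plain Python. This is not the Rsamtools code used for the analysis, only an illustration of the same logic applied to (reference name, SAM flag) pairs. The function names strand_counts and strand_ratio, the example records, and the choice to treat "forward" alignments as sense are assumptions made for the sketch; the flag bits (0x2 proper pair, 0x10 reverse strand, 0x40 first mate) are the standard SAM ones.

```python
import re
from collections import defaultdict

# Contig names excluded in the post (rRNA, tRNA, snRNA and similar).
NON_REPEAT = re.compile(r"^5S|^7S|_n$|rRNA|^tRNA|^U[0-9]|^RNA")

# Standard SAM flag bits.
PROPER_PAIR = 0x2
MINUS_STRAND = 0x10
FIRST_MATE = 0x40


def strand_counts(records):
    """Count sense/antisense first-mate alignments per repeat element.

    `records` is an iterable of (reference_name, sam_flag) tuples.
    """
    counts = defaultdict(lambda: {"sense": 0, "antisense": 0})
    for name, flag in records:
        if NON_REPEAT.search(name):
            continue  # not considered a repeat element
        if not (flag & PROPER_PAIR and flag & FIRST_MATE):
            continue  # keep only first mates of proper pairs
        # As in the post: a first mate on the minus strand is counted as
        # "forward". Treating forward as sense is an assumption here.
        key = "sense" if flag & MINUS_STRAND else "antisense"
        counts[name][key] += 1
    return dict(counts)


def strand_ratio(counts):
    """Sense/antisense ratio per element; None when there are no antisense reads."""
    return {
        name: (c["sense"] / c["antisense"] if c["antisense"] else None)
        for name, c in counts.items()
    }


# Illustrative records only (made-up flags, not real data).
example = [
    ("AluJb", 83),     # proper pair, first mate, minus strand
    ("AluJb", 83),
    ("AluJb", 99),     # proper pair, first mate, plus strand
    ("AluJb", 147),    # second mate: skipped
    ("AluJb", 65),     # not a proper pair: skipped
    ("tRNA-Ala", 83),  # name matches the exclusion pattern: skipped
    ("L1PA2", 99),
]
print(strand_ratio(strand_counts(example)))
```

Returning None instead of dividing by zero is one possible choice; a pseudocount would be another, and which is preferable depends on how low-coverage elements should be handled.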

Does this make sense at all? Is there a better way of doing it?

grep "AluJb" -A 100 all_repeats.hg38.fa | head -20
>AluJb
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNTCCATAAGAATGGAAAGAAAACATGGCCAGGTGCAGTGGC
TCACACCTGTAATCCCACCACTTCAGGAGGCTGAGGCAACATGGCAAAACCTTCTCTTCA
AAAAATTTTTTAAAAGTTAGCTGGATGTTGTGGAGGCAAGAGGATCACTTGAGGATCACT
TGAGTCCATGAGGTCAAGGCTGCAGTGAGTCATGTTTGCACCACTGCACTCTAGCCTAGG
TGACAGAGCTAGTCACTATCAAAAAAAAAAAAAAAAGAATGGAGAGAATGCTACATGAGA
GAAAGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNATAGATTTTTTTAAAAAGAAAACTGGCCAGGTACT
GTGGCTTATGTCTGTAATATCAGCATGTTGGGAGGCCAAGGCAGGATTACTTGAGCCCAG
AAATTCCAGACCAGCCTGAGAATTTGGCAAAACTCTGTCTCTACAAAAAATACAAAAATT
AGCCAAGTTTGGTGGCATGTGCCTGTAGTACCAGCTACTTGGGAGGCTGAGGTGGAAGAA
TAGCTTGAGTCTGGGAGGTCAAGGCTGCAATGAGCTGTGATTGCACCACTGCACTCAAGC
CTGGGTGGTAGAGTAAGACCCTGTCTCAAAAAAAAAAAAAAAAAAAGAAAAATCACTAAG
CAAAATAAGACATGTGAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

A. Domingues wrote 9 weeks ago:

I shared this question on Twitter, and the consensus seems to be that this approach is OK. A few suggestions for alternative/complementary approaches:

  • use Repbase (or consensus) sequences instead of RepeatMasker sequences
  • map to the genome and intersect with annotated TE coordinates
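
The second suggestion could look roughly like the sketch below, which assigns genome alignments to annotated elements and compares strands. Everything here is an assumption made for illustration: the tuple layouts, the function names, the 0-based half-open coordinates, and the example intervals. The alignment strand is taken to be the strand of the originating transcript, already inferred from the stranded library. The nested loop is only meant to show the logic; real data would need an interval index or a dedicated tool such as bedtools or GenomicRanges.

```python
from collections import Counter


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def count_by_orientation(alignments, elements):
    """Count sense/antisense alignments per TE family.

    alignments: iterable of (chrom, start, end, strand)
    elements:   iterable of (chrom, start, end, strand, family)
    """
    counts = {}
    for chrom, start, end, strand in alignments:
        for e_chrom, e_start, e_end, e_strand, family in elements:
            if chrom != e_chrom or not overlaps(start, end, e_start, e_end):
                continue
            # Same strand as the annotated element is sense, opposite is antisense.
            key = "sense" if strand == e_strand else "antisense"
            counts.setdefault(family, Counter())[key] += 1
    return counts


# Made-up annotation and alignments, for illustration only.
elements = [
    ("chr1", 100, 400, "+", "AluJb"),
    ("chr1", 1000, 1300, "-", "AluJb"),
    ("chr2", 50, 350, "+", "AluSx"),
]
alignments = [
    ("chr1", 150, 250, "+"),    # sense to the first AluJb copy
    ("chr1", 380, 480, "-"),    # antisense to the first AluJb copy
    ("chr1", 1100, 1200, "-"),  # sense to the minus-strand AluJb copy
    ("chr1", 500, 600, "+"),    # no overlap with any element
    ("chr2", 300, 400, "-"),    # antisense to AluSx
]
for family, c in count_by_orientation(alignments, elements).items():
    print(family, c["sense"], c["antisense"])
```

Unlike the contig-based approach, this version has to account for the orientation of each genomic copy, which is why the element strand is part of the annotation tuple.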

If no one adds a valid answer in the coming days I will accept my own answer.
