Question: STARR-seq data: how can I characterise reads that map outside the targeted regions?
Asked 7 days ago by morovatunc (Turkey):

Hi,

We conducted a STARR-seq experiment that measures the activity of a given library (say, enhancers). Then we sent it for WGS.

Background:

Mapping: We have ~250 million reads of 150 bp paired-end data. We used bowtie with the parameters -v 3 -m 1 --best --strata -X 2000.

Then we analysed the mapped data with deepTools.

In deepTools, we used multiBamSummary with a given BED file. This BED file is our library, which consists of ~8000 regions (1000 positive controls, 5000 negative controls, 2000 tested regions). This step simply gives the number of reads that overlap each BED region, so for each region I know how many mapped reads overlap it.
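To make the counting step concrete, here is a minimal pure-Python sketch of per-region overlap counting. It is not how multiBamSummary is implemented; the tuple layouts, the function names and the toy coordinates are all assumptions made for illustration, and coordinates are treated as 0-based, half-open as in BED.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(regions):
    """Group (chrom, start, end, name) regions by chromosome, sorted by start."""
    index = defaultdict(list)
    for chrom, start, end, name in regions:
        index[chrom].append((start, end, name))
    for chrom in index:
        index[chrom].sort()
    return index


def count_overlaps(reads, regions):
    """Count reads overlapping each region (0-based, half-open coordinates)."""
    index = build_index(regions)
    starts = {c: [s for s, _, _ in ivs] for c, ivs in index.items()}
    counts = {name: 0 for _, _, _, name in regions}
    for chrom, r_start, r_end in reads:
        ivs = index.get(chrom, [])
        # Only regions that start before the read ends can overlap it.
        hi = bisect_right(starts.get(chrom, []), r_end - 1)
        for _start, end, name in ivs[:hi]:
            if end > r_start:
                counts[name] += 1
    return counts


# Hypothetical toy data, not taken from the experiment above.
regions = [
    ("chr1", 100, 200, "pos_1"),
    ("chr1", 500, 600, "neg_1"),
    ("chr2", 50, 150, "test_1"),
]
reads = [
    ("chr1", 90, 240),
    ("chr1", 150, 300),
    ("chr1", 700, 850),
    ("chr2", 0, 50),
    ("chr2", 100, 250),
]
counts = count_overlaps(reads, regions)
print(counts)
```

The scan over candidate regions is linear per read, which is fine for a sketch but not for hundreds of millions of reads; a real analysis would rely on an existing tool instead.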

Problem:

Of our ~200 million mapped reads, only ~60 million actually overlap with our targeted regions.

Question:

Disregarding the STARR-seq methodology, could you please help me find:

  1. The location of the remaining (~140 million) mapped reads?
  2. Why do we have such a huge amount of unspecific(?) mapping? Or simply, how would you solve such a problem?

I know this is a specific question but your past experiences and comments could really help me.

Thank you very much,

T.


Location of rest of the (~140 million) mapped reads?

You could create a subset BAM minus the regions you are interested in and then use something like Qualimap for a gross overview.
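As a rough illustration of the "subset, then get an overview" idea, the sketch below drops every read that overlaps a target region and tallies the rest in fixed-size genomic bins, so that pile-ups stand out. The tuple layouts, the function names, the 1 Mb bin size and the toy reads are all assumptions; on a real BAM the subsetting itself would normally be done with an existing tool (for example `bedtools intersect -v`) rather than in pure Python.

```python
from collections import Counter


def overlaps_any(chrom, start, end, regions):
    """True if the half-open interval [start, end) hits any (chrom, start, end) region."""
    return any(c == chrom and s < end and e > start for c, s, e in regions)


def off_target_bins(reads, regions, bin_size=1_000_000):
    """Tally reads that miss every target region, keyed by (chrom, bin start)."""
    bins = Counter()
    for chrom, start, end in reads:
        if not overlaps_any(chrom, start, end, regions):
            bins[(chrom, (start // bin_size) * bin_size)] += 1
    return bins


# Hypothetical toy data, not taken from the experiment above.
regions = [("chr1", 100, 200)]
reads = [
    ("chr1", 150, 300),              # on target, ignored
    ("chr1", 1_200_000, 1_200_150),  # off target
    ("chr1", 1_900_000, 1_900_150),  # off target, same 1 Mb bin
    ("chrM", 10, 160),               # off target
]
bins = off_target_bins(reads, regions)
for (chrom, bin_start), n in bins.most_common():
    print(chrom, bin_start, n)
```

Sorting the bins by count gives a first answer to "where did the other reads go", for instance whether they are spread evenly across the genome or concentrated in a few loci or contigs.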

Why do we have such a huge amount of unspecific(?) mapping? Or simply, how would you solve such a problem?

Some kind of experimental contamination (I don't know what STARR-seq is)?

written 7 days ago by genomax

Did you do Cap-STARRseq, i.e. capture your target DNA? If so, it is not uncommon to co-capture all kinds of other genomic regions, which then become part of your library. About 30% target-assigned reads sounds pretty OK to me. That still leaves you with thousands of reads per target region, which should be more than enough for a STARRseq experiment. Do you have an elaborate statistical framework to make use of these high read counts and the power that comes with this sequencing depth (I hope you have replicates)?
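The figures in this reply follow directly from the approximate numbers given in the question; a quick check, with variable names chosen here for illustration:

```python
# Approximate values quoted in the question.
mapped_reads = 200_000_000
on_target_reads = 60_000_000
n_regions = 8_000

on_target_fraction = on_target_reads / mapped_reads
off_target_reads = mapped_reads - on_target_reads
mean_reads_per_region = on_target_reads / n_regions

print(f"on-target fraction: {on_target_fraction:.0%}")
print(f"off-target reads: {off_target_reads:,}")
print(f"mean reads per region: {mean_reads_per_region:,.0f}")
```

The per-region figure is only an average; actual coverage will differ between regions.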

written 14 hours ago by ATpoint