Question: Help simulating RRBS reads
0
gravatar for Jautis
3.9 years ago by
Jautis270
United States
Jautis270 wrote:

Hello, does anybody have suggestions for how to simulate RRBS reads with a known methylation level? I have a reference for a non-model species and want to make sure reads map appropriately before beginning data generation. 

 

Anything helps, thank you!

rrbs methylation simulate reads • 1.3k views
ADD COMMENTlink modified 3.8 years ago by dariober10.0k • written 3.9 years ago by Jautis270
2
gravatar for mark.ziemann
3.8 years ago by
mark.ziemann1.1k
Australia/Mebourne/Geelong/Deakin
mark.ziemann1.1k wrote:

There is a bisulfite sequence simulator that may help (http://www.bioinformatics.babraham.ac.uk/projects/sherman/). What you also need is a BED file of regions that are well covered in the genome with RRBS. Then use bedtools getFasta to extract the sequence of all RRBS covered regions. Then use Sherman to generate simulated bisulfite seq data for these regions.

ADD COMMENTlink written 3.8 years ago by mark.ziemann1.1k
1

theoretically, for directional RRBS library, reads start with CGG or TGG. why in a real experiment we see reads start from 3 different bases (although very few of them)? Thanks.

ADD REPLYlink written 2.9 years ago by Ming Tang2.5k
2

Star activity (MspI or whatever else is used cutting outside its normal site) or the general degradation caused by bisulfite treatment are the likely culprits in my mind.

ADD REPLYlink written 2.9 years ago by Devon Ryan89k
1
gravatar for dariober
3.8 years ago by
dariober10.0k
WCIP | Glasgow | UK
dariober10.0k wrote:

Ideally RRBS should select genome fragments starting and ending with CCGG and of size between, say, 50 and 1000bp, right? So you could first extract such fragments and produce a reference genome containing only these fragments. Then you could run Sherman as suggested by mark.ziemann on this restricted reference. By the way, if you just want to know how many theoretical fragments you can obtain from RRBS and whether these are mappable you don't need to assume a particular methylation level since BS-Seq mappers bioinformatically "demethylate" all the reads before mapping.

I wrote a simple program to extract sequences from a fasta file matching a regular expression (fastaRegexFinder.py). To obtain RRBS fragments from a reference you could use something like:

fastaRegexFinder.py -q -r 'CCGG(.(?<!CCGG)){50,1000}(?=CCGG)' -f genome.fa --noreverse

The extracted sequences are in the last column of output, you should append CCGG to these. Example:

cat genome.fa
>seq1
zzCCGGaaaCCGGbbbbCCGGxxxxxxCCGGdddCCGGeeeeCCGGCCGGfffCCGGhhhhCCGGiiiiiCCGGlllCCGGmmmmCCGGzz
>seq2
zzCCGGaaaaaaaCCGGbbbbbbbCCGGzzzzzzzzzzzzzzzzzzzzzzzzzzzzzCCGGaaa
>seq3
zzCCGGaaaaaaaCCGGCCGGbbbbbbbCCGGzzz

fastaRegexFinder.py -q -r 'CCGG(.(?<!CCGG)){5,10}(?=CCGG)' -f genome.fa --noreverse
seq1    17    27    seq1_17_27_for    10    +    CCGGxxxxxx
seq1    61    70    seq1_61_70_for    9    +    CCGGiiiii
seq2    2    13    seq2_2_13_for    11    +    CCGGaaaaaaa
seq2    13    24    seq2_13_24_for    11    +    CCGGbbbbbbb
seq3    2    13    seq3_2_13_for    11    +    CCGGaaaaaaa
seq3    17    28    seq3_17_28_for    11    +    CCGGbbbbbbb

Did I get it right? does it make sense?

 

ADD COMMENTlink written 3.8 years ago by dariober10.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1143 users visited in the last hour