Question

some problem with my RRBS data when dealt with bismark

1

Entering edit mode

11.1 years ago

hua.peng1314 ▴ 100

Hi,I have deal with my RRBS data,but the result is strange.

if I use these commands:

/usr/local/bin/trim_galore  --phred33 -q 10 --length 35  -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -a2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT --paired RRBS_R1.fastq RRBS_R2.fastq --retain_unpaired --keep --output_dir .

~/software/bismark_v0.11.1/bismark -q -n 1 -l 40 ~/projects/Methylation/BS_seq/reference1/ -1 RRBS_R1_val_1.fq -2 RRBS_R2_val_2.fq

The result is :

Sequence pairs analysed in total:       12619146
Number of paired-end alignments with a unique best hit: 9550086
Mapping efficiency:     75.7% 
Sequence pairs with no alignments under any condition:  2683530
Sequence pairs did not map uniquely:    385530
Sequence pairs which were discarded because genomic sequence could not be extracted:    0

if I use these commands:

/usr/local/bin/trim_galore  --phred33 -q 10 --length 35  -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -a2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT --paired RRBS_R1.fastq RRBS_R2.fastq --retain_unpaired --keep --output_dir .
/usr/local/fastx/fastq_quality_trimmer -i RRBS_R1_val_1.fq -t 20 -l 20 -o RRBS_R1_val_QC_1.fq -Q 33
/usr/local/fastx/fastq_quality_trimmer -i RRBS_R2_val_2.fq -t 20 -l 20 -o RRBS_R2_val_QC_2.fq -Q 33

~/software/bismark_v0.11.1/bismark -q -n 1 -l 40 ~/projects/Methylation/BS_seq/reference1/ -1 RRBS_R1_val_QC_1.fq -2 RRBS_R2_val_QC_2.fq

The result is

Sequence pairs analysed in total:       12619144
Number of paired-end alignments with a unique best hit: 2646896
Mapping efficiency:     21.0% 
Sequence pairs with no alignments under any condition:  9865457
Sequence pairs did not map uniquely:    106791
Sequence pairs which were discarded because genomic sequence could not be extracted:    0

If I use these commands:

/usr/local/fastx/fastq_quality_trimmer -i RRBS_R1.fastq -t 20 -l 20 -o RRBS_R1_QC_1.fq -Q 33
/usr/local/fastx/fastq_quality_trimmer -i RRBS_R2.fastq -t 20 -l 20 -o RRBS_R2_QC_2.fq -Q 33
/usr/local/bin/trim_galore  --phred33 -q 10 --length 35  -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -a2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT --paired RRBS_R1_QC_1.fq RRBS_R2_QC_2.fq --retain_unpaired --keep --output_dir .
~/software/bismark_v0.11.1/bismark -q -n 1 -l 40 ~/projects/Methylation/BS_seq/reference1/ -1 RRBS_R1_QC_1_val_1.fq -2 RRBS_R2_QC_2_val_2.fq

The result is

Sequence pairs analysed in total:       12610256
Number of paired-end alignments with a unique best hit: 592
Mapping efficiency:     0.0% 
Sequence pairs with no alignments under any condition:  12609412
Sequence pairs did not map uniquely:    252
Sequence pairs which were discarded because genomic sequence could not be extracted:    0

The significant difference confused me. Appreciate for your help.

RRBS • 3.2k views

ADD COMMENT • link updated 3.7 years ago by Ram 45k • written 11.1 years ago by hua.peng1314 ▴ 100

score 3 · Answer 1 · 2014-06-11

3

Entering edit mode

11.1 years ago

Devon Ryan 105k

NEVER USE FASTX TOOLS FOR PAIRED-END FILES!!!

fastq_quality_trimmer is making your paired-end files out of sync, which will make the results complete crap. trim_galore does a good job, there is no reason to use any trimmer after it for RRBS data.

ADD COMMENT • link 11.1 years ago by Devon Ryan 105k

1

Entering edit mode

Hey Devon, How should I do if I want to merge read1 and read2 and then take them as the single end of sequencing data and then do the adaptor trim and alignment?

ADD REPLY • link 9.6 years ago by Shicheng Guo ★ 9.6k

0

Entering edit mode

Hmm, are these still RRBS reads as in the original post? You could either trim and then merge or merge and then trim, though I have a sneaking suspicion that the former will work slightly better (this will obviously depend on the read length and fragment length).

ADD REPLY • link 9.6 years ago by Devon Ryan 105k