some problem with my RRBS data when dealt with bismark
1
1
Entering edit mode
9.9 years ago
hua.peng1314 ▴ 100

Hi,I have deal with my RRBS data,but the result is strange.

if I use these commands:

/usr/local/bin/trim_galore  --phred33 -q 10 --length 35  -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -a2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT --paired RRBS_R1.fastq RRBS_R2.fastq --retain_unpaired --keep --output_dir .

~/software/bismark_v0.11.1/bismark -q -n 1 -l 40 ~/projects/Methylation/BS_seq/reference1/ -1 RRBS_R1_val_1.fq -2 RRBS_R2_val_2.fq

The result is :

Sequence pairs analysed in total:       12619146
Number of paired-end alignments with a unique best hit: 9550086
Mapping efficiency:     75.7% 
Sequence pairs with no alignments under any condition:  2683530
Sequence pairs did not map uniquely:    385530
Sequence pairs which were discarded because genomic sequence could not be extracted:    0

if I use these commands:

/usr/local/bin/trim_galore  --phred33 -q 10 --length 35  -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -a2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT --paired RRBS_R1.fastq RRBS_R2.fastq --retain_unpaired --keep --output_dir .
/usr/local/fastx/fastq_quality_trimmer -i RRBS_R1_val_1.fq -t 20 -l 20 -o RRBS_R1_val_QC_1.fq -Q 33
/usr/local/fastx/fastq_quality_trimmer -i RRBS_R2_val_2.fq -t 20 -l 20 -o RRBS_R2_val_QC_2.fq -Q 33

~/software/bismark_v0.11.1/bismark -q -n 1 -l 40 ~/projects/Methylation/BS_seq/reference1/ -1 RRBS_R1_val_QC_1.fq -2 RRBS_R2_val_QC_2.fq

The result is

Sequence pairs analysed in total:       12619144
Number of paired-end alignments with a unique best hit: 2646896
Mapping efficiency:     21.0% 
Sequence pairs with no alignments under any condition:  9865457
Sequence pairs did not map uniquely:    106791
Sequence pairs which were discarded because genomic sequence could not be extracted:    0

If I use these commands:

/usr/local/fastx/fastq_quality_trimmer -i RRBS_R1.fastq -t 20 -l 20 -o RRBS_R1_QC_1.fq -Q 33
/usr/local/fastx/fastq_quality_trimmer -i RRBS_R2.fastq -t 20 -l 20 -o RRBS_R2_QC_2.fq -Q 33
/usr/local/bin/trim_galore  --phred33 -q 10 --length 35  -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -a2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT --paired RRBS_R1_QC_1.fq RRBS_R2_QC_2.fq --retain_unpaired --keep --output_dir .
~/software/bismark_v0.11.1/bismark -q -n 1 -l 40 ~/projects/Methylation/BS_seq/reference1/ -1 RRBS_R1_QC_1_val_1.fq -2 RRBS_R2_QC_2_val_2.fq

The result is

Sequence pairs analysed in total:       12610256
Number of paired-end alignments with a unique best hit: 592
Mapping efficiency:     0.0% 
Sequence pairs with no alignments under any condition:  12609412
Sequence pairs did not map uniquely:    252
Sequence pairs which were discarded because genomic sequence could not be extracted:    0

The significant difference confused me. Appreciate for your help.

RRBS • 2.9k views
ADD COMMENT
3
Entering edit mode
9.9 years ago

NEVER USE FASTX TOOLS FOR PAIRED-END FILES!!!

fastq_quality_trimmer is making your paired-end files out of sync, which will make the results complete crap. trim_galore does a good job, there is no reason to use any trimmer after it for RRBS data.

ADD COMMENT
1
Entering edit mode
Hey Devon, How should I do if I want to merge read1 and read2 and then take them as the single end of sequencing data and then do the adaptor trim and alignment? 
ADD REPLY
0
Entering edit mode

Hmm, are these still RRBS reads as in the original post? You could either trim and then merge or merge and then trim, though I have a sneaking suspicion that the former will work slightly better (this will obviously depend on the read length and fragment length).

ADD REPLY

Login before adding your answer.

Traffic: 1320 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6