Question: Short sequence alignment with low alignment rate
jesselee516 (United States) wrote, 21 months ago:

Hi all. I am facing an alignment problem and have no idea how to proceed. I am trying to align rice DNase-seq reads to the IRGSP build 4 genome. I am using the SRR094111.sra data, which I converted to FASTQ with default parameters. I used bowtie2 for the alignment (bowtie2 -p 8 --local -x /all_format -U SRR094111.fastq -S SRR094111.sam), but got the very low alignment rate shown below, and I am not sure what is going on. I have tried trimming the last several bases, but I don't know how many base pairs I should trim. Can anyone show me a detailed pipeline for aligning this data correctly? Thanks a lot.

20896348 reads; of these:
  20896348 (100.00%) were unpaired; of these:
    20676176 (98.95%) aligned 0 times
    152269 (0.73%) aligned exactly 1 time
    67903 (0.32%) aligned >1 times
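The posted summary does not include the final "overall alignment rate" line that bowtie2 prints. As a minimal sketch (plain Python, standard library only), the rate can be recomputed from the posted counts as uniquely aligned plus multi-aligned reads over total reads:

```python
import re

# The bowtie2 summary from the question (bowtie2 itself prints it to stderr).
SUMMARY = """\
20896348 reads; of these:
  20896348 (100.00%) were unpaired; of these:
    20676176 (98.95%) aligned 0 times
    152269 (0.73%) aligned exactly 1 time
    67903 (0.32%) aligned >1 times
"""


def count(pattern: str, text: str) -> int:
    """Return the integer captured by the first group of `pattern` in `text`."""
    match = re.search(pattern, text, re.MULTILINE)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return int(match.group(1))


def overall_alignment_rate(summary: str) -> float:
    """Percentage of unpaired reads that aligned at least once."""
    total = count(r"^(\d+) reads; of these:", summary)
    unique = count(r"(\d+) \([\d.]+%\) aligned exactly 1 time", summary)
    multi = count(r"(\d+) \([\d.]+%\) aligned >1 times", summary)
    return 100.0 * (unique + multi) / total


print(f"{overall_alignment_rate(SUMMARY):.2f}% overall alignment rate")
```

This gives roughly 1.05%, far below the ~70% uniquely mapped reads reported in the paper quoted further down the thread.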

The first few reads of the FASTQ file (may be helpful):

@SRR094111.1 HWUSI-EAS465_0004:3:1:1043:13479 length=36
GGTAGTAATTGACAAAAGNTCTCGTATGCCGTCTTC
+SRR094111.1 HWUSI-EAS465_0004:3:1:1043:13479 length=36
??6;<@CCCCCC@?@<79!:59897C@CCBCBBCC#

@SRR094111.2 HWUSI-EAS465_0004:3:1:1043:15713 length=36
GAATGCCTGATTGCCTGTAGGTCGTATGCCGTCTTC
+SRR094111.2 HWUSI-EAS465_0004:3:1:1043:15713 length=36
CCCCCBCCCCCCCCCCCCCCCCCC?CCCCCCCCCCC

@SRR094111.3 HWUSI-EAS465_0004:3:1:1043:15796 length=36
ATGGACCATCATCAGCCATCTTCGTATGCCGTCTTC
+SRR094111.3 HWUSI-EAS465_0004:3:1:1043:15796 length=36
CCCCC;CC;CCCBBBCBACCCCCCAA=CCCCBCACC

@SRR094111.4 HWUSI-EAS465_0004:3:1:1043:14078 length=36
TGTTACTTGACGCACAATAATTCGTATGCCGTCTTC
+SRR094111.4 HWUSI-EAS465_0004:3:1:1043:14078 length=36
BAB@:BBA?BBB@B:AAA:<B?BB;B<ABBBBBBB?
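Every posted read is 36 bp long, and the last 15 bases of all four are identical (TCGTATGCCGTCTTC), so at most about 21 bases per read can come from the genome. One possible contributor to the low rate, assuming bowtie2's documented --local defaults (match bonus --ma 2, minimum score --score-min G,20,8, i.e. 20 + 8·ln(read length)), is that a 21-base match cannot reach the minimum score for a 36-bp read. A quick check of that arithmetic:

```python
import math

# Assumed bowtie2 --local defaults (check the manual for your version):
# match bonus --ma 2 and --score-min G,20,8, i.e. 20 + 8 * ln(read length).
MATCH_BONUS = 2
SCORE_MIN_CONST, SCORE_MIN_COEF = 20, 8


def local_min_score(read_length: int) -> float:
    """Minimum score bowtie2 --local accepts for a read of this length."""
    return SCORE_MIN_CONST + SCORE_MIN_COEF * math.log(read_length)


def min_perfect_matches(read_length: int) -> int:
    """Fewest mismatch-free matched bases whose bonus reaches the minimum score."""
    return math.ceil(local_min_score(read_length) / MATCH_BONUS)


read_length = 36         # every posted read is 36 bp
genomic_part = 36 - 15   # bases left once the shared 15-bp tail is excluded
print(f"minimum score for {read_length} bp: {local_min_score(read_length):.2f}")
print(f"best score from {genomic_part} matching bases: {MATCH_BONUS * genomic_part}")
print(f"matched bases needed: {min_perfect_matches(read_length)}")
```

If these defaults apply, a perfect 21-base match scores 42 against a threshold of about 48.67, so such reads only align when part of the adapter-like tail happens to match the genome as well. Trimming alone would not fix this in --local mode either (20 + 8·ln 21 ≈ 44.36 > 42), whereas in end-to-end mode (default --score-min L,-0.6,-0.6) a perfect match gets the maximum score of 0.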

The GEO description of the data:

The degree of DNase I digestion was assessed by pulsed-field gel electrophoresis (PFGE: 20–60 switch time, 18 h, 6 V/cm; Bio-Rad). High molecular weight (HMW) DNA after DNase I digestion was isolated and blunt-ended with T4 DNA polymerase. Biotinylated adaptor I (5’ Bio ACAGGTTCAGAGTTCTACAGTCCGAC and 5’ P-GTCGGACTGTAGAACTCTGAAC) was ligated to the DNA molecules. Dynal M-280 beads (Invitrogen) were used to enrich DNase I-digested DNA ends after MmeI digestion. Adaptor II (5’ P-TCGTATGCCGTCTTCTGCTTG and 5’ CAAGCAGAAGACGGCATACGANN) was then ligated to the MmeI-treated ends. The DNA sample was amplified by PCR using linker-specific primers (5’ CAAGCAGAAGACGGCATACGA and 5’ AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA) and purified by PAGE to isolate DNA fragments of about 90 bp. The final Illumina sequencing was performed using a primer specific to linker I (5’ CCACCGACAGGTTCAGAGTTCTACAGTCCGAC).
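All four posted reads end with TCGTATGCCGTCTTC, which is the start of Adaptor II above (TCGTATGCCGTCTTCTGCTTG), so each read looks like a short MmeI tag followed by adapter. A minimal sketch of exact-match 3' adapter trimming in plain Python, applied to those reads; MIN_OVERLAP is a hypothetical cut-off, and a dedicated trimmer such as cutadapt also tolerates sequencing errors inside the adapter, which this sketch does not:

```python
# Adaptor II (ligated to the MmeI-cut ends) as given in the GEO description.
ADAPTOR_II = "TCGTATGCCGTCTTCTGCTTG"
# Hypothetical cut-off: shortest adapter prefix trusted at the read's 3' end.
MIN_OVERLAP = 8

# The four reads posted in the question as (sequence, quality) pairs.
READS = [
    ("GGTAGTAATTGACAAAAGNTCTCGTATGCCGTCTTC", "??6;<@CCCCCC@?@<79!:59897C@CCBCBBCC#"),
    ("GAATGCCTGATTGCCTGTAGGTCGTATGCCGTCTTC", "CCCCCBCCCCCCCCCCCCCCCCCC?CCCCCCCCCCC"),
    ("ATGGACCATCATCAGCCATCTTCGTATGCCGTCTTC", "CCCCC;CC;CCCBBBCBACCCCCCAA=CCCCBCACC"),
    ("TGTTACTTGACGCACAATAATTCGTATGCCGTCTTC", "BAB@:BBA?BBB@B:AAA:<B?BB;B<ABBBBBBB?"),
]


def trim_adaptor(seq, qual, adaptor=ADAPTOR_II, min_overlap=MIN_OVERLAP):
    """Cut an exact adapter match (full, or a prefix running off the 3' end) from seq and qual."""
    full = seq.find(adaptor)
    if full != -1:
        return seq[:full], qual[:full]
    # Try the longest adapter prefix first so the whole adapter remnant is removed.
    for k in range(min(len(adaptor), len(seq)), min_overlap - 1, -1):
        if seq.endswith(adaptor[:k]):
            return seq[:-k], qual[:-k]
    return seq, qual  # no adapter found: leave the read untouched


for seq, qual in READS:
    tag, tag_qual = trim_adaptor(seq, qual)
    print(len(tag), tag)  # each posted read leaves a 21-bp tag
```

Quality strings are cut in step with the sequence so the trimmed records stay valid FASTQ.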

The links below show the FastQC modules that failed in my report; all other modules passed.

https://drive.google.com/file/d/0B-nCMrsqGWH3RmdQOXl3T1dkblU/view?usp=sharing

https://drive.google.com/file/d/0B-nCMrsqGWH3VEVNX09SOVJRUWM/view?usp=sharing

https://drive.google.com/file/d/0B-nCMrsqGWH3bTBQa29ORUVNOE0/view?usp=sharing

Tags: bowtie, next-gen, alignment
(modified 20 months ago by Biostar; written 21 months ago by jesselee516)

While the Q-scores are not great, they are not atrocious either. Since this is old GAII data, I would suggest that you take the quality encoding into account (it is likely in Illumina format, phred+64). Try bowtie (instead of bowtie2) to see if ungapped alignments improve things. Replicating the analysis from whatever paper this came from as closely as possible should come first, before you veer off in other directions.

(written 21 months ago by genomax)
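The encoding guess above can be checked directly on the quality lines posted in the question. A minimal sketch, assuming the usual ranges: Phred+64 (Illumina 1.3+) never goes below '@' (ASCII 64) and Solexa+64 never below ';' (ASCII 59), so any lower character implies Phred+33. The posted lines contain '!' (ASCII 33), which points to Phred+33; that is also what bowtie2 assumes unless --phred64 is given, and, as far as I know, what SRA's fastq-dump writes by default.

```python
# Quality lines of the four reads posted in the question.
QUALITY_LINES = [
    "??6;<@CCCCCC@?@<79!:59897C@CCBCBBCC#",
    "CCCCCBCCCCCCCCCCCCCCCCCC?CCCCCCCCCCC",
    "CCCCC;CC;CCCBBBCBACCCCCCAA=CCCCBCACC",
    "BAB@:BBA?BBB@B:AAA:<B?BB;B<ABBBBBBB?",
]


def guess_phred_offset(quality_lines):
    """Guess the ASCII offset of FASTQ quality strings from the lowest character seen.

    Returns 33 when a character below ';' appears (impossible in Phred+64 or
    Solexa+64), 64 when nothing below '@' appears (consistent with Phred+64,
    though high-quality Phred+33 data can look the same), otherwise None.
    """
    lowest = min(ord(ch) for line in quality_lines for ch in line)
    if lowest < ord(";"):
        return 33
    if lowest >= ord("@"):
        return 64  # consistent with Phred+64, but not proof
    return None  # lowest character in ';'..'?' is ambiguous


offset = guess_phred_offset(QUALITY_LINES)
print("offset:", offset)
if offset is not None:
    # Per-base Phred scores of read 1; the 'N' base carries '!' (Q0).
    print("read 1 qualities:", [ord(ch) - offset for ch in QUALITY_LINES[0]])
```

On a real file, checking the first few thousand quality lines rather than four gives a more reliable answer.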

The results from (what I think is) the original paper are remarkably different:

We obtained a total of 43 million sequence reads from the seedling libraries and 57 million reads from the callus libraries (Supplemental Table S1). Approximately 70% of the reads were mapped to unique positions in the rice genome.

MAQ, allowing a 1 bp mismatch, was used to align the DNase-seq reads.

(written 21 months ago by h.mon)
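To make the MAQ setting above concrete: an ungapped alignment allowing at most 1 mismatch slides the read (and its reverse complement) along the reference and counts substitutions, with no length-dependent minimum score. The sketch below is a brute-force illustration of that idea on a hypothetical toy reference built from read 2's 21-bp tag; it is not how MAQ works internally (real aligners use an index rather than scanning the genome).

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches_within(read: str, window: str, limit: int) -> int:
    """Count mismatches between equal-length strings, stopping once `limit` is exceeded.

    An N in the read counts as a mismatch against A, C, G or T.
    """
    count = 0
    for a, b in zip(read, window):
        if a != b:
            count += 1
            if count > limit:
                break
    return count


def ungapped_hits(read: str, reference: str, max_mismatches: int = 1):
    """All (position, strand, mismatches) placements with no gaps and <= max_mismatches."""
    hits = []
    n = len(read)
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        for start in range(len(reference) - n + 1):
            mm = mismatches_within(query, reference[start:start + n], max_mismatches)
            if mm <= max_mismatches:
                hits.append((start, strand, mm))
    return hits


# Hypothetical toy reference: read 2's 21-bp tag with one substituted base, plus filler.
TAG = "GAATGCCTGATTGCCTGTAGG"
TOY_REFERENCE = "TTTT" + TAG[:12] + "C" + TAG[13:] + "AAAA"
print(ungapped_hits(TAG, TOY_REFERENCE))
```

Under this kind of scoring a 21-bp tag only has to match at 20 of its 21 positions, which fits short MmeI tags better than a length-dependent local-alignment threshold.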

Are you sure that you aligned to the correct reference? Did you download and index the genome yourself, or was it provided by someone else? Have you successfully aligned other data to that reference before? That would be the most obvious explanation to rule out before you start chasing ghosts.

(written 20 months ago by ATpoint)