Question: bowtie (1) 0% alignment on paired-end RNAseq data
gulamaltab0 wrote:

Hi

I am trying to align paired-end miRNA-seq data using bowtie. I have tried a number of times with different options, but it seems I am doing something wrong: I am getting a 0.3-1% alignment rate. With bowtie2 I got a 96% alignment rate.

I used the default options first:

bowtie  ~/work/BWA/refrat.fa -p 2 -1 19-8883-R1.fastq -2 19-8883-R2.fastq -S 19-8883.sam 2> 19.log

Also I have used these options: bowtie -q -n 0 -e 80 -l 18 -a -m 5 --best --strata

Also I have used these options: bowtie -q -v 1 -a -m 5 --best --strata

But I kept getting similar results, as follows:

# reads processed: 32291169
# reads with at least one reported alignment: 79980 (0.25%)
# reads that failed to align: 32211189 (99.75%)
Reported 172518 paired-end alignments to 1 output stream(s)
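As a quick sanity check, the percentage in a log like this can be recomputed from the raw counts. A minimal sketch, assuming the log has the two "# reads ..." lines shown above (the function name `bowtie_aligned_pct` is made up for illustration):

```shell
# Print the aligned fraction (in percent) from a bowtie 1 summary log on stdin.
bowtie_aligned_pct() {
  awk -F': ' '
    /^# reads processed/ { total = $2 + 0 }
    /^# reads with at least one reported alignment/ { aligned = $2 + 0 }
    END { if (total > 0) printf "%.2f\n", 100 * aligned / total }
  '
}
```

Usage would be something like `bowtie_aligned_pct < 19.log`, which makes it easy to compare many samples at once.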

Any suggestions? Thanks

mirna rna-seq bowtie alignment • 201 views
ADD COMMENTlink modified 9 days ago by ATpoint15k • written 9 days ago by gulamaltab0

What if you try to align R1 or R2 independently? There might be an issue with insert sizes.
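A rough sketch of what aligning one mate on its own could look like. The wrapper name `align_mate` and the output and log file names are hypothetical; the argument order (index first) simply copies the command in the question, and as far as I know newer bowtie releases also accept the index through -x. In paired-end mode bowtie 1 restricts the allowed insert size with -I/--minins and -X/--maxins, which single-end alignment sidesteps.

```shell
# Align one mate file as single-end reads with bowtie 1.
# Arguments: <index basename> <reads.fastq> <out.sam> <log file>
# The index-first argument order copies the command in the question.
align_mate() {
  bowtie -p 2 -S "$1" "$2" "$3" 2> "$4"
}
```

For example: `align_mate ~/work/BWA/refrat.fa 19-8883-R1.fastq 19-8883-R1.sam 19-R1.log`, and the same again for R2. If each mate aligns well on its own, the paired-end constraints are the likely culprit.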

ADD REPLYlink written 8 days ago by igor7.6k

I will try that. If I do, how do I combine the files in the end? Is that possible?

ADD REPLYlink written 8 days ago by gulamaltab0
ATpoint15k wrote:

My guess is that your raw data are not properly (or not at all) trimmed for adapter sequences. In miRNA-seq, your targets are typically small, mostly < 30 bp. Your read length is probably 2x50 bp or longer, so you will pick up adapter content. Bowtie2, unlike Bowtie, supports local alignment (via --local; its default mode is end-to-end), which means it can soft-clip non-matching (= adapter) content while still aligning the local part of the read that matches the reference. With Bowtie the read will probably go unaligned due to the many mismatches. Please run fastqc and post the adapter content part (How to add images to a Biostars post), or if you did trimming, show the command line.

Also, just to be sure: your index is in a folder called BWA. I hope this is a coincidence and you are not trying to use a bwa index with bowtie?

ADD COMMENTlink modified 9 days ago • written 9 days ago by ATpoint15k

Hi ATpoint, I have checked FastQC on all my samples and the adapters were trimmed. I have uploaded a picture for one of the samples. Yes, I built the index using bowtie-build; BWA is just the folder name in the example above.

[Image: FastQC plot for one sample]

ADD REPLYlink modified 8 days ago by ATpoint15k • written 8 days ago by gulamaltab0

Something does not sound right about this.

  1. Do you know why 150 bp PE sequencing was chosen if you are only looking at miRNA/small RNA libraries?
  2. Have you checked which kit was used to create these libraries? Many miRNA kits come with specific instructions that give the adapter sequence to look for (FastQC will not know about this) and explain how the data need to be processed to remove these extraneous sequences before alignment.

"I have tried bowtie2 which gave me 96% alignment rate."

Against the same genome? Have you looked in the alignment file to see if a large part of the read(s) is getting soft-clipped? I have a feeling that would be the case.

Since bowtie v.1 can't do gapped or local (soft-clipped) alignments, it has to align the entire read, and each read likely carries extraneous sequence, so you are getting that poor alignment percentage. Proper trimming as suggested by @ATpoint should help alleviate this.
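One way to check for soft-clipping is to add up the S operations in the CIGAR column (column 6) of the bowtie2 SAM output. A minimal sketch, assuming uncompressed SAM on stdin; the function name `softclip_summary` is made up. Note that bowtie2 only soft-clips when run with --local, so an end-to-end run will show zero here.

```shell
# Summarise soft-clipping in a SAM stream read from stdin.
# Prints: aligned records, records with soft-clipping, soft-clipped bases.
softclip_summary() {
  awk -F'\t' '
    /^@/ { next }                       # skip header lines
    $6 == "*" { next }                  # skip unaligned records (no CIGAR)
    {
      aligned++
      cigar = $6
      clipped = 0
      # Walk the CIGAR string and add up the lengths of S operations.
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        len = substr(cigar, 1, RLENGTH - 1) + 0
        op = substr(cigar, RLENGTH, 1)
        if (op == "S") clipped += len
        cigar = substr(cigar, RLENGTH + 1)
      }
      if (clipped > 0) with_clip++
      bases += clipped
    }
    END { printf "%d\t%d\t%d\n", aligned, with_clip, bases }
  '
}
```

For a BAM file, something like `samtools view aln.bam | softclip_summary` should work (the file name is a placeholder). If most aligned records are soft-clipped by a large number of bases, the reads still contain non-genomic sequence.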

ADD REPLYlink modified 8 days ago • written 8 days ago by genomax65k

Hi, thanks for the reply. I am not sure why 150 bp PE was chosen, Illumina only gave me this option. As for the library, the NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina was used.

Yes, a 96-97% alignment rate against the same genome. I am new to RNA-seq and not sure how to check for soft-clipping; I will have a look at it.

The raw FASTQ files were trimmed for the presence of Illumina adapter sequences using Cutadapt version 1.2.1. The option -O 3 was used, so the 3' end of any read that matches the adapter sequence over 3 bp or more is trimmed.

The reads were further trimmed using Sickle version 1.200 with a minimum window quality score of 20. Reads shorter than 20 bp after trimming were removed.

Thanks for your input.

ADD REPLYlink written 8 days ago by gulamaltab0

What does the size distribution plot look like if you run FastQC on the trimmed FastQ files?
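If FastQC is not at hand, the same information can be pulled straight from the FASTQ file. A minimal sketch, assuming standard four-line FASTQ records (no wrapped sequence lines); the function name `fastq_length_hist` is made up:

```shell
# Print a read-length histogram (length<TAB>count) for a FASTQ stream on stdin.
# Assumes four-line FASTQ records (no wrapped sequence lines).
fastq_length_hist() {
  awk 'NR % 4 == 2 { n[length($0)]++ }
       END { for (l in n) printf "%d\t%d\n", l, n[l] }' | sort -n
}
```

Run it as `fastq_length_hist < 19-8883-R1.fastq`, or pipe in `gzip -dc` output for compressed files. For a small RNA library, you would expect a peak at roughly 20-25 nt after trimming.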

ADD REPLYlink written 8 days ago by Friederike3.8k

This is the graph of the size distribution:

[Image: read length distribution plot]

ADD REPLYlink written 8 days ago by gulamaltab0

That is interesting.

What is R0/R1/R2? Is R0 the merged representation of R1/R2 reads?

So most of your reads (after trimming?) are around 20-30 bp. Can you clarify each step you have done (starting with raw data) to get this plot? Include command lines for all programs you used in the process (use dummy paths/file names, if you want to).

ADD REPLYlink modified 8 days ago • written 8 days ago by genomax65k

"NEBNext® Ultra™ Directional RNA Library Prep Kit"

Are you sure this kit is the right choice for small RNA libraries? I don't see any mention of small RNAs after a cursory look at the web link.

You may have received normal RNAseq libraries.

"Reads shorter than 20 bp. after trimming were removed."

Yow that may be some of the smallRNA data you want :-)

ADD REPLYlink modified 8 days ago • written 8 days ago by genomax65k

Hmm, I have used the same data, just the R1 mate, in the mirdeep2 package, which uses bowtie 1 to align the reads. That gave an overall alignment rate of 82%. Could it be something to do with paired-end alignment?

ADD REPLYlink written 8 days ago by gulamaltab0

I don't know whether mirdeep2 processes the reads in some way, since bowtie v.1 is unable to align them on its own.

ADD REPLYlink written 8 days ago by genomax65k

Hmm, that makes little "biological" sense. Are you sure this is a smallRNA-seq dataset? Is this your data or published (from GEO or so)? With smallRNA you must (should) pick up adapter content at this read length. Maybe the adapter sequence is not known to fastqc but this implies non-standard library prep. How has the library been made?

ADD REPLYlink modified 8 days ago • written 8 days ago by ATpoint15k

Hi ATpoint. Yes, it is a smallRNA-seq data set, and yes, it is my own data. Please have a look at my reply above to @genomax regarding the library.

ADD REPLYlink written 8 days ago by gulamaltab0