Question: smallRNA low percentage of mapping, N at the beginning of the reads and kmers
noeD wrote (22 months ago):


I am working with small RNA data. I analyzed the FASTQ file with FastQC and saw that it contained the Illumina small RNA 3' adapter; in fact, my sequence length distribution was centered on 51. I therefore used cutadapt to remove that adapter, and my sequence length distribution changed:

After that, I aligned my reads against the reference genome (hg38) with bowtie, using default parameters, to see how it performed. I obtained a very low percentage of mapped reads (0.30%).

I checked my FASTQ file again with FastQC and saw that there were several kmers at the end of the reads. Is that normal?

I have uploaded all the images from FastQC at this link: Are there other adapters that I should trim? At the beginning of the reads I saw that in some cases there were Ns; should I trim them?

Here is an extract of my FASTQ file:

@HISEQ2500:231:C9L77ACXX:1:2316:21153:100286 1:N:0:NTAGCT
@HISEQ2500:231:C9L77ACXX:1:2316:21183:100346 1:N:0:NTAGCT
@HISEQ2500:231:C9L77ACXX:1:1101:1376:1894 1:N:0:CTAGCT
@HISEQ2500:231:C9L77ACXX:1:1101:1314:1913 1:N:0:CTAGCT

As you can see, in the first read there isn't an N at the beginning of the read, but there is one in the read's index. In the last read exactly the opposite happens: there is an N at the beginning of the read, but not in the index.

How should I fix that issue?

Thank you in advance



Small RNA data analysis requires pre-processing of the data in specific ways (based on the kit used, etc.). You may want to try a dedicated pipeline (e.g. miRquant or miRDeep2) for this purpose.

written 22 months ago by genomax
Brian Bushnell (Walnut Creek, USA) wrote (22 months ago):

You should trim leading/trailing Ns; they never help alignment, and are particularly bad with Bowtie as it allows very few mismatches. You can do that by quality-trimming to a q-score of 2. On the other hand, if the exact starting position of the read is important, just discard the reads containing Ns. As far as adapter contamination goes... if you successfully trimmed using Illumina's Small RNA adapters as a reference, I don't see a point in trying other adapter sequences as well.
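
To make the two options above concrete, here is a minimal Python sketch. It is not taken from any particular tool: the function names `trim_terminal_ns` and `process_read` and the `keep_start` switch are made up for illustration. It also trims on base identity (the letter N) rather than by quality score, so it is a stand-in for the q-score trimming mentioned above, not the same operation.

```python
def trim_terminal_ns(seq, qual):
    """Strip leading and trailing N bases, keeping the quality string in sync."""
    start = len(seq) - len(seq.lstrip("Nn"))   # number of leading Ns
    end = len(seq.rstrip("Nn"))                # index just past the last non-N base
    if start >= end:                           # read was empty or consisted only of Ns
        return "", ""
    return seq[start:end], qual[start:end]


def process_read(seq, qual, keep_start=False):
    """Return the (seq, qual) pair to keep, or None if the read should be dropped.

    keep_start=False: trim terminal Ns (internal Ns are left alone).
    keep_start=True:  the exact start position matters, so any read
                      containing an N is discarded instead of trimmed.
    """
    if keep_start:
        return None if "N" in seq.upper() else (seq, qual)
    seq, qual = trim_terminal_ns(seq, qual)
    return (seq, qual) if seq else None
```

For example, `process_read("NNACGTN", "!!IIII!")` keeps the four middle bases, while the same read with `keep_start=True` is dropped. In practice an established trimmer is the safer choice; this only shows the logic.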

You can also remove reads with Ns in the barcodes, or barcodes that do not exactly match the expected barcodes, during the demultiplexing process. I recommend this if you are multiplexing, to prevent crosstalk between libraries. It's also possible to remove them after the fact - BBDuk has the flags "barcodefilter" and "barcodes" for that purpose. If crosstalk is not a problem for the experiment, there's no reason to remove them.
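
If you just want to see what exact-match barcode filtering amounts to, a rough Python sketch follows. It assumes headers shaped like the ones in the question, where the index is the last colon-separated field after the space. The helper names `barcode_of` and `keep_read` are hypothetical, and the expected barcode set has to come from your own sample sheet. This is not how BBDuk implements its filter.

```python
def barcode_of(header):
    """Return the index sequence from a header like
    '@HISEQ2500:231:C9L77ACXX:1:2316:21153:100286 1:N:0:NTAGCT',
    or None if the header does not have that shape."""
    _, _, comment = header.strip().partition(" ")
    fields = comment.split(":")        # read : filtered : control : index
    return fields[3] if len(fields) >= 4 else None


def keep_read(header, expected_barcodes):
    """Keep a read only if its index exactly matches an expected barcode.

    Barcodes containing N never match, so they are dropped as well.
    """
    return barcode_of(header) in expected_barcodes
```

With `expected_barcodes = {"CTAGCT"}` (an assumption based on the extract in the question), a read tagged `NTAGCT` would be dropped and one tagged `CTAGCT` kept.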

As for the low alignment rate, it's hard to say what might cause that (could be that the library is mostly not human, for example). I'd suggest trying other aligners (bowtie2, bwa-aln, BBMap) to see if they improve things, and you might try BLASTing some of the longest unaligned reads to nt/RefSeq to see what they hit, though that's much more useful with longer reads.
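
For picking out the longest unaligned reads to BLAST, something like the following Python sketch would do. It assumes the unaligned reads are already in a plain four-line-per-record FASTQ file. The function names and the default of 10 reads are arbitrary choices for illustration.

```python
import heapq


def read_fastq(lines):
    """Yield (name, sequence) pairs from an iterable of FASTQ lines.

    Assumes exactly four lines per record; an incomplete final record
    is silently ignored.
    """
    it = (line.rstrip("\n") for line in lines)
    for header, seq, _plus, _qual in zip(it, it, it, it):
        if not header.startswith("@"):
            continue  # skip anything that does not look like a record header
        yield header[1:].split()[0], seq


def longest_reads_as_fasta(lines, n=10):
    """Return the n longest reads as FASTA text, longest first."""
    top = heapq.nlargest(n, read_fastq(lines), key=lambda rec: len(rec[1]))
    return "".join(f">{name}\n{seq}\n" for name, seq in top)
```

Usage would be along the lines of `print(longest_reads_as_fasta(open("unaligned.fastq"), n=10))`, where `unaligned.fastq` is a placeholder file name; the resulting FASTA can be pasted into a BLAST search.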
