Question

Small RNA seq dataset with NNN addition


8.9 years ago

gamacarvalho.m • 0

Hi, I just got an Illumina small RNA-seq dataset to analyze from a collaborator; the adapters had already been removed by the sequencing service provider. FastQC gave a really strange report, with very low quality scores and a high percentage of Ns, and I noticed that almost all the reads were exactly 35 nt long. Manual inspection of the files showed that many reads consisted either of 35 Ns or of a leading N, a ~20 nt miRNA sequence, and then Ns filling the read out to 35 nt. It seems that the adapter-trimming script that was used simply replaced the adapters with Ns. Has anyone seen this before? And does anyone have a script for dealing with it that they would be willing to share, just to speed things up a bit? Thanks!

RNA-Seq

updated 8.9 years ago by GenoMax 152k • written 8.9 years ago by gamacarvalho.m • 0


N + 20 something nt miR sequence

Do you know it's a miR sequence? Have you tried filtering out the all-N reads, trimming the Ns from the remaining reads, and then mapping them? If you did, what does the expression profile look like?
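A minimal awk sketch of the filter-and-trim step described above, for anyone who wants a script rather than a dedicated trimmer. It assumes uncompressed FASTQ with exactly four lines per record, strips runs of N from both ends of each read (trimming the quality string in step), and drops reads that end up shorter than a minimum length, so all-N reads disappear. The function name `trim_flanking_n` and the 18 nt default cutoff are illustrative choices, not something from this thread.

```shell
#!/usr/bin/env bash
# Sketch: trim leading/trailing runs of N from each read, trimming the
# quality string at the same positions, then drop reads shorter than
# min_len (all-N reads become length 0 and are dropped whenever min_len >= 1).
# Assumes uncompressed 4-line FASTQ records (no wrapped sequence lines).
# Usage (hypothetical helper): trim_flanking_n input.fastq [min_len] > trimmed.fastq
trim_flanking_n() {
  local in_fastq="$1"
  local min_len="${2:-18}"  # assumed cutoff for small RNA reads; adjust as needed
  awk -v min_len="$min_len" '
    NR % 4 == 1 { hdr = $0 }          # header line
    NR % 4 == 2 { seq = $0 }          # sequence line
    NR % 4 == 0 {                     # quality line closes the record
      qual = $0
      first = 1
      last = length(seq)
      while (first <= last && substr(seq, first, 1) ~ /[Nn]/) first++
      while (last >= first && substr(seq, last, 1) ~ /[Nn]/) last--
      len = last - first + 1
      if (len >= min_len) {
        print hdr
        print substr(seq, first, len)
        print "+"
        print substr(qual, first, len)
      }
    }
  ' "$in_fastq"
}
```

Internal Ns are left in place; only the flanking padding is removed, which matches the N + miRNA + N-filler pattern described in the question.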

What percent of reads are all Ns?
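One quick way to answer that, sketched under the assumption of an uncompressed 4-line FASTQ: count reads whose sequence is entirely N separately from reads that merely contain some N (such as the N-padded miRNA reads). The function name `n_read_summary` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: tally all-N reads versus reads that only contain some N,
# looking at the sequence line (line 2) of each 4-line FASTQ record.
# Usage (hypothetical helper): n_read_summary input.fastq
n_read_summary() {
  awk '
    NR % 4 == 2 {
      total++
      if ($0 ~ /^[Nn]+$/) all_n++          # every base is N
      else if ($0 ~ /[Nn]/) some_n++       # at least one N, e.g. N-padded reads
    }
    END {
      if (total == 0) { print "no reads found"; exit }
      printf "all-N: %d, partly N: %d, total: %d (%.1f%% all-N)\n",
             all_n, some_n, total, 100 * all_n / total
    }
  ' "$1"
}
```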

Sounds like a 50 bp Illumina library with the last 15 nucleotides trimmed off. It also sounds like the sequencing itself wasn't great: was the RNA quality checked properly? What were the RIN scores?

8.9 years ago by ashjay ▴ 40


Hi, thanks for your reply. I just BLASTed a couple of sequences and they matched miRNAs with 100% identity. We are in the process of trimming and filtering the reads, but I was wondering whether anyone else had come across something like this. You are right, this is 50 bp sequencing, but the initial report I got said the read lengths were between 18 and 50 nt, whereas what I see is the 35 nt profile I mentioned plus a small percentage of reads around 50 nt. Around 30% of the reads are all Ns, which I think is compatible with merged adapters (adapter dimers). The sample is from an Ago2 pull-down, so you can't really control the RNA quality that way.
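To see the length profile directly (the 35 nt spike, the small group of ~50 nt reads, and, after N trimming, the miRNA-sized inserts), a read-length histogram is enough. This is a minimal sketch assuming uncompressed 4-line FASTQ; `read_length_histogram` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: print "length<TAB>count" for every read length present,
# sorted by length, from an uncompressed 4-line FASTQ file.
# Usage (hypothetical helper): read_length_histogram input.fastq
read_length_histogram() {
  awk 'NR % 4 == 2 { count[length($0)]++ }
       END { for (len in count) print len "\t" count[len] }' "$1" | sort -n
}
```

Running it once on the raw file and once on the N-trimmed file makes it easy to compare against the 18 to 50 nt range given in the provider's report.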

8.9 years ago by gamacarvalho.m • 0


You can use cutadapt to trim the Ns and low-quality bases, and also to remove reads that are all Ns.

To remove flanking N bases:

cutadapt --trim-n -o output.fastq input.fastq

To discard reads with more than COUNT Ns:

cutadapt --max-n COUNT -o output.fastq input.fastq
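The two options can also be combined in a single pass. A minimal sketch, assuming cutadapt is installed: cutadapt applies read modifications such as `--trim-n` before its filters, so flanking Ns are removed first, reads that still contain any N are then discarded by `--max-n 0`, and a minimum-length filter (`-m`) removes reads that are now empty or too short, which includes the all-N reads. The 18 nt cutoff, the function name and the file names are assumptions for illustration, not values from this thread.

```shell
#!/usr/bin/env bash
# Sketch: one cutadapt pass that trims flanking Ns, drops reads that still
# contain Ns, and drops reads shorter than 18 nt (an assumed cutoff).
# Usage (hypothetical helper): clean_with_cutadapt input.fastq output.fastq
clean_with_cutadapt() {
  # --trim-n  : remove runs of N at both ends of each read
  # --max-n 0 : discard reads with more than 0 Ns left after trimming
  # -m 18     : discard reads shorter than 18 nt (all-N reads end up empty)
  cutadapt --trim-n --max-n 0 -m 18 -o "$2" "$1"
}
```

A fractional value such as `--max-n 0.1` is interpreted by cutadapt as a fraction of the read length, which is more lenient if a few internal Ns are acceptable.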

8.9 years ago by ashjay ▴ 40

Answer 1 · 2016-08-18

8.9 years ago

GenoMax 152k

MiSeq Reporter masks adapter sequences with Ns if it was used for downstream processing of the data. See this thread for more information.

You can use bbduk.sh from BBMap to trim those Ns with the literal=NNNN option, or you could do it by read length.

8.9 years ago by GenoMax 152k


Thanks! That was really helpful! I wasn't aware this was MiSeq data, and I had never worked with it before.

8.9 years ago by gamacarvalho.m • 0