Question: Removing adaptors from smallRNA seq data
5 months ago by
juan.crescente20 wrote:

Hello!

I know this has been asked in many ways before, but I've been struggling for a while now, so it's time to ask.

I'm trying to use small RNA seq data from: https://bmcplantbiol.biomedcentral.com/articles/10.1186/1471-2229-14-142

These sequences are ~34 nt long, so they no doubt still contain some kind of adaptor.

The authors used 'vectorstrip' from the EMBOSS package, but I cannot find a suitable vector file.

I've tried Trimmomatic, but I still get the same read length:

java -jar trimmomatic-0.38.jar SE -phred33 /home/juan/Desktop/juan/bio/mrcv/data/sun/SRR1195024.fastq.gz /home/juan/Desktop/juan/bio/mrcv/data/sun/SRR1195024.trimmed.fastq.gz ILLUMINACLIP:adapters/TruSeq-Small-RNA.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:18

I've tried cutadapt, but I still get the same read length:

cutadapt -a TGGAATTCTCGGGTGCCAAGG -o SRR1195024.trimmed.fastq.gz SRR1195024.fastq.gz

I've tried Trim Galore, but I still get the same read length:

trim_galore --small_rna SRR1195025.fastq.gz .fastq.gz -o SRR1195025.trimm_gal.fastq.gz

Total reads processed: 14,011,412
Reads with adapters: 8,639,554 (61.7%)
Reads written (passing filters): 14,011,412 (100.0%)

Trim Galore seems to be doing its job (61% of sequences contain the adapter), but when I open FastQC I see that the sequences are not the expected length: they are all 34 nt.


I expect to see peaks at 21 and 24 nt, but the distribution is completely flat. Any ideas what I am doing wrong?
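As an independent check on the FastQC plot, the read lengths can also be tabulated straight from the FASTQ file. This is only a minimal sketch: it assumes standard 4-line FASTQ records, and the function name `read_length_hist` is made up for illustration.

```shell
# Print "length<TAB>count" for every read length found in a FASTQ stream on stdin.
# Assumption: plain 4-line FASTQ records, so the sequence is line 2 of each record.
read_length_hist() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) printf "%d\t%d\n", len, n[len] }' | sort -n
}

# Example call (file name taken from the commands above):
#   gzip -dc SRR1195024.trimmed.fastq.gz | read_length_hist
```

If trimming worked on a small RNA library, this table would be expected to show counts piling up around 21 and 24 nt rather than at a single length.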


Convert a subset of the data to FASTA and see whether you can align the reads at the 3'-end to identify an adapter sequence. Did you check the methods section to see whether they describe the kit or method used?

$ reformat.sh in=SRR1195024.fastq.gz out=stdout.fa | grep -v "^>" | head -200
ADD REPLYlink written 5 months ago by genomax74k

Yes, I checked. I see nothing obvious in the reformat.sh output, and they do not specify adapters.

ADD REPLYlink written 5 months ago by juan.crescente20

The adapter would likely not be in the same exact location (if it is indeed on 3'-end) so you may or may not see it right away, without actually trying to align the sequences.

I will leave this for you to consider:

cap

There are two papers linked which seem to have sequences etc in their supplementary materials. Have you looked at those?

ADD REPLYlink written 5 months ago by genomax74k

Following your feedback, I'm checking this MSA: https://mafft.cbrc.jp/alignment/server/spool/_ho.190611011723805E0SZhm924bXDkjdeAHfqVlsfnormal.html

Which papers are those? Are they in the Electronic supplementary material?

ADD REPLYlink written 5 months ago by juan.crescente20

When running cutadapt, are you confident that you're using the correct adapter sequence? Running these adapter-trimming tools without any cuts happening makes me think that you're using an incorrect sequence. Do they specify the sequence in the manuscript? Does FastQC report an overrepresented sequence?
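One rough way to test a candidate adapter before trimming is to count how many raw reads contain it at all. This is a sketch under assumptions: exact string matching only (reads with sequencing errors in the adapter are missed), standard 4-line FASTQ, and a made-up helper name `count_adapter_reads`.

```shell
# Count the reads in a FASTQ stream (stdin) whose sequence line contains the
# given adapter string anywhere. Exact matching only, no mismatches allowed.
count_adapter_reads() {
    adapter=$1
    awk -v a="$adapter" '
        NR % 4 == 2 { total++; if (index($0, a) > 0) hits++ }
        END { printf "%d of %d reads contain %s\n", hits, total, a }'
}

# Example call on the first 100,000 reads, using the adapter from the
# cutadapt command in the question:
#   gzip -dc SRR1195024.fastq.gz | head -n 400000 | count_adapter_reads TGGAATTCTCGGGTGCCAAGG
```

A count near zero would suggest the wrong adapter sequence; a substantial fraction would suggest the sequence is right and the problem lies elsewhere.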

ADD REPLYlink written 5 months ago by shawn.w.foley1.1k

I see tons of overrepresented sequences, but none of them are flagged as adapter hits.

ADD REPLYlink modified 5 months ago • written 5 months ago by juan.crescente20

Do a multiple sequence alignment of the last ~15 nucleotides of a few hundred reads, and you should be able to identify the sequence of your adapter.
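The read tails for such an alignment could be pulled out roughly like this. It is a sketch, not a tested pipeline: it assumes 4-line FASTQ records, and the function name `tail_fasta` and its defaults (300 reads, 15 bases) are illustrative choices.

```shell
# Write the last `tail_len` bases of the first `n_reads` reads as FASTA,
# ready to paste into a multiple sequence alignment tool.
# Reads shorter than `tail_len` are written out whole.
tail_fasta() {
    n_reads=${1:-300}
    tail_len=${2:-15}
    awk -v n="$n_reads" -v k="$tail_len" '
        NR % 4 == 2 {
            count++
            start = length($0) - k + 1
            if (start < 1) start = 1
            printf ">read%d\n%s\n", count, substr($0, start)
            if (count >= n) exit
        }'
}

# Example call (the output file name is hypothetical):
#   gzip -dc SRR1195024.fastq.gz | tail_fasta 300 15 > tails.fa
```

Because the adapter starts at a different position in each read depending on the insert length, the tails will only line up after alignment, not by eye.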

ADD REPLYlink written 5 months ago by Martombo2.6k

Done. I tried it, but I'm still getting that weird read length distribution where almost all reads are 34 nt.

=== Summary ===

Total reads processed:              14,011,412
Reads with adapters:                 8,639,554 (61.7%)
Reads written (passing filters):    14,011,412 (100.0%)

Total basepairs processed:   490,399,420 bp
Quality-trimmed:              33,500,011 bp (6.8%)
Total written (filtered):    447,282,744 bp (91.2%)

=== Adapter 1 ===

Sequence: TGGAATTCTCGG; Type: regular 3'; Length: 12; Trimmed: 8639554 times.
ADD REPLYlink modified 5 months ago by genomax74k • written 5 months ago by juan.crescente20

Well, it does seem like 60% of the reads were trimmed, right?

ADD REPLYlink written 5 months ago by Martombo2.6k

Yes! That part looks good. The problem now is that I still see a single huge peak at 34 nt. I'm expecting to see peaks at 21 and 24 nt (and some more).

ADD REPLYlink written 5 months ago by juan.crescente20

The reads that actually contain the adapter should be the data you are interested in. The percentage above looks relatively healthy. Separate those reads and then run FastQC on them.
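cutadapt documents a `--discard-untrimmed` option (alias `--trimmed-only`) for exactly this kind of separation, combined with `-m` for a minimum length. To show the idea without depending on any tool, here is a crude awk stand-in. It is an assumption-laden sketch: exact adapter matching only, 4-line FASTQ, and `keep_adapter_reads` is a made-up name.

```shell
# Keep only reads that contain the adapter: cut each read (and its quality
# string) at the first exact occurrence of the adapter, and drop reads where
# the adapter is absent or the remaining insert is shorter than min_len.
keep_adapter_reads() {
    adapter=$1
    min_len=${2:-18}
    awk -v a="$adapter" -v m="$min_len" '
        NR % 4 == 1 { header = $0 }
        NR % 4 == 2 { seq = $0 }
        NR % 4 == 3 { plus = $0 }
        NR % 4 == 0 {
            pos = index(seq, a)              # 0 when the adapter is absent
            if (pos > 0 && pos - 1 >= m) {   # pos - 1 is the insert length
                printf "%s\n%s\n%s\n%s\n", header, substr(seq, 1, pos - 1), plus, substr($0, 1, pos - 1)
            }
        }'
}

# Roughly equivalent cutadapt call (the output file name is hypothetical):
#   cutadapt -a TGGAATTCTCGG --discard-untrimmed -m 18 \
#       -o SRR1195025.adapter_only.fastq.gz SRR1195025.fastq.gz
```

Running FastQC on the output of either approach shows the length distribution of the adapter-containing reads alone, without the untrimmed 34 nt reads dominating the plot.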

ADD REPLYlink written 5 months ago by genomax74k

So I should keep only those 61.7% of reads and then quality-trim them? Is there a way to keep only those with Trimmomatic? What I'm seeing is that it keeps all the reads.

ADD REPLYlink written 5 months ago by juan.crescente20