Question

Paired-end RNA Seq data: How to deal with unpaired data after trimmomatic

7.7 years ago

aggregatibacter ▴ 180

Hi everybody,

What is the best practice for dealing with the unpaired data generated by trimming paired-end RNA-Seq data, when only one of the mates makes it through the trimming?

I have seen people recommend using only the remaining paired data (and ignoring the often small unpaired files), but I am afraid of losing crucial data. I could easily process the paired set and the two unpaired sets per sample separately.

My analysis pipeline is

fastqc - trimmomatic - fastqc - STAR - featureCounts - voom/limma

If I try to use all the data, at what point would you recommend combining everything (and how)?

Many thanks!

RNA-Seq rna-seq trimming • 5.0k views

updated 7.7 years ago by igor 13k • written 7.7 years ago by aggregatibacter ▴ 180

Hi guys,

Thanks for the quick replies. The unpaired reverse reads are next to nothing (around 0.2%), while the unpaired forward reads are usually more like 2-5%. Does this sound normal to you?

Reply • 7.7 years ago by aggregatibacter ▴ 180
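To put numbers like these side by side across samples, the read-pair counts that Trimmomatic reports per run (input pairs, both surviving, forward only, reverse only) can be turned into percentages. This is only a minimal sketch: the function name is hypothetical and the counts in the example are made up for illustration, not taken from this thread.

```python
def survival_percentages(input_pairs, both, forward_only, reverse_only):
    """Return the percentage of input read pairs in each trimming outcome."""
    if input_pairs <= 0:
        raise ValueError("input_pairs must be positive")
    # Pairs where neither mate survived are whatever is left over.
    dropped = input_pairs - both - forward_only - reverse_only
    if dropped < 0:
        raise ValueError("surviving counts exceed the number of input pairs")
    counts = {
        "both": both,
        "forward_only": forward_only,
        "reverse_only": reverse_only,
        "dropped": dropped,
    }
    return {name: 100.0 * n / input_pairs for name, n in counts.items()}


# Made-up counts, for illustration only.
example = survival_percentages(1_000_000, 960_000, 35_000, 2_000)
for outcome, pct in example.items():
    print(f"{outcome}: {pct:.1f}%")
```

A sample whose forward-only or dropped fraction is far above the others is the one worth a closer look in the FastQC reports.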

There is no "normal". Ideally you should not have any. But this is biology and you live with what you have :-)

Reply • 7.7 years ago by GenoMax 141k

If you use BBDuk for trimming paired reads, you will not end up with any singletons, which can make the processing easier. Reads will either be retained as pairs or discarded as pairs. In situations where one read is trimmed down to nothing, the pair is discarded if a minimum length restriction is used. If no minimum length is set, the read will be trimmed down to no less than 1 bp, so it will still be present and the FASTQ files will be valid and correctly paired, but it will typically be ignored downstream and only its mate will be used (since 1 bp reads don't map).

Reply • 7.7 years ago by Brian Bushnell 20k
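The "retained as pairs or discarded as pairs" behaviour described above can be illustrated with a few lines of Python. This is not BBDuk's code, just a sketch of the idea under the assumption that a pair is dropped as soon as either trimmed mate falls below the minimum length; the function name and the toy sequences are hypothetical.

```python
def filter_pairs(pairs, min_len):
    """Keep a pair only when both trimmed mates are at least min_len bases long."""
    return [
        (r1, r2)
        for r1, r2 in pairs
        if len(r1) >= min_len and len(r2) >= min_len
    ]


# Toy trimmed pairs: in the second pair, mate 2 was trimmed down to 1 base.
trimmed = [
    ("ACGTACGTAC", "TTGCATGCAA"),
    ("ACGTACGTAC", "T"),
]

kept = filter_pairs(trimmed, min_len=5)
print(len(kept), "pair(s) kept, no singleton files produced")
```

Because whole pairs are kept or dropped, the two output files always stay the same length and in the same order.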

score 1 · Answer 1 · 2016-08-03

7.7 years ago

kissaj ▴ 110

Chuck it, it is broken. It shouldn't be very much (%-wise). If it is, you have a problem.

score 0 · Answer 2 · 2016-08-03

If you want to keep them, you can put the unpaired reads into a separate singleton file; TopHat accepts singleton input alongside paired-end input. Remember that you should always keep paired reads in the same order in the two paired files after QC, because most aligners, including TopHat, recognize read pairs by their order in the files, not by read ID.
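Since pairing goes by position in the files, it can be worth checking that the two paired files are still in sync after QC. Here is a minimal sketch, assuming uncompressed four-line FASTQ records and read IDs that match between mates apart from an optional /1 or /2 suffix; the function names and file names are hypothetical.

```python
import itertools


def read_ids(path):
    """Yield the read ID of each four-line FASTQ record, without a /1 or /2 suffix."""
    with open(path) as handle:
        # Header lines are every fourth line, starting with the first.
        for header in itertools.islice(handle, 0, None, 4):
            fields = header[1:].split()
            if not fields:
                continue  # tolerate a stray blank line at a record boundary
            read_id = fields[0]
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]
            yield read_id


def mates_in_sync(r1_path, r2_path):
    """Return True when both files hold the same read IDs in the same order."""
    missing = object()  # marks the shorter file when the lengths differ
    pairs = itertools.zip_longest(
        read_ids(r1_path), read_ids(r2_path), fillvalue=missing
    )
    return all(id1 == id2 for id1, id2 in pairs)


# Usage (hypothetical file names):
# print(mates_in_sync("sample_1P.fastq", "sample_2P.fastq"))
```

The check stops at the first mismatch, so it is cheap on files that are badly out of order.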

score 0 · Answer 3 · 2016-08-03

7.7 years ago

igor 13k

STAR already performs soft-clipping, so you shouldn't need to trim the reads.

I decided to use Trimmomatic primarily because of adapter contamination in the raw data after demuxing.

For what it is worth, I decided to go all the way and use the program to trim bad bases too, basically using the options from the manual:

ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

Does this seem appropriate to you, or would you rather suggest limiting this to adapter removal and letting STAR soft-clip the rest?

Reply • 7.7 years ago by aggregatibacter ▴ 180
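For reference, the settings quoted above slot into a Trimmomatic paired-end call roughly as sketched below, which also shows where the two paired and two unpaired output files come from. The helper name, the jar location and the output file naming are assumptions for illustration; only the trimming steps are taken from the post. The `quality_trim` switch is a hypothetical convenience for comparing the full settings with adapter removal only.

```python
def trimmomatic_pe_command(r1, r2, prefix, jar="trimmomatic.jar", quality_trim=True):
    """Build the argument list for a Trimmomatic paired-end (PE) run."""
    # Adapter clipping as quoted in the post.
    steps = ["ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10"]
    if quality_trim:
        # Quality trimming steps as quoted in the post.
        steps += ["LEADING:3", "TRAILING:3", "SLIDINGWINDOW:4:15"]
    steps.append("MINLEN:36")

    # PE mode writes four files: paired and unpaired reads for each mate.
    outputs = [
        f"{prefix}_1P.fastq.gz",  # forward reads whose mate survived
        f"{prefix}_1U.fastq.gz",  # forward reads whose mate was dropped
        f"{prefix}_2P.fastq.gz",  # reverse reads whose mate survived
        f"{prefix}_2U.fastq.gz",  # reverse reads whose mate was dropped
    ]
    return ["java", "-jar", jar, "PE", r1, r2, *outputs, *steps]


# Hypothetical input names; pass the list to subprocess.run(cmd, check=True) to execute.
cmd = trimmomatic_pe_command("sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample")
print(" ".join(cmd))
```

Calling it with `quality_trim=False` gives the adapter-removal-only variant, leaving low-quality ends to STAR's soft-clipping.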