Question: How to treat the unpaired data after trimmomatic
liu.huand 0 wrote 3.5 years ago:

Hi, I'm totally new here and totally new to bioinformatics (technically, I started learning it last week). The story goes like this: I got my ATAC-seq fastq data last Monday and started turning these fastq files into peaks. I learnt how to trim, align and visualize my data, and how to do a little QC afterward. I used Trimmomatic on my university's Galaxy (ILLUMINACLIP with the Nextera paired-end adapters) to trim my fastq files and got 4 files: 2 paired and 2 unpaired. I aligned only the paired fastq files with Bowtie2 -X 2000 and got a mapping rate of 90%. I converted the BAM files (the default Bowtie2 output in our Galaxy) into bedgraph and then TDF for visualization in IGV. I plotted the distribution of reads around the TSSs of my annotated genome and saw strong enrichment near the TSS. Then a weird thing happened: I plotted the insert size distribution using the Picard tool and got this plot: [figure: insert size distribution]

It appeared that I lost all the inserts smaller than 120 bp, which are actually the nucleosome-free regions that I need most. Then I guessed that some of my data must not have been mapped, so I went back to my fastq files and found that the unpaired files generated by Trimmomatic are huge. For example, each of the paired files is 8 GB, while one unpaired R1 file is 4.5 GB and the other is 10 MB. I wonder whether my nucleosome-free regions are just sitting in these unpaired data, and how I can combine the unpaired data with the paired data generated by Trimmomatic?

Thanks, Huan

Tags: sequence, forum, alignment
written 3.5 years ago by liu.huand 0 • modified 3.5 years ago by Mike 1.4k

Map unpaired reads as single-end data. You should have a look at what sequences get trimmed off: low quality, 3'-end adapter or something else? To investigate the issue, an alternative is to map the untrimmed reads with a local mapper such as bowtie2 --local or bwa-mem. These mappers won't align adapter sequences (they are clipped from the alignment). Sometimes this is easier when you are not sure what Trimmomatic is doing to your data.

written 3.5 years ago by lh3 31k
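The two suggestions above could be sketched roughly as follows. This is only an illustration: the function names, file names and index name are hypothetical, and the thread count is an arbitrary choice. The bowtie2 options used (-U, -1, -2, -x, -S, -p, -X, --local) are standard bowtie2 flags.

```python
import shlex


def bowtie2_single_end_cmd(index, unpaired_fastqs, out_sam, threads=4):
    """Build a bowtie2 command that maps reads as single-end data (-U).

    `unpaired_fastqs` is a list of files, e.g. Trimmomatic's unpaired outputs.
    """
    return [
        "bowtie2",
        "-p", str(threads),
        "-x", index,
        "-U", ",".join(unpaired_fastqs),  # bowtie2 accepts a comma-separated list
        "-S", out_sam,
    ]


def bowtie2_local_paired_cmd(index, r1, r2, out_sam, max_insert=2000, threads=4):
    """Build a bowtie2 --local command for untrimmed paired-end reads.

    In local mode, read ends that do not align (such as adapter) are soft-clipped.
    """
    return [
        "bowtie2", "--local",
        "-X", str(max_insert),  # same maximum fragment length as in the question
        "-p", str(threads),
        "-x", index,
        "-1", r1,
        "-2", r2,
        "-S", out_sam,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(shlex.join(bowtie2_single_end_cmd(
        "genome", ["R1_unpaired.fq.gz", "R2_unpaired.fq.gz"], "unpaired.sam")))
    print(shlex.join(bowtie2_local_paired_cmd(
        "genome", "R1_raw.fq.gz", "R2_raw.fq.gz", "raw_local.sam")))
```

The printed commands can be inspected first and then run by hand or passed to subprocess.run; comparing the insert size distribution from the --local run on untrimmed reads with the trimmed run is one way to check what trimming removed.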

Sorry, I'm not sure how to upload the figure. You can find it at this link: http://postimg.org/image/m1fmud4u9/

written 3.5 years ago by liu.huand 0

Well, congratulations on your first week. It looks like you have already learned a lot. Good luck!

written 3.5 years ago by WouterDeCoster 42k
Istvan Albert ♦♦ 81k (University Park, USA) wrote 3.5 years ago:

The insert size refers to the size of the DNA fragment that is sequenced. Instruments can only sequence fragments over a certain size.

written 3.5 years ago by Istvan Albert ♦♦ 81k

Hi Istvan, thanks for the reply. So you mean it's possible that my instrument only sequenced the inserts larger than 120 bp and discarded the fragments smaller than that? I was using a HiSeq 2500 with 125 bp PE reads, and I will find out the minimum size it sequences. Thanks!

written 3.5 years ago by liu.huand 0

I can't help but suspect that you are misusing the insert size concept and interpreting it as something else that you are interested in measuring. I would suggest posting a new question in which you disentangle your question from Trimmomatic, adapter clipping, etc. These only confuse the issue and are not related to what you need.

written 3.5 years ago by Istvan Albert ♦♦ 81k

Hi Istvan, I solved this. I did misunderstand fragment size and insert size for paired-end data in the very beginning, but the figure I posted truly is the distribution of insert sizes. I spent a whole day trying different Trimmomatic parameters for trimming my fastq files, followed by bowtie2 mapping. I found that if I set keepBothReads to true, so that the reverse read is not dropped, I get tons of inserts smaller than 100 bp, which is what I need for my experiment.

written 3.5 years ago by liu.huand 0
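One likely reading of why keepBothReads helped, offered as a sketch rather than a confirmed diagnosis: with 125 bp reads, any insert shorter than 125 bp is read through into the adapter. Trimmomatic's palindrome mode detects this and, by default, drops the reverse read because it carries the same sequence as the forward read, so the forward read lands in the unpaired file. The helper names below are hypothetical, and the 2:30:10 thresholds and the NexteraPE-PE.fa adapter file are the values commonly shown in Trimmomatic's documentation, not the poster's actual settings.

```python
def has_adapter_read_through(insert_len, read_len):
    """True when the read is longer than the insert, so it runs into the adapter."""
    return insert_len < read_len


def illuminaclip_step(adapter_fasta,
                      seed_mismatches=2,
                      palindrome_clip_threshold=30,
                      simple_clip_threshold=10,
                      min_adapter_length=8,
                      keep_both_reads=True):
    """Build a Trimmomatic ILLUMINACLIP step string.

    Field order follows the Trimmomatic manual:
    ILLUMINACLIP:<fasta>:<seed mismatches>:<palindrome clip threshold>:
    <simple clip threshold>:<minAdapterLength>:<keepBothReads>
    """
    return "ILLUMINACLIP:{}:{}:{}:{}:{}:{}".format(
        adapter_fasta,
        seed_mismatches,
        palindrome_clip_threshold,
        simple_clip_threshold,
        min_adapter_length,
        "true" if keep_both_reads else "false",
    )


if __name__ == "__main__":
    # Inserts shorter than the 125 bp read length are the ones affected.
    for insert_len in (60, 100, 124, 125, 200):
        print(insert_len, has_adapter_read_through(insert_len, 125))
    print(illuminaclip_step("NexteraPE-PE.fa"))
```

With keepBothReads left at its default, the short nucleosome-free fragments would be expected to end up mostly in the unpaired forward file, which would be consistent with the 4.5 GB versus 10 MB unpaired files described in the question.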

The smallest fragment is a primer dimer (no insert). We know those cluster (and get sequenced) well.

written 3.5 years ago by genomax 74k
Mike 1.4k (UK) wrote 3.5 years ago:

Hi, insert size and fragment size are always confusing. It would be a great help if somebody could explain these terms. Thanks,

written 3.5 years ago by Mike 1.4k

This is a good discussion about it: Fragment Size: TLEN vs. isize

I think the take-home message is "they're the same unless they definitely aren't" :)

written 3.5 years ago by John 12k
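For anyone who wants to see what an "insert size" histogram is typically built from, here is a minimal sketch that tallies the absolute TLEN value (SAM column 9) for properly paired, first-in-pair records. It is not Picard's implementation, the function name is hypothetical, and the SAM records in the demo are made up for illustration.

```python
from collections import Counter

PROPER_PAIR = 0x2     # SAM flag bit: each segment properly aligned
FIRST_IN_PAIR = 0x40  # SAM flag bit: first segment in the template


def insert_size_counts(sam_lines):
    """Count |TLEN| once per properly paired template (first-in-pair only)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        tlen = int(fields[8])
        if flag & PROPER_PAIR and flag & FIRST_IN_PAIR and tlen != 0:
            counts[abs(tlen)] += 1
    return counts


if __name__ == "__main__":
    # Made-up records: two proper pairs and one unaligned read.
    demo = [
        "@HD\tVN:1.6",
        "r1\t99\tchr1\t100\t42\t50M\t=\t130\t80\t*\t*",
        "r1\t147\tchr1\t130\t42\t50M\t=\t100\t-80\t*\t*",
        "r2\t83\tchr1\t500\t42\t50M\t=\t350\t-200\t*\t*",
        "r3\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    ]
    for size, n in sorted(insert_size_counts(demo).items()):
        print(size, n)
```

Counting only the first read of each pair avoids tallying every template twice, since the two mates carry TLEN values of equal magnitude and opposite sign.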