Question

Unequal paired-end fastq size after quality control

0

7.7 years ago

Whoknows ▴ 960

Hi friends

I used Trimmomatic for quality trimming of my paired-end RNA-Seq files. The output looks odd: the two resulting FASTQ files have different sizes in bytes, L1 = 9275244535 and L2 = 9238052265.

Why does this happen?

I used this command:

java -jar trimmomatic-0.36.jar PE L1.fq.gz L2.fq.gz paired_L1.fastq unpaired_L1.fastq paired_L2.fastq unpaired_L2.fastq LEADING:20 TRAILING:20 MINLEN:140

I did not trim the first bases, but the first 12 bases show unbalanced base content in the FASTQ file, and there is also duplication in that first 12-base region.

RNA-Seq fastq Trimmomatic • 4.6k views

ADD COMMENT • link updated 7.7 years ago by mastal511 ★ 2.1k • written 7.7 years ago by Whoknows ▴ 960

1

Could you clarify what f1 and f2 are?

ADD REPLY • link 7.7 years ago by WouterDeCoster 47k

0

f1 (or L1) is the size in bytes of the first file, and f2 (or L2) is the size in bytes of the second. I have updated the question.

ADD REPLY • link 7.7 years ago by Whoknows ▴ 960

1

I don't see the relevance of the size in bytes; the number of lines would be more informative (wc -l yourfile.fastq).

That said, it's very well possible that one read of a pair didn't 'survive' the trimming and the read became 'unpaired'. Edit: which should then end up in different files, thanks to @mastal511 for pointing this out
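As a rough sketch of that line-count check: a FASTQ record normally spans four lines, so dividing the output of wc -l by four gives the number of reads. The function name count_reads is made up for illustration, and the sketch assumes uncompressed files with exactly four lines per record.

```shell
# Hypothetical helper: count reads in an uncompressed FASTQ file.
# Assumes exactly four lines per record (header, sequence, '+', qualities).
count_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Example usage with the output names from the question (uncomment to run):
# count_reads paired_L1.fastq
# count_reads paired_L2.fastq
```

If the two paired files report the same number, the pairing is intact, whatever their sizes in bytes.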

ADD REPLY • link 7.7 years ago by WouterDeCoster 47k

0

The line counts for both files were the same: each file has 102449352 lines.

ADD REPLY • link 7.7 years ago by Whoknows ▴ 960

1

Sounds like nothing to worry about then :p

ADD REPLY • link 7.7 years ago by WouterDeCoster 47k

score 2 · Answer 1 · 2016-08-27

2

7.7 years ago

reza ▴ 300

The two files start out the same size, but their sequencing quality differs, so after trimming their sizes will no longer match: the reads and bases that get trimmed are different in each file.

ADD COMMENT • link 7.7 years ago by reza ▴ 300

0

Thanks Reza, you are right. It removes low-quality read pairs, but the pairs that remain can still have a different number of bases on each side.

ADD REPLY • link 7.7 years ago by Whoknows ▴ 960

score 1 · Answer 2 · 2016-08-27

1

7.7 years ago

mastal511 ★ 2.1k

If one read of a pair doesn't survive the trimming, Trimmomatic will put the surviving mate in one of the unpaired FASTQ files. So the two paired FASTQ files should have the same number of lines and the same number of reads, but after trimming, not necessarily the same number of bases. The strange per-base sequence content you see at the 5' ends is quite common for RNA-Seq data, and is due to the random priming step in the library prep not being quite so random.
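To see this directly, one could compare both the read count and the total number of bases in each paired file. The following is only a sketch: the function name fastq_stats is hypothetical, and it assumes uncompressed FASTQ with four lines per record.

```shell
# Hypothetical helper: print "<reads><TAB><total bases>" for a FASTQ file.
# Assumes four lines per record, so the sequence is line 2 of each record.
fastq_stats() {
    awk 'NR % 4 == 2 { reads++; bases += length($0) }
         END { printf "%.0f\t%.0f\n", reads, bases }' "$1"
}

# Example usage with the output names from the question (uncomment to run):
# fastq_stats paired_L1.fastq
# fastq_stats paired_L2.fastq
```

Matching read counts with different base totals would be consistent with the explanation above: the pairs are intact, but each mate was trimmed by a different amount, which also changes the file size in bytes.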

ADD COMMENT • link 7.7 years ago by mastal511 ★ 2.1k

0

So you mean Trimmomatic may keep both reads of a pair after trimming, but with a different number of bases on each side, right?

ADD REPLY • link 7.7 years ago by Whoknows ▴ 960