Question

Why is my fastq data on two files if I am supposed to have single end reads?

8.6 years ago

grokaine ▴ 40

Weirdly enough, my single-end reads came in two separate FASTQ files. Do you know why this happens? Can I use them as they are, or must I merge them into a single file for mapping with tophat2? I saw that paired-end reads must remain in separate files, but what about single-end reads?

RNA-Seq tophat2 • 3.3k views

Updated 8.6 years ago by Matt Shirley 10k • written 8.6 years ago by grokaine ▴ 40

What are the names of the files? What is the output of:

head -n1 FILE1.FASTQ

head -n1 FILE2.FASTQ

Edit: Illumina software may also split large files automatically, in which case your files will be named something like:

NAME_TTAGGC_L004_R1_001.fastq.gz
NAME_TTAGGC_L004_R1_002.fastq.gz
NAME_TTAGGC_L004_R1_003.fastq.gz
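
As a rough illustration of how names like the ones above can be read, here is a minimal Python sketch that splits each name into sample, barcode, lane, read number and chunk, then groups the chunks. The regular expression is an assumption inferred only from the three example names, and `group_fastq_names` is a hypothetical helper, not part of any Illumina tool.

```python
import re
from collections import defaultdict

# Assumed pattern, inferred from the example names only:
# <sample>_<barcode>_L<lane>_R<read>_<chunk>.fastq[.gz]
ILLUMINA_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<barcode>[ACGTN]+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_(?P<chunk>\d{3})\.fastq(?:\.gz)?$"
)


def group_fastq_names(names):
    """Group file names by (sample, lane, read), with chunks in order."""
    groups = defaultdict(list)
    for name in names:
        match = ILLUMINA_NAME.match(name)
        if match is None:
            continue  # name does not follow the assumed pattern
        key = (match["sample"], match["lane"], match["read"])
        groups[key].append((match["chunk"], name))
    # Sort by chunk number so 001 comes before 002, and so on.
    return {key: [n for _, n in sorted(chunks)] for key, chunks in groups.items()}


names = [
    "NAME_TTAGGC_L004_R1_002.fastq.gz",
    "NAME_TTAGGC_L004_R1_001.fastq.gz",
    "NAME_TTAGGC_L004_R1_003.fastq.gz",
]
groups = group_fastq_names(names)
for key, files in groups.items():
    print(key, files)
```

If every file lands in a group whose read number is 1, the files are most likely chunks of the same read set. A second group with read number 2 for the same sample and lane would instead point to paired-end data.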

8.6 years ago by h.mon 35k

You can probably just post the file names and we'll know what's going on. We commonly split samples over multiple lanes. By default, the software that deals with multiplexed experiments won't combine samples across lanes, so that's a common cause of things like this.

8.6 years ago by Devon Ryan 104k

score 2 · Answer 1 · 2015-09-04

8.6 years ago

James Ashmore ★ 3.4k

Check with the sequencing centre; they may have run your sample in two different lanes or on different days. If that is the case, I would suggest keeping the files separate for mapping for now, so that you can check whether there are any batch effects.
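
One way to keep the files separate during mapping is to run the aligner once per file, each with its own output directory. The sketch below only builds the command lines and does not run anything. `genome_index` and `tophat_out` are placeholder names, and `tophat2_commands` is a hypothetical helper; the only TopHat option used is `-o` for the output directory.

```python
from pathlib import Path


def tophat2_commands(index_base, fastq_files, out_root="tophat_out"):
    """Build one tophat2 command per FASTQ file, each with its own output dir."""
    commands = []
    for fastq in fastq_files:
        # Use the file name up to the first dot as the output sub-directory.
        stem = Path(fastq).name.split(".")[0]
        commands.append(["tophat2", "-o", f"{out_root}/{stem}", index_base, fastq])
    return commands


# "genome_index" is a placeholder for your own Bowtie2 index base name.
commands = tophat2_commands("genome_index", ["FILE1.FASTQ", "FILE2.FASTQ"])
for command in commands:
    print(" ".join(command))
```

Once you are satisfied that the two runs look alike, the reads can be pooled. As far as I know, TopHat also accepts a comma-separated list of read files in place of a single file, so merging the FASTQ files by hand should not be necessary.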


score 0 · Answer 2 · 2015-09-04

8.6 years ago

Matt Shirley 10k

Even if you do have paired-end data, it would pose no problem to pass both files to an aligner as fragment (single-end) reads. If you do have paired reads, you might see that the read names in each file end with "/1" or "/2", or with a space followed by "1:..." or "2:...".
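
The read-name check described above can be sketched in Python as follows. It looks only at the two naming conventions mentioned in this answer, and the example headers are made up for illustration. `first_header` and `mate_from_header` are hypothetical helper names.

```python
import gzip
import re


def first_header(path):
    """Return the first line of a FASTQ file, gzip-compressed or plain."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.readline()


def mate_from_header(header):
    """Return 1 or 2 if the header carries a mate marker, else None."""
    fields = header.strip().split()
    if not fields:
        return None
    # Convention 1: the read name itself ends with /1 or /2.
    match = re.search(r"/([12])$", fields[0])
    if match:
        return int(match.group(1))
    # Convention 2: a space, then a field starting with "1:" or "2:".
    if len(fields) > 1:
        match = re.match(r"([12]):", fields[1])
        if match:
            return int(match.group(1))
    return None


# Made-up headers, for illustration only.
print(mate_from_header("@read42/1"))
print(mate_from_header("@INSTR:1:FLOWCELL:4:1101:1000:2000 2:N:0:TTAGGC"))
print(mate_from_header("@read42"))
```

A result of 2 for one of the files suggests paired-end data. A result of 1 or None for both files is consistent with single-end reads split across files, although the marker alone is not proof, since single-end reads can also carry a "1:" field.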
