Question: Why is my FASTQ data in two files if I am supposed to have single-end reads?
4.9 years ago by
grokaine20 wrote:

Weirdly enough, my single-end reads are in two separate FASTQ files. Do you know why this happens? Can I rely on them, or must I make a single file for mapping with tophat2? I saw that paired-end reads must remain in separate files, but what about single-end reads?

rna-seq tophat2
What are the names of the files? What is the output of:

head -n1 FILE1.FASTQ

head -n1 FILE2.FASTQ

edit: Illumina software may also split big files automatically, in which case your files will be named:

You can probably just post the file names and we'll know what's going on. We commonly split samples over multiple lanes. By default, the software that deals with multiplexed experiments won't combine samples across lanes, so that's a common cause of things like this.
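If you want to check the lane yourself, here is a minimal sketch. It assumes Casava 1.8+ style Illumina headers, where the lane is the fourth colon-separated field of the read name; older header styles put the lane elsewhere, so treat the result with care. The function name fastq_lane is made up for this example.

```shell
# Print the lane field from the first read header of an uncompressed FASTQ file.
# Assumes Casava 1.8+ style headers:
#   @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
fastq_lane() {
    # Take the first line only, then keep the 4th colon-separated field.
    head -n 1 "$1" | cut -d ':' -f 4
}
```

Running fastq_lane FILE1.FASTQ and fastq_lane FILE2.FASTQ and getting two different numbers would suggest the sample was simply split over two lanes. For gzipped files you would need to decompress first, for example with zcat.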

4.9 years ago by
James Ashmore2.9k
UK/Edinburgh/MRC Centre for Regenerative Medicine
James Ashmore2.9k wrote:

Check with the sequencing centre; they may have run your sample in two different lanes or on different days. If this is the case, I would suggest keeping the files separate for mapping for now, so that you can determine whether there are any batch effects.
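As a rough sketch of what keeping them separate could look like: the helper below only prints one tophat2 command per file, each with its own output directory (the -o option), so you can inspect the commands before running anything. The function name tophat_per_file, the tophat_ directory prefix, and the index name are placeholders of mine, not anything from your setup.

```shell
# Print one tophat2 command per FASTQ file, each with its own output directory.
# Usage: tophat_per_file INDEX_BASE FILE1.FASTQ FILE2.FASTQ ...
# INDEX_BASE is a placeholder for your Bowtie2 index prefix.
tophat_per_file() {
    index=$1
    shift
    for fq in "$@"; do
        # Name the output directory after the file, minus path and extension.
        name=$(basename "$fq")
        name=${name%.*}
        echo "tophat2 -o tophat_$name $index $fq"
    done
}
```

For example, tophat_per_file my_index FILE1.FASTQ FILE2.FASTQ prints two commands, one per file. If you later decide the two files can be pooled, TopHat should also accept the files as a comma-separated list in a single run, but check the manual for your version.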

4.9 years ago by
Matt Shirley9.4k
Cambridge, MA
Matt Shirley9.4k wrote:

Even if you do have paired-end data, it would pose no problem to pass both of these files as fragment reads to an aligner. If you do have paired reads, you might see that the read names in each file end with "/1" or "/2", or with a space followed by "1:..." or "2:...".
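A quick way to look for those tags, as a sketch only: the function below reads the first header of a file and reports which of the two conventions it matches, if any. The name fastq_mate_tag is made up for this example, and it only inspects the first read of an uncompressed file.

```shell
# Print "1" or "2" if the first read header carries a mate tag, else "none".
# Recognises the two conventions described above:
#   suffix style:  @name/1            (ends with "/1" or "/2")
#   space style:   @name 1:N:0:ACGT   (space, then "1:" or "2:")
fastq_mate_tag() {
    header=$(head -n 1 "$1")
    case "$header" in
        */1 | *" 1:"*) echo 1 ;;
        */2 | *" 2:"*) echo 2 ;;
        *) echo none ;;
    esac
}
```

If one file reports 1 and the other reports 2, the files are probably mates. If both report 1, they are more likely single-end reads split across files, since as far as I know single-end Illumina reads also carry the read 1 tag.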

