Why is my FASTQ data in two files if I am supposed to have single-end reads?
8.6 years ago
grokaine ▴ 40

Oddly enough, my single-end reads are in two separate FASTQ files. Do you know why this happens? Can I use them as they are, or must I merge them into a single file before mapping with tophat2? I read that paired-end reads must remain in separate files, but what about single-end reads?

RNA-Seq tophat2 • 3.3k views

What are the names of the files? What is the output of:

head -n1 FILE1.FASTQ

head -n1 FILE2.FASTQ

Edit: Illumina software may also split big files automatically, in which case your files will be named something like:

NAME_TTAGGC_L004_R1_001.fastq.gz
NAME_TTAGGC_L004_R1_002.fastq.gz
NAME_TTAGGC_L004_R1_003.fastq.gz
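
If the files are gzip-compressed chunks like the ones above, plain `head` will not show anything readable. A minimal sketch, assuming the files are gzipped FASTQ; `first_headers` is a hypothetical helper name, not part of any Illumina tool:

```shell
# Print each file name, a tab, and the first read header of that file.
# gzip -cd is used rather than zcat because zcat behaves differently on some platforms.
first_headers() {
    for f in "$@"; do
        printf '%s\t' "$f"
        gzip -cd "$f" | head -n 1
    done
}

# Example call (file names follow the pattern shown above):
# first_headers NAME_TTAGGC_L004_R1_*.fastq.gz
```

If every chunk shows the same instrument, run, flowcell, and lane in its header, the files are most likely pieces of one split output rather than separate runs.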

You can probably just post the file names and we'll know what's going on. We commonly split samples over multiple lanes. By default, the software that deals with multiplexed experiments won't combine samples across lanes, so that's a common cause of things like this.
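
To check the lane yourself, the `L###` field of an Illumina-style file name can be pulled out with standard tools. This is only a sketch: `lane_of` is a hypothetical helper, and it assumes the name contains a field of the form `_L` plus three digits, as in `NAME_TTAGGC_L004_R1_001.fastq.gz`.

```shell
# Print the lane field (e.g. L004) of an Illumina-style file name.
# Prints nothing if the name has no _L###_ field.
lane_of() {
    basename "$1" | grep -oE '_L[0-9]{3}_' | head -n 1 | tr -d '_'
}

# Example call:
# for f in *.fastq.gz; do printf '%s\t%s\n' "$f" "$(lane_of "$f")"; done
```

Two files for the same sample with different lane fields would point to the sample having been split across lanes.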

8.6 years ago
James Ashmore ★ 3.4k

Check with the sequencing centre; they may have run your sample in two different lanes or on different days. If that is the case, I would suggest keeping the files separate for mapping for now, so that you can check for batch effects.
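
Mapping each file separately could look roughly like the sketch below. It only prints the commands instead of running them. The helper name `tophat_cmds`, the index name, and the output directory naming are all assumptions for illustration; `-o` is TopHat's output-directory option.

```shell
# Print (do not run) one tophat2 command per FASTQ file,
# giving each file its own output directory.
tophat_cmds() {
    index=$1
    shift
    for fq in "$@"; do
        name=$(basename "$fq")
        name=${name%.gz}       # drop a trailing .gz, if present
        name=${name%.fastq}    # drop a trailing .fastq, if present
        echo "tophat2 -o tophat_$name $index $fq"
    done
}

# Example call (genome_index is a placeholder for your Bowtie2 index base name):
# tophat_cmds genome_index sample_L001.fastq.gz sample_L002.fastq.gz
```

If you later decide the files can be treated as one sample, TopHat also accepts a comma-separated list of read files as a single input, so merging them on disk is not required.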

8.6 years ago

Even if you do have paired-end data, it would pose no problem to pass both files to an aligner as fragment (unpaired) reads. If you do have paired reads, you might see that the read names in each file end with "/1" or "/2", or with a space followed by "1:..." or "2:...".
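
A quick way to check for those markers is to classify the first header line of each file. The sketch below is a rough heuristic, and `mate_of_header` is a hypothetical helper name. Note that newer Illumina headers carry " 1:" on single-end reads as well, so a 1 on its own proves nothing; it is a 2 in the second file that suggests paired data.

```shell
# Print 1 or 2 depending on the mate marker in a FASTQ header line,
# or "unknown" if neither naming style is recognised.
mate_of_header() {
    case "$1" in
        */1|*" 1:"*) echo 1 ;;
        */2|*" 2:"*) echo 2 ;;
        *) echo unknown ;;
    esac
}

# Example call:
# mate_of_header "$(head -n1 FILE2.FASTQ)"
```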
