Question

How to handle 4 FASTQ files for one paired-end sample?

3.8 years ago

maisarasora ▴ 20

Hi everyone,

I am new to NGS. For training, I used paired-end data from public files, and there were usually only two FASTQ files, one for each read.

Now I have received the FASTQ files for my ChIP-seq study. We did 50 bp paired-end sequencing, and for one paired-end sample there are four FASTQ files. For example:

R1F=IN1_S1_L001_R1_001.fastq
R1R=IN1_S1_L001_R2_001.fastq
R2F=IN1_S17_L002_R1_001.fastq
R2R=IN1_S17_L002_R2_001.fastq

I am now a bit confused about how to handle these four files.

I tried bowtie2 paired-end mapping with R1F as -1 and R1R as -2, but it does not seem to work.

Bowtie2 notification: Note that if <mates> files are specified using -1/-2, a <singles> file cannot also be specified. Please run bowtie separately for mates and singles. Error: Encountered internal Bowtie 2 exception (#1)

I am also unsure about the use of Paired-end merge tools.

Please help. Thank you

ChIP-Seq alignment • 4.5k views

ADD COMMENT • link updated 3.8 years ago by jordi.planells ▴ 480 • written 3.8 years ago by maisarasora ▴ 20

If you want people to help, show your command lines.

ADD REPLY • link 3.8 years ago by swbarnes2 14k

bowtie2 -p 8 -N 1 --no-mixed mode --no-discordant -x $REF -1 IN1_S1_L001_R1_001.fastq -2 IN1_S17_L002_R1_001.fastq -S /mnt/e/sequence_data/ChIP-seq/FASTQ/14INPUT/bams/IN1R1.sam

This is the command line that I used.

ADD REPLY • link 3.8 years ago by maisarasora ▴ 20

In a name like IN1_S1_L001_R1_001.fastq, L001 and L002 denote different lanes, and R1 and R2 denote read 1 and its matched read 2. So you should use R1F=IN1_S1_L001_R1_001.fastq and R1R=IN1_S1_L001_R2_001.fastq as inputs -1 and -2, or merge the data from the different lanes into one file per read and then run bowtie2.
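To make the naming concrete, here is a minimal sketch that splits a file name into its fields. It assumes the usual Illumina-style layout seen in this thread (sample name, S number, lane, read, then a trailing 001), and the function name describe_fastq is made up for illustration.

```shell
# Split an Illumina-style FASTQ name into sample, S number, lane and read.
# Assumed layout: <sample>_S<n>_L<lane>_R<1|2>_<chunk>.fastq[.gz]
# Prints nothing if the name does not follow that layout.
describe_fastq() {
    # Drop any leading directory so only the file name is parsed.
    base=$(basename "$1")
    printf '%s\n' "$base" | sed -n -E \
        's/^(.+)_S([0-9]+)_L([0-9]+)_(R[12])_([0-9]+)\.fastq(\.gz)?$/sample=\1 S=\2 lane=\3 read=\4/p'
}

# Example:
#   describe_fastq IN1_S17_L002_R2_001.fastq
```

Files that share the same read field (R1 or R2) but differ in lane are the ones that belong in the same merged file.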

ADD REPLY • link 3.8 years ago by zhuobaowen ▴ 40

Thank you. I also tried that approach. At first I did not clearly understand the difference between lanes and reads, because the public data had only 2 FASTQ files. As suggested below, I have now merged the two files for each read by concatenating them, ran bowtie2 on the two merged reads, and got 99.24% alignment.

Do you have any opinion on using merge tools like FLASH and PANDAseq to merge the data?

ADD REPLY • link 3.8 years ago by maisarasora ▴ 20

Sorry, I just use the cat command to concatenate the FASTQ data, or samtools merge to merge BAM data.

ADD REPLY • link 3.8 years ago by zhuobaowen ▴ 40

In public data there are only 2 files (R1 and R2) because the lanes have already been concatenated (with the cat command). Recap:
To merge different lanes, use cat.
To merge different BAM files (for example, several replicates), use samtools merge.

ADD REPLY • link 3.8 years ago by jordi.planells ▴ 480

Why are you giving it two read 1 files as if they were matched pairs? That's never going to work.
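For reference, a sketch of what the paired call could look like once the lanes have been combined. Two things differ from the command posted earlier in the thread: -1 gets the R1 file and -2 gets the matching R2 file, and the stray word "mode" after --no-mixed is dropped. bowtie2 most likely treated that extra word as an unpaired reads file, which would explain the <singles> message. The helper name bowtie2_pe_cmd and the example file names are hypothetical; the helper only prints the command so it can be checked before running.

```shell
# Print (do not run) a paired-end bowtie2 command for one sample.
# Usage: bowtie2_pe_cmd INDEX R1.fastq R2.fastq OUT.sam
# The options are the ones from the thread, minus the stray "mode" token.
bowtie2_pe_cmd() {
    printf 'bowtie2 -p 8 -N 1 --no-mixed --no-discordant -x %s -1 %s -2 %s -S %s\n' \
        "$1" "$2" "$3" "$4"
}

# Example (hypothetical merged file names):
#   bowtie2_pe_cmd "$REF" IN1_R1.fastq IN1_R2.fastq IN1.sam
```

Paths containing spaces would need quoting before the printed line is pasted into a shell.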

ADD REPLY • link 3.8 years ago by swbarnes2 14k

I just realized that. I did not clearly understand the difference between lanes and reads, because the public data had only 2 FASTQ files. I am now using concatenation to merge the files from the same read.

ADD REPLY • link 3.8 years ago by maisarasora ▴ 20

score 7 · Accepted Answer · 2020-07-14

3.8 years ago

jordi.planells ▴ 480

Hi! I think what you have there is two FASTQ files per read for the same sample, each coming from a different sequencing lane (note that some of the file names contain L001 and others L002). What you need to do is concatenate the lanes into one file per read and then align using the resulting files.

cat IN1_S1_L001_R1_001.fastq IN1_S17_L002_R1_001.fastq > IN1_S1_concatenated_R1_001.fastq
cat IN1_S1_L001_R2_001.fastq IN1_S17_L002_R2_001.fastq > IN1_S1_concatenated_R2_001.fastq

The resulting files are the 2 mates (R1 and R2) that you want to feed into your favorite aligner.
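If it helps, the same steps can be wrapped in a few small shell functions with a sanity check that both merged files hold the same number of reads. This is only a sketch: the function names and the output names IN1_R1.fastq and IN1_R2.fastq are made up, and count_reads assumes uncompressed FASTQ with exactly 4 lines per record.

```shell
# Count reads in an uncompressed FASTQ file (4 lines per record).
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Concatenate per-lane files for one read direction into a single file.
# Usage: concat_lanes OUT.fastq LANE1.fastq LANE2.fastq ...
concat_lanes() {
    out=$1
    shift
    cat "$@" > "$out"
}

# Report the number of read pairs, or fail if R1 and R2 disagree.
check_pairs() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    if [ "$n1" -ne "$n2" ]; then
        echo "read count mismatch: $n1 vs $n2" >&2
        return 1
    fi
    echo "$n1 read pairs"
}

# Example with the file names from the question
# (list the lanes in the same order for R1 and R2):
#   concat_lanes IN1_R1.fastq IN1_S1_L001_R1_001.fastq IN1_S17_L002_R1_001.fastq
#   concat_lanes IN1_R2.fastq IN1_S1_L001_R2_001.fastq IN1_S17_L002_R2_001.fastq
#   check_pairs IN1_R1.fastq IN1_R2.fastq
```

Keeping the lane order identical for R1 and R2 matters because the aligner pairs reads by their position in the two files.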

Hope it helps!

ADD COMMENT • link 3.8 years ago by jordi.planells ▴ 480

Thank you so much! At first I did not clearly understand the difference between lanes and reads, because the public data had only 2 FASTQ files. As suggested, I merged the two files for each read by concatenating them, ran bowtie2 on the two merged reads, and got 99.24% alignment.

Do you have any opinion on using merge tools like FLASH and PANDAseq to merge the data? I have seen some comments saying that some data is lost if you simply concatenate the files.