How to handle 4 fastq files for one paired-end sample?
3.8 years ago
maisarasora ▴ 20

Hi everyone,

I am new to NGS. For training, I used paired-end data from public files, where there were usually only two fastq files, one for each read.

Now I have received the fastq files for my ChIP-seq study. We did 50 bp paired-end sequencing, and for one paired-end sample there are four fastq files. For example:

R1F=IN1_S1_L001_R1_001.fastq
R1R=IN1_S1_L001_R2_001.fastq
R2F=IN1_S17_L002_R1_001.fastq
R2R=IN1_S17_L002_R2_001.fastq

I am a bit confused about how to handle these four files.

I tried bowtie2 paired-end mapping with R1F as -1 and R1R as -2, but it does not seem to work.

Bowtie2 notification:
Note that if <mates> files are specified using -1/-2, a <singles> file cannot also be specified. Please run bowtie separately for mates and singles.
Error: Encountered internal Bowtie 2 exception (#1)

I am also unsure about the use of Paired-end merge tools.

Please help. Thank you

ChIP-Seq alignment • 4.5k views

If you want people to help, show your command lines.


bowtie2 -p 8 -N 1 --no-mixed mode --no-discordant -x $REF -1 IN1_S1_L001_R1_001.fastq -2 IN1_S17_L002_R1_001.fastq -S /mnt/e/sequence_data/ChIP-seq/FASTQ/14INPUT/bams/IN1R1.sam

This is the command line that I used.


In a name like IN1_S1_L001_R1_001.fastq, L001 and L002 denote different lanes, and R1 and R2 denote read 1 and its matched read 2. So you should use R1F=IN1_S1_L001_R1_001.fastq and R1R=IN1_S1_L001_R2_001.fastq as inputs -1 and -2, or merge the data from the different lanes into one file per read and then run bowtie2.
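To make the naming scheme above concrete, here is a minimal sketch that splits such a file name into its fields. It assumes the pattern <sample>_S<n>_L<lane>_R<read>_001.fastq, which matches the four files in the question; the function name describe_fastq and the output labels are made up for illustration.

```shell
# Split an Illumina-style FASTQ file name into sample, sample number, lane and read.
# Assumed pattern: <sample>_S<n>_L<lane>_R<1|2>_<chunk>.fastq (optionally .gz).
describe_fastq() {
    # ${1##*/} drops any leading directory; sed -E rewrites the name into labeled fields.
    printf '%s\n' "${1##*/}" |
        sed -E 's/^(.+)_S([0-9]+)_L([0-9]+)_R([12])_[0-9]+\.fastq(\.gz)?$/sample=\1 sample_number=\2 lane=\3 read=\4/'
}

describe_fastq IN1_S1_L001_R1_001.fastq
describe_fastq IN1_S17_L002_R2_001.fastq
```

Files that share the sample and read fields but differ in the lane field are the ones to combine; files that share the lane but differ in the read field are the mates to pass as -1 and -2.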


Thank you. I also tried that approach. At first I did not clearly understand the difference between lanes and reads, because the public data had only 2 fastq files. As suggested below, I am now merging the files from the same read by concatenation. I ran bowtie2 on the two merged reads and got a 99.24% alignment rate.

Do you have any opinion about using merge tools like FLASH and PANDAseq to merge the data?


Sorry, I just use the cat command to concatenate the fastq data, or samtools merge to merge BAM data.


In public data there are only 2 files (R1 and R2) because the different lanes have already been concatenated (with the cat command). Recap:
To merge the different lanes, use cat.
To merge different BAM files (for example, several replicates), use samtools merge.
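For the second item of the recap, a minimal sketch of a wrapper around samtools merge. It assumes samtools is installed and that the input BAM files are already coordinate-sorted (for example with samtools sort); the function name merge_bams and the BAM file names in the usage comment are hypothetical.

```shell
# Merge sorted BAM files into one BAM.
# Usage: merge_bams OUT.bam IN1.bam IN2.bam [...]
merge_bams() {
    # Require an output name plus at least two inputs.
    if [ "$#" -lt 3 ]; then
        echo "usage: merge_bams OUT.bam IN1.bam IN2.bam [...]" >&2
        return 2
    fi
    # Fail early with a clear message if samtools is not installed.
    if ! command -v samtools >/dev/null 2>&1; then
        echo "samtools not found on PATH" >&2
        return 127
    fi
    # samtools merge takes the output file first, then the inputs.
    samtools merge "$@"
}

# Hypothetical file names, for example two replicates of the same input sample:
# merge_bams IN_merged.bam IN1.sorted.bam IN2.sorted.bam
```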


bowtie2 -p 8 -N 1 --no-mixed mode --no-discordant -x $REF -1 IN1_S1_L001_R1_001.fastq -2 IN1_S17_L002_R1_001.fastq -S /mnt/e/sequence_data/ChIP-seq/FASTQ/14INPUT/bams/IN1R1.sam

This is the command line that I used.


Why are you giving it two read 1 files as if they were matched pairs? That's never going to work.
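A sketch of how the call could be corrected, assuming the four files from the question. Mate 1 files go to -1 and mate 2 files go to -2; Bowtie 2 accepts comma-separated lists for both options, so the two lanes can be listed in the same order on each side. The stray word "mode" after --no-mixed is dropped here; it is probably what Bowtie 2 interpreted as an extra <singles> file, although that is an inference from the error text, not something confirmed in this thread. The names align_pair and DRY_RUN, the fallback index path, and the output name IN1.sam are placeholders.

```shell
# Bowtie 2 index prefix; the fallback path is a placeholder.
REF=${REF:-/path/to/bowtie2_index}

# Build the paired-end command from mate 1 files, mate 2 files and an output SAM.
# With DRY_RUN=1 (the default here) the command is only printed, not executed.
# Usage: align_pair R1_FILES R2_FILES OUT.sam   (file lists are comma-separated)
align_pair() {
    set -- bowtie2 -p 8 -N 1 --no-mixed --no-discordant -x "$REF" -1 "$1" -2 "$2" -S "$3"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# R1 files of both lanes as mate 1, R2 files of both lanes as mate 2, same lane order.
align_pair \
    IN1_S1_L001_R1_001.fastq,IN1_S17_L002_R1_001.fastq \
    IN1_S1_L001_R2_001.fastq,IN1_S17_L002_R2_001.fastq \
    IN1.sam
```

Setting DRY_RUN=0 before the call runs the alignment instead of printing it.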


I just realized that. I did not clearly understand the difference between lanes and reads, because the public data had only 2 fastq files. I am now using concatenation to merge the data from the same read.

3.8 years ago

Hi! I think what you have there is two pairs of fastq files for the same sample, each pair coming from a different sequencing lane (note that some of the file names contain L001 and others L002). What you need to do is concatenate the lanes into one file per read (one for R1 and one for R2) and then align the resulting pair.

cat IN1_S1_L001_R1_001.fastq IN1_S17_L002_R1_001.fastq > IN1_S1_concatenated_R1_001.fastq
cat IN1_S1_L001_R2_001.fastq IN1_S17_L002_R2_001.fastq > IN1_S1_concatenated_R2_001.fastq

The resulting files are the 2 mates (R1 and R2) that you want to feed into your favorite aligner.
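The same step can be wrapped in small helpers with a sanity check, as a rough sketch. It assumes uncompressed fastq files with the usual four lines per record; merge_lanes, count_reads and the output names IN1_R1.fastq and IN1_R2.fastq are hypothetical. The lanes must be listed in the same order for R1 and R2 so that mates stay on matching lines.

```shell
# Count FASTQ records, assuming exactly 4 lines per record (no wrapped sequences).
count_reads() {
    echo $(( $(cat "$@" | wc -l) / 4 ))
}

# Concatenate the lanes of one read into a single file.
# Usage: merge_lanes OUT.fastq LANE1.fastq LANE2.fastq [...]
merge_lanes() {
    out=$1
    shift
    cat "$@" > "$out"
}

# Usage with the file names from the question (run in the directory holding them):
# merge_lanes IN1_R1.fastq IN1_S1_L001_R1_001.fastq IN1_S17_L002_R1_001.fastq
# merge_lanes IN1_R2.fastq IN1_S1_L001_R2_001.fastq IN1_S17_L002_R2_001.fastq
# [ "$(count_reads IN1_R1.fastq)" -eq "$(count_reads IN1_R2.fastq)" ] && echo "R1 and R2 record counts match"
```

Equal record counts in the two merged files are a quick check that no lane was left out or listed twice on one side.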

Hope it helps!


Thank you so much! At first I did not clearly understand the difference between lanes and reads, because the public data had only 2 fastq files. As suggested, I merged the files from the same read by concatenation, ran bowtie2 on the two merged reads, and got a 99.24% alignment rate.

Do you have any opinion about using merge tools like FLASH and PANDAseq to merge the data? I have seen some comments saying that there will be some data loss if you just concatenate the files.

But given the good alignment rate, I think concatenation works well.


You can safely concatenate the fastq files, or merge the BAM files, with no data loss.


Thank you for your reply...
