Question: How to handle 4 fastq files for one paired-end sample?
3 months ago, maisarasora10 (USM, Malaysia) wrote:

Hi everyone,

I am new to NGS. For training, I used paired-end data from public repositories, and there were usually only two fastq files, one for each read.

Now I have received the fastq files for my ChIP-seq study. We did 50 bp paired-end sequencing, and for one paired-end sample there are four fastq files. For example:

R1F=IN1_S1_L001_R1_001.fastq
R1R=IN1_S1_L001_R2_001.fastq
R2F=IN1_S17_L002_R1_001.fastq
R2R=IN1_S17_L002_R2_001.fastq

I am now a bit confused about how to handle these 4 files.

I tried bowtie2 paired-end mapping with R1F as -1 and R1R as -2, but it does not seem to work.

Bowtie2 notification: Note that if <mates> files are specified using -1/-2, a <singles> file cannot also be specified. Please run bowtie separately for mates and singles. Error: Encountered internal Bowtie 2 exception (#1)

I am also unsure about the use of Paired-end merge tools.

Please help. Thank you

chip-seq alignment • 237 views

If you want people to help, show your command lines.

written 3 months ago by swbarnes28.9k

bowtie2 -p 8 -N 1 --no-mixed mode --no-discordant -x $REF -1 IN1_S1_L001_R1_001.fastq -2 IN1_S17_L002_R1_001.fastq -S /mnt/e/sequence_data/ChIP-seq/FASTQ/14INPUT/bams/IN1R1.sam

This is the command line that I used.

written 3 months ago by maisarasora10

In IN1_S1_L001_R1_001.fastq, L001 and L002 denote different lanes, and R1 and R2 denote read 1 and its matched read 2. So you should use R1F=IN1_S1_L001_R1_001.fastq and R1R=IN1_S1_L001_R2_001.fastq as inputs 1 and 2, or merge the data from the different lanes into one file per read and then run bowtie2.
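To make the naming explicit, here is a small sketch that splits such a file name into sample, lane and read. It assumes the pattern <sample>_S<n>_L<lane>_R<read>_001.fastq seen in this thread; parse_fastq_name is a made-up helper name, not part of any tool.

```shell
# Split an Illumina-style FASTQ file name into sample, lane and read.
# Assumes names of the form <sample>_S<n>_L<lane>_R<read>_001.fastq.
parse_fastq_name() {
    base=${1%_001.fastq}      # e.g. IN1_S1_L001_R1
    read_tag=${base##*_}      # R1 or R2
    base=${base%_*}           # e.g. IN1_S1_L001
    lane_tag=${base##*_}      # L001 or L002
    base=${base%_*}           # e.g. IN1_S1
    sample=${base%_S*}        # IN1
    echo "sample=$sample lane=$lane_tag read=$read_tag"
}

for f in IN1_S1_L001_R1_001.fastq IN1_S1_L001_R2_001.fastq \
         IN1_S17_L002_R1_001.fastq IN1_S17_L002_R2_001.fastq; do
    parse_fastq_name "$f"
done
```

Files that share a lane but differ in R1/R2 are mates; files that share a read but differ in lane are the ones to concatenate.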

written 3 months ago by zhuobaowen10

Thank you. I also tried that approach. At first I didn't clearly understand the difference between lanes and reads, because the public data only have 2 fastq files. As suggested below, I'm now merging the files from the same read by concatenation. I then ran bowtie2 on the two reads and got 99.24% alignment.

Do you have any opinion on using merge tools like FLASH and PANDAseq to merge the data?

written 3 months ago by maisarasora10

Sorry, I just use the cat command to join the fastq data, or samtools merge to merge BAM data.

written 3 months ago by zhuobaowen10

In public data there are only 2 files (R1 and R2) because the concatenation (the cat command) of the different lanes has already been performed. Recap:
To merge the different lanes, use cat
To merge different bam files (for example several replicates) use samtools merge
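A minimal sketch of the BAM route, in case it helps. The file names are hypothetical, and the inputs are assumed to be coordinate-sorted already; the commands only run if samtools and the files are actually present.

```shell
# Hypothetical per-lane (or per-replicate) BAM files, assumed coordinate-sorted.
bam_a=IN1_L001.sorted.bam
bam_b=IN1_L002.sorted.bam
merged=IN1_merged.bam

if command -v samtools >/dev/null 2>&1 && [ -f "$bam_a" ] && [ -f "$bam_b" ]; then
    # Output file first, then the inputs to combine.
    samtools merge "$merged" "$bam_a" "$bam_b"
    samtools index "$merged"
else
    echo "skipping: samtools or the input BAM files are not available here"
fi
```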

modified 3 months ago • written 3 months ago by jordi.planells230

bowtie2 -p 8 -N 1 --no-mixed mode --no-discordant -x $REF -1 IN1_S1_L001_R1_001.fastq -2 IN1_S17_L002_R1_001.fastq -S /mnt/e/sequence_data/ChIP-seq/FASTQ/14INPUT/bams/IN1R1.sam

This is the command line that I used.

modified 3 months ago • written 3 months ago by maisarasora10

Why are you giving it two read 1 files as if they were matched pairs? That's never going to work.
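One quick way to see whether two files are real mates is to compare their read IDs: in Illumina FASTQ, mates share the header text before the first space. The sketch below uses tiny made-up reads in a temporary directory; same_read_ids is a hypothetical helper, and it assumes uncompressed four-line FASTQ records.

```shell
# Succeed if the first n read IDs of two FASTQ files agree (i.e. they look like mates).
# The ID is the header text before the first space; a trailing /1 or /2 is stripped.
same_read_ids() {
    n=${3:-1000}
    ids1=$(head -n $((n * 4)) "$1" | awk 'NR % 4 == 1 { sub(/\/[12]$/, "", $1); print $1 }')
    ids2=$(head -n $((n * 4)) "$2" | awk 'NR % 4 == 1 { sub(/\/[12]$/, "", $1); print $1 }')
    [ "$ids1" = "$ids2" ]
}

# Made-up single-read files, for illustration only.
demo=$(mktemp -d)
printf '@read1 1:N:0:1\nACGT\n+\nIIII\n' > "$demo/L001_R1.fastq"
printf '@read1 2:N:0:1\nTTGA\n+\nIIII\n' > "$demo/L001_R2.fastq"
printf '@read9 1:N:0:1\nGGCC\n+\nIIII\n' > "$demo/L002_R1.fastq"

same_read_ids "$demo/L001_R1.fastq" "$demo/L001_R2.fastq" && echo "L001 R1 vs L001 R2: IDs match"
same_read_ids "$demo/L001_R1.fastq" "$demo/L002_R1.fastq" || echo "L001 R1 vs L002 R1: IDs differ"
```

R1 files from two different lanes contain different reads, so this check fails for them, which is why they cannot be passed as -1 and -2.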

written 3 months ago by swbarnes28.9k

I just realized that. I didn't clearly understand the difference between lanes and reads, because the public data only have 2 fastq files. I'm now using concatenation to merge the data from the same read.

written 3 months ago by maisarasora10
3 months ago, jordi.planells230 wrote:

Hi! I think what you have there is two pairs of fastq files for the same sample, each pair coming from a different sequencing lane (note that some of the files have L001 in their names and others L002). What you need to do is concatenate the sequencing lanes into one file per read and then align the resulting files.

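# Input names follow the files listed in the question (lane 2 is IN1_S17_L002_...).
# Keep the lane order identical for R1 and R2 so that mates stay in sync.
cat IN1_S1_L001_R1_001.fastq IN1_S17_L002_R1_001.fastq > IN1_S1_concatenated_R1_001.fastq
cat IN1_S1_L001_R2_001.fastq IN1_S17_L002_R2_001.fastq > IN1_S1_concatenated_R2_001.fastq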

The resulting files are the 2 mates (R1 and R2) that you want to feed into your favorite aligner.
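For bowtie2, the alignment step could then look roughly like the sketch below. It reuses the options and $REF from the command posted in this thread and the concatenated file names from above; the output name is hypothetical. Note that the posted command has a stray word "mode" after --no-mixed, which is likely what bowtie2 reports as an extra <singles> parameter, so it is dropped here.

```shell
# $REF is the Bowtie 2 index prefix from the original post (must be set by the user).
r1=IN1_S1_concatenated_R1_001.fastq
r2=IN1_S1_concatenated_R2_001.fastq
out=IN1.sam   # hypothetical output name

if command -v bowtie2 >/dev/null 2>&1 && [ -f "$r1" ] && [ -f "$r2" ]; then
    # Same options as the posted command, without the stray word "mode";
    # -1 gets the R1 file and -2 gets the matching R2 file.
    bowtie2 -p 8 -N 1 --no-mixed --no-discordant -x "$REF" -1 "$r1" -2 "$r2" -S "$out"
else
    echo "skipping: bowtie2 or the concatenated FASTQ files are not available here"
fi
```

As an alternative to cat, bowtie2 also accepts comma-separated lists of files for -1 and -2, as long as both lists name the lanes in the same order.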

Hope it helps!

written 3 months ago by jordi.planells230

Thank you so much! At first I didn't clearly understand the difference between lanes and reads, because the public data only have 2 fastq files. As suggested, I merged the files from the same read by concatenation, ran bowtie2 on the two reads, and got 99.24% alignment.

Do you have any opinion on using merge tools like FLASH and PANDAseq to merge the data? I have seen some comments saying that data are lost if you simply concatenate the files.

But given the good alignment rate, I think concatenation works well.

written 3 months ago by maisarasora10

You can safely concatenate the fastqs, or merge bams with no data loss.
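If you want to convince yourself, count the reads before and after concatenating. This is only a sketch on tiny made-up files in a temporary directory; count_reads is a hypothetical helper and assumes uncompressed FASTQ with four lines per read.

```shell
# Count reads in one or more FASTQ files (4 lines per record).
count_reads() {
    cat "$@" | awk 'END { print NR / 4 }'
}

# Made-up lane files, for illustration only.
tmp=$(mktemp -d)
printf '@a 1:N:0:1\nACGT\n+\nIIII\n@b 1:N:0:1\nCCGT\n+\nIIII\n' > "$tmp/L001_R1.fastq"
printf '@c 1:N:0:1\nGGTA\n+\nIIII\n' > "$tmp/L002_R1.fastq"

cat "$tmp/L001_R1.fastq" "$tmp/L002_R1.fastq" > "$tmp/merged_R1.fastq"

n1=$(count_reads "$tmp/L001_R1.fastq")
n2=$(count_reads "$tmp/L002_R1.fastq")
before=$((n1 + n2))
after=$(count_reads "$tmp/merged_R1.fastq")
echo "reads in lanes: $before, reads after cat: $after"
```

The two numbers should be equal for R1, and again for R2.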

written 3 months ago by swbarnes28.9k

Thank you for your reply...

written 3 months ago by maisarasora10