Question

How to combine RNA-seq data from 4 lanes


9.3 years ago

jolin0701-dy

I just got my RNA-seq data:

CZ1_S6_L001_R1_001.fastq

CZ1_S6_L002_R1_001.fastq

CZ1_S6_L003_R1_001.fastq

CZ1_S6_L004_R1_001.fastq

All I know is that they come from 4 different lanes of the sequencing run.

What are these files, and do I need to combine them into a single FASTQ file? If so, how?

Thanks!


updated 6.8 years ago by paumarc • written 9.3 years ago by jolin0701-dy


If I had paired-end data, I would combine all the forward-read (R1) files together and all the reverse-read (R2) files together, across all lanes and all flow cells. Am I right (assuming there are no batch effects due to lanes or flow cells)?

One concern I have is that the index (barcode) sequences differ between flow cells. Will this affect my analysis? Thanks!

8.2 years ago by apuhegde
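A minimal sketch of the per-read merge described in the comment above, assuming bash, uncompressed files that follow the Illumina naming seen in the question (<sample>_S<n>_L<lane>_R<1|2>_001.fastq), and that every lane has both an R1 and an R2 file. The function name merge_paired_lanes is hypothetical. The important detail is that R1 and R2 are concatenated with their lanes in the same order, so mate pairs stay in sync line for line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge all lanes of one paired-end sample,
# keeping forward (R1) and reverse (R2) reads in separate files.
# Assumes Illumina-style names: <prefix>_L00<lane>_R<1|2>_001.fastq
merge_paired_lanes() {
  local prefix="$1"   # e.g. CZ1_S6
  local r1=( "${prefix}"_L00?_R1_001.fastq )
  local r2=( "${prefix}"_L00?_R2_001.fastq )

  # Stop if files are missing or R1 and R2 cover a different number of lanes.
  if [ ! -e "${r1[0]}" ] || [ ! -e "${r2[0]}" ] || [ "${#r1[@]}" -ne "${#r2[@]}" ]; then
    echo "missing or unmatched lane files for ${prefix}" >&2
    return 1
  fi

  # Glob results are sorted, so both outputs list lanes in the same order
  # (L001, L002, ...) and mates stay paired line for line.
  cat "${r1[@]}" > "${prefix}_merged_R1_001.fastq"
  cat "${r2[@]}" > "${prefix}_merged_R2_001.fastq"
}
```

Usage, for the sample in the question: merge_paired_lanes CZ1_S6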


If it is the same sample (where multiple libraries were made using separate indexes) and it was then run on multiple lanes/flow cells, you could, in theory, combine the R1 files and the R2 files. It would be much better, though, to keep the raw data separate and use read groups (http://gatkforums.broadinstitute.org/gatk/discussion/6472/read-groups) to manage the aligned BAM files.

For obvious reasons, one shouldn't combine data from unrelated samples, since that would defeat the purpose of indexing them in the first place.

8.2 years ago by GenoMax
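For the read-group approach suggested above, each lane needs its own read-group ID while all lanes share the same sample (SM) name. A sketch that derives an @RG header line from a file name, assuming bash and the Illumina naming convention <sample>_S<n>_L<lane>_R<read>_001.fastq[.gz]. The function name lane_read_group and the ID layout are hypothetical choices; if lanes come from more than one flow cell, the flow-cell ID (found in the FASTQ read headers, not the file name) would also need to go into ID to keep it unique. How the line is passed on depends on the aligner or tool you use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a tab-separated @RG header line from an
# Illumina-style FASTQ file name. ID is unique per lane; SM groups all
# lanes of the same sample so downstream tools treat them as one sample.
lane_read_group() {
  local name
  name="$(basename "$1")"
  # Match <sample>_S<n>_L<lane>_R<read>_001.fastq with optional .gz
  if [[ "$name" =~ ^(.+)_S([0-9]+)_L([0-9]{3})_R[12]_001\.fastq(\.gz)?$ ]]; then
    local sample="${BASH_REMATCH[1]}" snum="${BASH_REMATCH[2]}" lane="${BASH_REMATCH[3]}"
    printf '@RG\tID:%s_S%s.L%s\tSM:%s\tPL:ILLUMINA\n' \
      "$sample" "$snum" "$lane" "$sample"
  else
    echo "unrecognised file name: $name" >&2
    return 1
  fi
}
```

For example, lane_read_group CZ1_S6_L003_R1_001.fastq yields a line with ID:CZ1_S6.L003 and SM:CZ1.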


Align the four FASTQ files separately and merge the BAM files with samtools merge.

6.8 years ago by paumarc
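The align-then-merge route from the comment above could look roughly like this. It is a sketch assuming bash, samtools on the PATH, and that your aligner has already written one BAM per lane under the hypothetical names <prefix>_L00<lane>.bam; the function name merge_lane_bams is also hypothetical. samtools merge expects coordinate-sorted inputs, so each lane is sorted first, and the merged file is then indexed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: combine the per-lane BAMs of one sample with samtools.
# Assumes lane BAMs named <prefix>_L00<lane>.bam and samtools on PATH.
merge_lane_bams() {
  local prefix="$1" bam
  local sorted=()
  for bam in "${prefix}"_L00?.bam; do
    # samtools merge expects coordinate-sorted inputs, so sort each lane first.
    samtools sort -o "${bam%.bam}.sorted.bam" "$bam" || return 1
    sorted+=( "${bam%.bam}.sorted.bam" )
  done
  # Merging sorted inputs gives a coordinate-sorted output (-f overwrites an old one).
  samtools merge -f "${prefix}.merged.bam" "${sorted[@]}" || return 1
  # Index the merged BAM for genome browsers and region-based tools.
  samtools index "${prefix}.merged.bam"
}
```

Usage: merge_lane_bams CZ1_S6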

score 8 · Accepted Answer · 2016-03-30


9.3 years ago

GenoMax

It appears to be the same sample run in 4 separate lanes. You can either cat the files together into one file or process them independently (which lets you parallelize the work). If you process them independently, merge the 4 per-lane BAM files into one (and then sort) at the end.

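Processing the lanes independently, as suggested in the answer above, makes it easy to run them in parallel. A minimal bash sketch using background jobs and wait. process_lane is a hypothetical placeholder that here only gzip-compresses the file so the sketch runs anywhere; you would replace its body with your own trimming or alignment command. process_lanes_in_parallel is likewise a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical placeholder for the real per-lane step (trimming, alignment, ...).
# Here it only gzip-compresses the lane file.
process_lane() {
  gzip -c "$1" > "$1.gz"
}

# Launch one background job per lane file, then wait for all of them.
process_lanes_in_parallel() {
  local fq pid pids=() status=0
  for fq in "$@"; do
    process_lane "$fq" &
    pids+=( "$!" )
  done
  # Wait on each PID individually so a failure in any lane is reported.
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1
  done
  return "$status"
}
```

Usage: process_lanes_in_parallel CZ1_S6_L00?_R1_001.fastq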

score 4 · Accepted Answer · 2016-03-30

You might first want to QC your data lane by lane to check for lane effects.
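As a very rough first check before merging, you can compare simple per-lane numbers. A sketch that prints the read count and mean read length for each lane file with awk, assuming bash and uncompressed FASTQ with exactly four lines per record; lane_stats is a hypothetical name. A dedicated QC tool such as FastQC gives a much fuller per-lane picture.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "file<TAB>reads<TAB>mean_length" for each FASTQ.
# Assumes uncompressed FASTQ with exactly four lines per record.
lane_stats() {
  local fq
  for fq in "$@"; do
    awk -v f="$fq" '
      NR % 4 == 2 { n++; bases += length($0) }   # line 2 of each record is the sequence
      END { printf "%s\t%d\t%.1f\n", f, n, (n ? bases / n : 0) }
    ' "$fq"
  done
}
```

Usage: lane_stats CZ1_S6_L00?_R1_001.fastq. A lane whose read count or mean length differs sharply from the others is worth a closer look before merging.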

Based on the filenames (there are only R1 files), I get the impression your data are single-end. In that case you can simply concatenate the files for downstream analysis.

For example:

cat CZ1_S6_L001_R1_001.fastq CZ1_S6_L002_R1_001.fastq CZ1_S6_L003_R1_001.fastq CZ1_S6_L004_R1_001.fastq > CZ1_S6_merged_R1_001.fastq
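The same concatenation can be wrapped with a glob, so the lanes are picked up in order (L001 to L004) without typing each name, plus a basic sanity check: a FASTQ record is four lines, so a merged line count that is not a multiple of 4 points to a truncated input or a file missing its final newline. A sketch assuming bash and the file naming from the question; merge_single_end is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: concatenate all lanes of one single-end sample
# and sanity-check the result. Assumes <prefix>_L00<lane>_R1_001.fastq names.
merge_single_end() {
  local prefix="$1"
  local lanes=( "${prefix}"_L00?_R1_001.fastq )   # glob is sorted: L001, L002, ...
  local out="${prefix}_merged_R1_001.fastq"
  local lines

  if [ ! -e "${lanes[0]}" ]; then
    echo "no lane files found for ${prefix}" >&2
    return 1
  fi
  cat "${lanes[@]}" > "$out"

  # A FASTQ record is 4 lines; anything else suggests a truncated input
  # or a lane file without a trailing newline.
  lines=$(wc -l < "$out")
  if (( lines % 4 != 0 )); then
    echo "warning: ${out} has ${lines} lines, not a multiple of 4" >&2
    return 1
  fi
  echo "${out}: $(( lines / 4 )) reads from ${#lanes[@]} lanes"
}
```

Usage: merge_single_end CZ1_S6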