Question: How to combine RNA-seq data from 4 lanes
jolin0701-dy60 wrote:

I just got my RNA-seq data:

CZ1_S6_L001_R1_001.fastq

CZ1_S6_L002_R1_001.fastq

CZ1_S6_L003_R1_001.fastq

CZ1_S6_L004_R1_001.fastq

All I know is that they come from 4 different lanes of the sequencing run.

What are these files, and do I need to combine them into one FASTQ file? If so, how?

Thanks~~~

(modified 11 days ago by paumarc10 • written 2.6 years ago by jolin0701-dy60)
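For reference, these names look like the standard Illumina bcl2fastq pattern `<SampleName>_S<SampleNumber>_L<Lane>_R<Read>_001.fastq`. Here the sample is CZ1, S6 is its position on the sample sheet, L001 to L004 are the four flow-cell lanes, R1 is read 1 (a paired-end run would also have R2 files), and the final segment is normally 001. The bash sketch below assumes that pattern and splits a file name into these fields. `parse_fastq_name` is a hypothetical helper, not part of any Illumina tool.

```shell
#!/usr/bin/env bash
# Sketch: split an Illumina-style FASTQ file name into its fields.
# Assumed pattern: <sample>_S<number>_L<lane>_<R1|R2|I1|I2>_<segment>.fastq[.gz]
parse_fastq_name() {
  local name re
  name=$(basename "$1")
  re='^(.+)_S([0-9]+)_L([0-9]{3})_(R[12]|I[12])_([0-9]{3})\.fastq(\.gz)?$'
  if [[ $name =~ $re ]]; then
    printf 'sample=%s sample_number=%s lane=%s read=%s segment=%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" \
      "${BASH_REMATCH[4]}" "${BASH_REMATCH[5]}"
  else
    echo "unrecognized name: $name" >&2
    return 1
  fi
}

# The four files from the question: same sample, lanes 001 to 004, read 1
for f in CZ1_S6_L001_R1_001.fastq CZ1_S6_L002_R1_001.fastq \
         CZ1_S6_L003_R1_001.fastq CZ1_S6_L004_R1_001.fastq; do
  parse_fastq_name "$f"
done
```

Only the lane field differs between the four files, which is what suggests they are one sample split across lanes.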

If I had paired-end data, I would combine all the forward-read (R1) files together and all the reverse-read (R2) files together, across all lanes and all flow cells. Am I right (assuming there are no batch effects due to lanes or flow cells)?

One concern I have is that the index (barcode) sequences differ between flow cells. Will this affect my analysis? Thanks!

(written 18 months ago by apuhegde20)

If it is the same sample (where multiple libraries were made using separate indexes) that was then run on multiple lanes/flow cells, you could in theory combine the R1 files and the R2 files. It would be much better, though, to keep the raw data separate and use read groups (http://gatkforums.broadinstitute.org/gatk/discussion/6472/read-groups) to manage the aligned BAM files.

For obvious reasons, one shouldn't combine data from unrelated samples, since that would defeat the purpose of indexing them in the first place.

(modified 18 months ago • written 18 months ago by genomax57k)
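If the data were paired-end, one common way to concatenate is to build the R1 and R2 file lists in the same lane order. That way read n of the merged R1 file is still the mate of read n of the merged R2 file. The bash sketch below assumes uncompressed files named like those in the question, plus `_R2_` counterparts. Those R2 files are hypothetical, since the question only shows R1 files. `merge_pairs` is a hypothetical helper name. The sketch checks that every R1 file has its same-lane R2 mate and that both merged files hold the same number of reads.

```shell
#!/usr/bin/env bash
# Sketch: concatenate paired-end lane files for one sample, R1 with R1 and
# R2 with R2, using the same lane order for both so mates stay in sync.
merge_pairs() {
  local prefix="$1"                             # e.g. CZ1_S6 (hypothetical)
  local r1=( "$prefix"_L00?_R1_001.fastq )      # bash sorts glob matches
  local r2=( "$prefix"_L00?_R2_001.fastq )
  local i n1 n2
  if [ "${#r1[@]}" -ne "${#r2[@]}" ]; then
    echo "different number of R1 and R2 files" >&2; return 1
  fi
  for i in "${!r1[@]}"; do
    # The i-th R2 file must be the mate (same lane) of the i-th R1 file
    if [ ! -e "${r1[i]}" ] || [ "${r1[i]/_R1_/_R2_}" != "${r2[i]}" ]; then
      echo "no matching R2 file for ${r1[i]}" >&2; return 1
    fi
  done
  cat "${r1[@]}" > "${prefix}_merged_R1_001.fastq" || return 1
  cat "${r2[@]}" > "${prefix}_merged_R2_001.fastq" || return 1
  # 4 lines per FASTQ record: both merged files must hold the same read count
  n1=$(( $(wc -l < "${prefix}_merged_R1_001.fastq") / 4 ))
  n2=$(( $(wc -l < "${prefix}_merged_R2_001.fastq") / 4 ))
  if [ "$n1" -ne "$n2" ]; then
    echo "read counts differ: R1=$n1 R2=$n2" >&2; return 1
  fi
  echo "merged $n1 read pairs"
}
```

Usage would be `merge_pairs CZ1_S6`, run in the directory holding the lane files.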

Align the four FASTQ files separately and merge the BAM files with samtools merge.

(modified 11 days ago • written 11 days ago by paumarc10)
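Here is a minimal sketch of this per-lane route. It assumes each lane has already been aligned to its own BAM file against the same reference (the alignment step is not shown) and that samtools is on the PATH. `samtools merge -r` tags each read with a read-group value inferred from its input file name, so the lane of origin stays recoverable. The merged file is then sorted and indexed. The helper name `merge_lane_bams` and all file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: merge per-lane BAMs (one per lane, same reference) into a single
# coordinate-sorted, indexed BAM. Requires samtools on PATH.
merge_lane_bams() {
  local out="$1"; shift               # remaining args: per-lane BAM files
  local tmp="${out%.bam}.unsorted.bam"
  # -f: overwrite an existing output file
  # -r: attach an RG tag to each read, derived from its input file name
  samtools merge -f -r "$tmp" "$@" || return 1
  samtools sort -o "$out" "$tmp" || return 1   # coordinate sort
  samtools index "$out" || return 1            # writes "$out.bai"
  rm -f "$tmp"
}
```

Example call, with hypothetical per-lane BAM names: `merge_lane_bams CZ1_S6.bam CZ1_S6_L001.bam CZ1_S6_L002.bam CZ1_S6_L003.bam CZ1_S6_L004.bam`. Sorting after merging follows the merge-then-sort order suggested in this thread.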

genomax57k (United States) wrote:

It appears to be the same sample run in 4 separate lanes. You can cat the files together into one, or process them independently (which lets you parallelize) and then merge the 4 resulting BAM files into one and sort it at the end.

(modified 2.6 years ago • written 2.6 years ago by genomax57k)
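If the lane files are gzip-compressed (for example `CZ1_S6_L001_R1_001.fastq.gz`, a hypothetical variant of the names in the question), `cat` works on them directly. Several gzip members written back to back form a valid gzip file, and `gzip -cd` decompresses all of them in order. Here is a bash sketch with a hypothetical helper name, `concat_gz_lanes`.

```shell
#!/usr/bin/env bash
# Sketch: concatenate gzip-compressed lane files for one read (R1 or R2)
# without decompressing them first. The output is a multi-member gzip file.
concat_gz_lanes() {
  local out="$1"; shift               # remaining args: input .fastq.gz files
  cat "$@" > "$out" || return 1
  gzip -t "$out"                      # integrity check of the merged file
}
```

Usage would be `concat_gz_lanes CZ1_S6_merged_R1_001.fastq.gz CZ1_S6_L00?_R1_001.fastq.gz`, where the glob expands in lane order.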

WouterDeCoster32k (Belgium) wrote:

You might first want to QC your data lane by lane to check for lane effects.
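One quick way to compare lanes, before or alongside a dedicated QC tool such as FastQC, is to tabulate a few simple statistics per file. The bash/awk sketch below prints the read count, mean read length, and mean base quality for each file. It assumes uncompressed FASTQ with 4-line records and Phred+33 quality encoding. `lane_stats` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: per-lane summary statistics for uncompressed FASTQ files with
# 4-line records and Phred+33 qualities (assumptions, not checked here).
lane_stats() {
  printf 'file\treads\tmean_length\tmean_quality\n'
  local f
  for f in "$@"; do
    awk -v name="$f" '
      BEGIN { for (c = 33; c < 127; c++) phred[sprintf("%c", c)] = c - 33 }
      NR % 4 == 2 { reads++; bases += length($0) }          # sequence line
      NR % 4 == 0 {                                         # quality line
        for (i = 1; i <= length($0); i++) qsum += phred[substr($0, i, 1)]
      }
      END {
        if (reads == 0) { printf "%s\t0\tNA\tNA\n", name; exit }
        printf "%s\t%d\t%.1f\t%.1f\n", name, reads, bases / reads, qsum / bases
      }' "$f"
  done
}
```

For example, run `lane_stats CZ1_S6_L00?_R1_001.fastq`. A lane whose read count or mean quality stands out from the other three may deserve a closer look before merging.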

Based on the filenames, I get the impression your data is single-end (only R1 files)? In that case you can just concatenate the files for downstream analysis.

e.g.

cat CZ1_S6_L001_R1_001.fastq CZ1_S6_L002_R1_001.fastq CZ1_S6_L003_R1_001.fastq CZ1_S6_L004_R1_001.fastq > CZ1_S6_merged_R1_001.fastq
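A slightly more defensive version of the same concatenation is sketched below in bash. It assumes the inputs are uncompressed 4-line FASTQ files. It refuses any input file whose line count is not a multiple of 4 (for example, a truncated download), because such a file would shift every following record in the merged output. It then reports the merged read count. `merge_single_end` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: concatenate single-end lane files after checking each one holds
# whole 4-line FASTQ records, then report the merged read count.
merge_single_end() {
  local out="$1"; shift               # remaining args: lane files, in order
  local expected=0 f n got
  for f in "$@"; do
    n=$(wc -l < "$f") || return 1
    if (( n % 4 != 0 )); then
      echo "$f: $n lines, not a multiple of 4 (truncated?)" >&2
      return 1
    fi
    expected=$(( expected + n / 4 ))
  done
  cat "$@" > "$out" || return 1
  got=$(( $(wc -l < "$out") / 4 ))
  if (( got != expected )); then
    echo "read count mismatch: expected $expected, got $got" >&2
    return 1
  fi
  echo "$got reads in $out"
}
```

Usage would be `merge_single_end CZ1_S6_merged_R1_001.fastq CZ1_S6_L00?_R1_001.fastq`, where the glob expands in lane order just like the explicit list above.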

(modified 11 days ago • written 2.6 years ago by WouterDeCoster32k)