Question: multiple paired-end files for one sample?
3.7 years ago by
manekineko130
Bulgaria
manekineko130 wrote:

I have multiple paired-end FASTQ files from RNA-seq for one sample, like:

01_R1, 
02_R1, 
03_R1 (up to 09 with size ~1G)..., and 

01_R2, 
02_R2, 
03_R2 (up to 09)

Should I join all the R1 files, and then all the R2 files, with a simple cat command before proceeding to adapter trimming?

 

rna-seq • 2.5k views
3.7 years ago by
pristanna520
Czech Republic
pristanna520 wrote:

If the reads within the files are NOT of the same length, and you want to do absolute-length-based trimming, you should trim the files separately and then join them. If the reads are all the same length, the result should be the same in both scenarios.

You may also find this other thread about merging FASTQ files useful: Fastq Files From Different Flowcells


They are all from one lane. I do not know why there are multiple files. As for the length, they are 101 nt paired-end reads (I'm not sure whether they will be the same length or different after trimming).

written 3.7 years ago by manekineko130

They are in multiple files because that is the default for Illumina's basecalling/demultiplexing (CASAVA/bcl2fastq) software. Your sequence provider evidently did not use the override switch (--fastq-cluster-count 0) that puts all R1 (and R2) sequences into single files. The original (untrimmed) sequences should all be of identical length (101 bp in your case).
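
If you want to confirm that the raw reads really are all the same length before merging, a quick tally of read lengths per file is enough. The following is only a minimal sketch, assuming standard four-line FASTQ records (no wrapped sequence lines); the helper names `open_fastq` and `read_length_counts` and the file name in the usage comment are made up for illustration.

```python
import gzip
from collections import Counter
from pathlib import Path


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def read_length_counts(path):
    """Return a Counter mapping read length -> number of reads.

    Assumption: every record occupies exactly four lines, so the
    sequence is always the second line of each group of four.
    """
    counts = Counter()
    with open_fastq(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # sequence line of the record
                counts[len(line.rstrip("\r\n"))] += 1
    return counts


# Usage (hypothetical file name):
# print(read_length_counts("01_R1.fastq"))
```

For untrimmed data from a 101 nt run, each file should report a single length. More than one length suggests the files were already trimmed or filtered.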

written 3.7 years ago by genomax65k

So, since they are still the raw originals, I can just cat all the R1 files, then cat all the R2 files, without any worries, and continue with trimming the two merged files (R1 and R2)?

written 3.7 years ago by manekineko130

Yes, if they are not compressed, simply cat all the R1 files into one file and all the R2 files into another, using the same file order for both so that the read pairs stay in sync. If the files are compressed, read the link from genomax2. (Or just make sure that the number of lines in the merged R1 (and R2) file equals the sum of the line counts of the individual R1 (and R2) files.)
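
The merge and the line-count check can also be scripted. This is a rough sketch for uncompressed files only, doing the equivalent of cat followed by a comparison of line counts. The function names and the file names in the usage comment are assumptions for illustration, and sorting by name only gives the right order because the chunk prefixes (01 to 09) are zero-padded.

```python
import shutil
from pathlib import Path


def concatenate(parts, merged):
    """Append each file in `parts` to `merged`, in the order given."""
    with open(merged, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


def count_lines(path):
    """Count the lines in an uncompressed file."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def merge_and_check(parts, merged):
    """Merge FASTQ chunks in sorted name order and verify the line count.

    Returns the number of reads in the merged file.
    """
    parts = sorted(parts)  # same ordering rule for R1 and R2 keeps pairs in sync
    concatenate(parts, merged)
    expected = sum(count_lines(p) for p in parts)
    actual = count_lines(merged)
    if actual != expected:
        raise ValueError(f"{merged}: {actual} lines, expected {expected}")
    if actual % 4 != 0:
        raise ValueError(f"{merged}: line count {actual} is not a multiple of 4")
    return actual // 4


# Usage (hypothetical file names):
# n_r1 = merge_and_check(Path(".").glob("0?_R1.fastq"), "merged_R1.fastq")
# n_r2 = merge_and_check(Path(".").glob("0?_R2.fastq"), "merged_R2.fastq")
# assert n_r1 == n_r2, "R1 and R2 should contain the same number of reads"
```

Comparing the two returned read counts is a cheap extra check that the R1 and R2 files still match before trimming.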

modified 3.7 years ago • written 3.7 years ago by pristanna520
3.7 years ago by
genomax65k
United States
genomax65k wrote:

Yes, but consider the caveat explained in this post: http://seqanswers.com/forums/showthread.php?t=51395
