Question: Merge fastq reads for several samples
rim.klabi0 wrote:

Hello

I have 400 FASTQ files from different samples across two sequencing runs, both on an Illumina HiSeq. How can I merge the .fastq files from both runs for each sample in one step? R1 and R2 have to stay separate, of course. I know I can merge the .fastq files of both runs with cat, but that handles only one sample at a time, and I would have to repeat it for every sample, and I have more than 100 samples. What command should I use? Do I need to prepare any folders?

Thank you for helping me

next-gen • 2.0k views
ADD COMMENTlink modified 2.8 years ago by swbarnes28.6k • written 2.8 years ago by rim.klabi0

You'll need a for loop for this. To figure out how to write the command, we need to know how your files are named: which naming pattern do you use to distinguish the samples, lanes and read direction?

ADD REPLYlink written 2.8 years ago by WouterDeCoster44k

Why are you merging reads again?

ADD REPLYlink written 2.8 years ago by mforde841.2k

OP is not merging the reads but merging file pieces for a sample. In the past, bcl2fastq used to break files up into chunks of 2 million reads.

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by genomax90k

Maybe I'm just being too literal here, but he seems to be saying that he's merging reads from two independent runs. If that's the case, then he should probably treat each run as a technical replicate, or consider merging only after correcting for batch effects.

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by mforde841.2k

I have to merge reads from the two runs in order to increase the number of reads.

ADD REPLYlink written 2.8 years ago by rim.klabi0

1. Prepare two folders, one per run. The same sample must have the same file name in both runs. Also create a combined folder.
2. Loop over the files in the folder of the first run, and for each file name you get, do step 3.
3. Run cat run1/samplename.R1.fq run2/samplename.R1.fq > combined/samplename.R1.fq and cat run1/samplename.R2.fq run2/samplename.R2.fq > combined/samplename.R2.fq
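
The three steps above could be sketched as a small shell function. This is only an illustration under the assumptions stated in the steps: the folder names run1, run2 and combined are placeholders, the function name merge_runs is made up for this example, and files are assumed to end in .fq with identical names in both runs.

```shell
#!/usr/bin/env bash
# Sketch: merge per-sample FASTQ files from two runs, file by file.
# R1 and R2 stay separate because each file is only ever joined with
# the file of the same name from the other run.
merge_runs() {
    local run1="$1"
    local run2="$2"
    local out="$3"
    local f name
    mkdir -p "$out"
    for f in "$run1"/*.fq; do
        [ -e "$f" ] || continue        # glob matched nothing
        name=$(basename "$f")
        if [ -f "$run2/$name" ]; then
            cat "$f" "$run2/$name" > "$out/$name"
        else
            # Assumption: a sample missing from the second run is skipped.
            echo "warning: $name not found in $run2, skipped" >&2
        fi
    done
}

# Example call (placeholder directory names):
# merge_runs run1 run2 combined
```

For gzipped files the glob can be changed to *.fq.gz; cat still works there, because concatenated gzip streams form a valid gzip file.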

ADD REPLYlink written 2.8 years ago by chen2.1k
swbarnes28.6k wrote:

This is a little Perl script I use. It assumes that everything is in the same folder, and that everything before the 'S\d+' is the sample name. It will cat together everything with the same name.

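use strict; use warnings;
# Assumption (not in the original snippet): inputs come from the command line; $dir must be an absolute path.
my ($dir, $genomeDir, $gtf) = @ARGV;
opendir(my $dh, $dir) or die "Cannot open $dir: $!";
my @dir = readdir($dh);
closedir($dh);
my %hash;
foreach my $sample (@dir) {
                # Keep only names like sample_S1_L001_R1_001.fastq.gz; everything before _S<n> is the sample name.
                next unless $sample =~ /^(\S+)_S\d+_L\d+_R\d_\d\d\d\.fastq\.gz$/;
                $hash{$1}++;
}
foreach my $key (keys %hash) {
                mkdir $key;
                # This glob matches every lane and both read directions of the sample, so R1 and R2 end up in one stream.
                my $temp = "$dir/${key}_S*.gz";
                system("cd $key; cat $temp | STAR --genomeDir $genomeDir --sjdbGTFfile $gtf --readFilesIn - --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --outSAMunmapped Within --outSAMtype BAM SortedByCoordinate --runThreadN 20 --limitBAMsortRAM 1001279989");
}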
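
For the original question, where no aligner is involved and R1 and R2 must stay separate, the same grouping idea could be sketched in shell. This is a hedged example, not the script above: the function name merge_by_sample and the output names <sample>_R1.fastq.gz and <sample>_R2.fastq.gz are made up here, and the input files are assumed to follow the <sample>_S<n>_L<lane>_R<1|2>_<chunk>.fastq.gz pattern that the Perl regex expects.

```shell
#!/usr/bin/env bash
# Sketch: concatenate all lanes/chunks of each sample into one file per
# read direction. Everything before _S<n> is taken as the sample name.
merge_by_sample() {
    local indir="$1"
    local outdir="$2"
    local f base sample read
    mkdir -p "$outdir"
    # Files are appended below, so refuse to run on a non-empty output folder.
    if [ -n "$(ls -A "$outdir")" ]; then
        echo "error: $outdir is not empty" >&2
        return 1
    fi
    for f in "$indir"/*_S*_L*_R[12]_*.fastq.gz; do
        [ -e "$f" ] || continue        # glob matched nothing
        base=$(basename "$f")
        sample=$(printf '%s\n' "$base" | sed -E 's/_S[0-9]+_L[0-9]+_R[12]_[0-9]+\.fastq\.gz$//')
        [ "$sample" != "$base" ] || continue   # name does not fit the pattern
        read=$(printf '%s\n' "$base" | sed -E 's/^.*_L[0-9]+_(R[12])_[0-9]+\.fastq\.gz$/\1/')
        # The glob expands in sorted order, so R1 and R2 pieces are
        # appended in the same lane/chunk order and pairs stay in sync.
        cat "$f" >> "$outdir/${sample}_${read}.fastq.gz"
    done
}

# Example call (placeholder directory names):
# merge_by_sample /path/to/fastq merged
```

Appending with cat is safe for .gz files because concatenated gzip streams are themselves a valid gzip file, so nothing needs to be decompressed first.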
ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by swbarnes28.6k

Since OP has not said anything about alignment it may be good to remove the STAR command line from the loop above.

ADD REPLYlink written 2.8 years ago by genomax90k

True, but he can plug in whatever application he intends to run in its place.

ADD REPLYlink written 2.8 years ago by swbarnes28.6k

Thank you, I will try it.

ADD REPLYlink written 2.8 years ago by rim.klabi0