Question: Merge fastq reads for several samples
rim.klabi0 wrote:

Hello

I have 400 FASTQ files from different samples across two sequencing runs, both on an Illumina HiSeq. How can I merge the .fastq files of both runs for each sample in one step? Of course, R1 and R2 have to stay separate. I know I can merge the .fastq files of both runs with cat, but that handles only one sample at a time, so I would have to repeat it for every sample, and I have more than 100 samples. What command should I use? Do I need to prepare any folders?

Thank you for helping me

next-gen • 991 views
ADD COMMENTlink modified 14 months ago by swbarnes24.8k • written 14 months ago by rim.klabi0

You'll need a for loop for this. To work out the command, we need to know how your files are named: which naming pattern distinguishes the samples, lanes, and read direction?

ADD REPLYlink written 14 months ago by WouterDeCoster36k

Why are you merging reads again?

ADD REPLYlink written 14 months ago by mforde841.2k

OP is not merging the reads but merging file pieces for a sample. In the past, bcl2fastq used to break files up into 2-million-read chunks.

ADD REPLYlink modified 14 months ago • written 14 months ago by genomax62k

Maybe I'm just being too literal here, but he seems to be saying that he's merging reads from two independent runs. If that's the case, he should probably treat each run as a technical replicate, or consider merging only after correcting for batch effects.

ADD REPLYlink modified 14 months ago • written 14 months ago by mforde841.2k

I have to merge reads from the two runs in order to increase the number of reads.

ADD REPLYlink written 14 months ago by rim.klabi0

1. Prepare two folders, one per run; the same sample should have the same file name in both runs. Also create a combined folder.
2. Loop over the files in the first run's folder, and for each file name do step 3.
3. Run cat run1/samplename.R1.fq run2/samplename.R1.fq > combined/samplename.R1.fq and cat run1/samplename.R2.fq run2/samplename.R2.fq > combined/samplename.R2.fq

ADD REPLYlink written 14 months ago by chen1.8k
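
The three steps in the reply above can be sketched as a small shell function. This is only an illustration under the assumptions stated there: run1/ and run2/ hold identically named files such as samplename.R1.fq, and the function name merge_runs and the directory names are placeholders, not anything from the original poster's setup. Because R1 and R2 files are matched by their full file name, they stay separate automatically.

```shell
#!/usr/bin/env bash
# Sketch: merge per-sample FASTQ files from two runs, keeping R1 and R2 separate.
# Assumption: both run folders use identical file names (e.g. samplename.R1.fq).

merge_runs() {
    local run1="$1" run2="$2" out="$3"
    local f name
    mkdir -p "$out"
    for f in "$run1"/*.fq; do
        [ -e "$f" ] || continue              # glob matched nothing
        name=$(basename "$f")
        if [ ! -f "$run2/$name" ]; then
            # Do not silently write a half-merged sample
            echo "warning: $name missing from $run2, skipping" >&2
            continue
        fi
        cat "$f" "$run2/$name" > "$out/$name"
    done
}

# Example call (directory names are placeholders):
# merge_runs run1 run2 combined
```

The same loop should work for gzip-compressed files if the glob is changed to *.fq.gz, since concatenated gzip files form a valid gzip stream.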
swbarnes24.8k wrote:

This is a little Perl script I use. It assumes that everything is in the same folder and that everything before the 'S\d+' is the sample name. It will cat together everything with the same name.

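use strict; use warnings;
# Assumed usage: script.pl /absolute/path/to/fastq_dir genomeDir annotation.gtf
my ($dir, $genomeDir, $gtf) = @ARGV;
opendir(my $dh, $dir) or die "Cannot open $dir: $!";
my @dir = readdir($dh);
closedir($dh);
my %hash;
foreach my $sample (@dir) {
    next unless $sample =~ /\.gz$/;
    # Everything before _S<number> is the sample name
    my ($shortname) = $sample =~ /^(\S+)_S\d+_L\d+_R\d_\d\d\d\.fastq\.gz$/;
    next unless defined $shortname;
    $hash{$shortname}++;
}
foreach my $key (keys %hash) {
    mkdir $key;
    # "_S" keeps sample "A1" from also matching "A10_..."; note the glob still picks up both R1 and R2
    my $temp = "$dir/${key}_S*.gz";
    system("cd $key;  cat $temp | STAR --genomeDir $genomeDir --sjdbGTFfile $gtf --readFilesIn - --readFilesCommand zcat  --quantMode TranscriptomeSAM GeneCounts --outSAMunmapped Within  --outSAMtype BAM SortedByCoordinate --runThreadN 20 --limitBAMsortRAM 1001279989; cd ..;");
}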
ADD COMMENTlink modified 14 months ago • written 14 months ago by swbarnes24.8k

Since OP has not said anything about alignment, it may be good to remove the STAR command line from the loop above.

ADD REPLYlink written 14 months ago by genomax62k

True, but he can plug in whatever application he intends to run in its place.

ADD REPLYlink written 14 months ago by swbarnes24.8k
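
For the merge-only case discussed above, the same grouping idea can be sketched in shell without the STAR step. This is a hedged illustration, not the answerer's script: it assumes bcl2fastq-style names (name_S1_L001_R1_001.fastq.gz) with files from both runs in one folder, and the function name merge_by_sample and the output naming name_R1.fastq.gz are made up for this example. Unlike a single glob over all of a sample's files, it writes R1 and R2 to separate outputs.

```shell
#!/usr/bin/env bash
# Sketch: concatenate all pieces of each sample into one file per read direction.
# Assumption: file names look like <name>_S<n>_L<nnn>_R<1|2>_<nnn>.fastq.gz
# Output names (<name>_R1.fastq.gz, <name>_R2.fastq.gz) are hypothetical.

merge_by_sample() {
    local indir="$1" outdir="$2"
    local f base name rd
    mkdir -p "$outdir"
    # Globs expand in sorted order, so R1 and R2 pieces are appended in the same order
    for f in "$indir"/*_S*_L*_R[12]_*.fastq.gz; do
        [ -e "$f" ] || continue
        base=$(basename "$f")
        # Everything before _S<number> is the sample name
        name=$(printf '%s\n' "$base" | sed -E 's/_S[0-9]+_L[0-9]+_R[12]_[0-9]+\.fastq\.gz$//')
        [ "$name" != "$base" ] || continue   # name did not fit the pattern
        rd=$(printf '%s\n' "$base" | sed -E 's/.*_(R[12])_[0-9]+\.fastq\.gz$/\1/')
        # Concatenated gzip files are themselves a valid gzip stream
        cat "$f" >> "$outdir/${name}_${rd}.fastq.gz"
    done
}

# Example call (directory names are placeholders):
# merge_by_sample all_fastq merged
```

Because the function appends with >>, the output folder should be new or empty; running it twice into the same folder would duplicate reads.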

Thank you, I will try it.

ADD REPLYlink written 14 months ago by rim.klabi0