I have RNA-seq fastq files, and each sample has multiple files from different lanes:
A10_S4_R1.fastq.gz A10_S8_R1.fastq.gz A10_S40_R1.fastq.gz A10_S4_R2.fastq.gz A10_S8_R2.fastq.gz A10_S40_R2.fastq.gz
and so on for A11 up to A40.
I am trying to merge them using:
cat A11*_R1.fastq.gz > A11_R1.fastq.gz
This works, but I need a command that loops through the folder and merges all R1 files for one sample, then the next sample, and does the same for the R2 files.
I have tried answers from previous posts, but none of them work:
printf '%s\n' *.fastq.gz | sed 's/^\([^_]*_[^_]*\).*/\1/' | uniq |
while read prefix; do
    cat "$prefix"*R1*.fastq.gz >"${prefix}_R1.fastq.gz"
    cat "$prefix"*R2*.fastq.gz >"${prefix}_R2.fastq.gz"
done

for name in *.fastq.gz; do
    printf '%s\n' "${name%_*_*_R[12]*}"
done | uniq |

for f in *.fastq.gz; do
    [[ "$f" =~ ^([^_]+_[^_]+)_.*(_[^_]+)_[0-9]+\.fastq\.gz$ ]]
    cat "$f" >> "${BASH_REMATCH[1]}${BASH_REMATCH[2]}.fastq.gz"
done
Can anyone advise? Thanks in advance.
Examples you have posted don't seem to indicate so. Files for samples running in different lanes will have an L00* inclusion in the file name. There can at most be 8 lanes on an Illumina FC, so there is no chance of having 40 lanes (unless the sample ran across multiple FCs, but even then the L00* number would be repeated across FCs). The S* numbers you have are just the row number for that particular sample in the samplesheet used for demultiplexing. They don't have any useful meaning.
Disclaimer: Unless your sequencing facility is doing something non-standard.
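To make the naming concrete, here is a small sketch that splits a file name into the pieces described above. It assumes the standard sample_S#_L00#_R#_001.fastq.gz layout; the helper name parse_fastq_name is made up for illustration and is not part of any Illumina tool.

```bash
#!/usr/bin/env bash
# Sketch: split a standard Illumina FASTQ name into its parts.
# Assumes the layout <sample>_S<n>_L00<lane>_R<read>_001.fastq.gz.
# parse_fastq_name is a hypothetical helper, not part of any tool.
parse_fastq_name() {
    local rest="${1%_001.fastq.gz}"
    if [ "$rest" = "$1" ]; then
        printf 'unexpected name: %s\n' "$1" >&2
        return 1
    fi
    # Peel fields off the right so underscores in the sample name survive.
    local mate="${rest##*_}"; rest="${rest%_*}"
    local lane="${rest##*_}"; rest="${rest%_*}"
    local row="${rest##*_}"
    local sample="${rest%_*}"
    printf 'sample=%s samplesheet_row=%s lane=%s read=%s\n' \
        "$sample" "$row" "$lane" "$mate"
}

parse_fastq_name A14_S90_L008_R1_001.fastq.gz
# prints: sample=A14 samplesheet_row=S90 lane=L008 read=R1
```

Only the sample and read fields matter when deciding which files belong together; the S and L fields can be ignored for merging.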
Sorry for the confusion. The original files were, as you said, from different lanes, e.g. A14_S90_L008_R1_001.fastq.gz. Files were merged using
Which resulted in the files A10_S4_R1.fastq.gz A10_S8_R1.fastq.gz A10_S40_R1.fastq.gz A10_S4_R2.fastq.gz A10_S8_R2.fastq.gz A10_S40_R2.fastq.gz etc.
That's where I get stuck: I can't merge these files.
I see. So at this point you just need to focus on A* since those S* are not useful.
So would it be
Any help with how I would do this, please?
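One way to do this, as a minimal sketch rather than a tested pipeline: take the text before the first underscore as the sample name, then concatenate every R1 file and every R2 file for that sample. This assumes the names look like A10_S4_R1.fastq.gz and that sample names contain no underscore. It also relies on the glob returning the R1 and R2 files in the same order (it should, since the names differ only in the read token), so read pairs stay in sync. The function name merge_samples and the output folder are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: merge all <sample>_S<n>_R1/R2.fastq.gz files into one file per
# sample and read. Assumes sample names contain no underscore.
# merge_samples is a hypothetical helper name.
merge_samples() {
    local indir="$1" outdir="$2" f
    mkdir -p "$outdir" || return 1

    # List unique sample names: the text before the first underscore.
    for f in "$indir"/*_S*_R1.fastq.gz; do
        [ -e "$f" ] || continue      # glob matched nothing
        f="${f##*/}"                 # strip the directory part
        printf '%s\n' "${f%%_*}"
    done | sort -u |
    while IFS= read -r sample; do
        for mate in R1 R2; do
            # The underscore after $sample keeps A1 from matching A10.
            cat "$indir/$sample"_S*_"$mate".fastq.gz \
                > "$outdir/${sample}_${mate}.fastq.gz" || exit 1
        done
    done
}

# Run only when called with two arguments: input folder, output folder.
if [ "$#" -eq 2 ]; then
    merge_samples "$1" "$2"
fi
```

Saved as a script, it would be run from the folder holding the fastq files with something like: bash merge.sh . merged (the script name is arbitrary). Writing to a separate folder keeps the outputs out of the input glob. Concatenated gzip files form a valid gzip stream, so there is no need to decompress first.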
If you can, ask the people making the fastqs to use the --no-lane-splitting option when making the fastqs.