Question: concatenate/merge FASTQ files
jmirobla wrote:

I would like to merge or concatenate the reads from two FASTQ files to get the following output:

file 1

@J00148:193:HFG7MBBXY:2:1101:2372:1279 2:N:0:GAAGGAAG+GACGTCAT
AANCTTCCC
+
-<#FFFFFA

file 2

@J00148:193:HFG7MBBXY:2:1101:2372:1279 1:N:0:GAAGGAAG+GACGTCAT
GTTGGGAAAGAATAGGTCTAGAATTTCTAGTTTACTACAGNTTGTTGCTATTTCGNTTNTTTTNTNANTTCGAGAC
+
AAAA7FJJJJ--FFFAJJJJJ<JJJJJJJJFJAFFFJJFJ#<-F---F-<-<JJ<#-<#AAJJ#J#A#JFFA-7A-

output

@J00148:193:HFG7MBBXY:2:1101:2372:1279 :N:0:GAAGGAAG+GACGTCAT
AANCTTCCCGTTGGGAAAGAATAGGTCTAGAATTTCTAGTTTACTACAGNTTGTTGCTATTTCGNTTNTTTTNTNANTTCGAGAC
+
-<#FFFFFAAAAA7FJJJJ--FFFAJJJJJ<JJJJJJJJFJAFFFJJFJ#<-F---F-<-<JJ<#-<#AAJJ#J#A#JFFA-7A-

Can anyone help with that?

Thanks

rna-seq alignment next-gen
modified 4 weeks ago by ATpoint • written 4 weeks ago by jmirobla

Technically possible, yes. May I ask why you want to do this, though? The reason might influence how best to do it.

written 4 weeks ago by ATpoint

file1 is the FASTQ file with the UMIs, which were separated from file2, the actual reads. I need to put them back together.

written 4 weeks ago by jmirobla

I see, please see my answer below.

written 4 weeks ago by ATpoint

ATpoint wrote:

Using only Unix tools:

# Linearize each file (4 lines per read -> 1 tab-separated line), then join them side by side.
paste -d "\t" \
  <(paste - - - - < file1.fq) \
  <(paste - - - - < file2.fq) \
  | awk 'BEGIN {FS="\t"; OFS="\n"} {sub(/ [0-9]+:N:0/, " :N:0", $1); print $1, $2 $6, $3, $4 $8}' > merged.fq

First we linearize both files with paste - - - -, so that each read (four lines) becomes one line with four tab-separated columns. Pasting the two linearized files side by side gives an 8-column file that is easy to process with awk. awk removes the read number from the header, then prints the header from file 1, the merged sequence ($2 and $6), the +, and the merged quality string ($4 and $8). Because the output field separator is a newline, this converts the tab-separated records back into four-line FASTQ format. Splitting on tabs only (FS="\t") keeps the space inside the header from being treated as a column break. Note that this assumes both files contain the same reads in the same order.

$ cat merged.fq
@J00148:193:HFG7MBBXY:2:1101:2372:1279 :N:0:GAAGGAAG+GACGTCAT
AANCTTCCCGTTGGGAAAGAATAGGTCTAGAATTTCTAGTTTACTACAGNTTGTTGCTATTTCGNTTNTTTTNTNANTTCGAGAC
+
-<#FFFFFAAAAA7FJJJJ--FFFAJJJJJ<JJJJJJJJFJAFFFJJFJ#<-F---F-<-<JJ<#-<#AAJJ#J#A#JFFA-7A-
modified 4 weeks ago • written 4 weeks ago by ATpoint
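
A merge of this kind pairs the records of the two files purely by position, so it is worth checking first that both files list the same reads in the same order. The following is only a sketch of such a check, not part of the answer above: the helper name read_ids, the ids1.txt/ids2.txt file names and the toy reads are assumptions made up for illustration.

```shell
# Toy input (hypothetical reads, only to make the sketch runnable).
printf '%s\n' '@r1 2:N:0:AAA' 'ACG' '+' 'FFF' '@r2 2:N:0:AAA' 'TTG' '+' 'JJJ' > file1.fq
printf '%s\n' '@r1 1:N:0:AAA' 'GGGTT' '+' 'AAAAA' '@r2 1:N:0:AAA' 'CCCAA' '+' 'FFFFF' > file2.fq

# Print the read ID (header text before the first space) of every record.
# read_ids is a hypothetical helper name.
read_ids() { awk 'NR % 4 == 1 { print $1 }' "$1"; }

read_ids file1.fq > ids1.txt
read_ids file2.fq > ids2.txt

# cmp -s prints nothing and exits 0 only if both ID lists are identical.
if cmp -s ids1.txt ids2.txt; then
  echo "OK: read IDs match, files can be merged record by record"
else
  echo "WARNING: read IDs differ, do not merge these files as they are" >&2
fi
```

If the check fails, the files would need to be brought back into sync before any positional merge.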

Many thanks! And this would also work with gzip-compressed files just by adding a gunzip/gzip command, right?

written 4 weeks ago by jmirobla
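
One possible way to do this with gzip-compressed input is sketched below; it is not a tested recipe from this thread. It assumes the files are named file1.fq.gz and file2.fq.gz, that both contain the same reads in the same order, and that the header comment has the form N:N:0:... as in the example above. The toy reads are made up so that the sketch runs on its own. It keeps the linearized UMI file in a temporary file instead of using process substitution, so it also runs in a plain POSIX sh.

```shell
# Toy gzip-compressed input (hypothetical reads, only to make the sketch runnable).
printf '%s\n' '@r1 2:N:0:AAA' 'ACG' '+' 'FFF' '@r2 2:N:0:AAA' 'TTG' '+' 'JJJ' | gzip > file1.fq.gz
printf '%s\n' '@r1 1:N:0:AAA' 'GGGTT' '+' 'AAAAA' '@r2 1:N:0:AAA' 'CCCAA' '+' 'FFFFF' | gzip > file2.fq.gz

# Decompress the UMI file and linearize it: 4 lines per read -> 1 tab-separated line.
tmp=$(mktemp)
gzip -dc file1.fq.gz | paste - - - - > "$tmp"

# Stream the read file, place it next to the UMI table (8 columns),
# merge sequence and quality columns, and recompress the result.
gzip -dc file2.fq.gz \
  | paste - - - - \
  | paste "$tmp" - \
  | awk 'BEGIN { FS = "\t"; OFS = "\n" }
         { sub(/ [0-9]+:N:0/, " :N:0", $1); print $1, $2 $6, $3, $4 $8 }' \
  | gzip > merged.fq.gz

rm -f "$tmp"

# Inspect the merged records.
gzip -dc merged.fq.gz
```

With bash, the temporary file could be replaced by process substitution, for example <(gzip -dc file1.fq.gz | paste - - - -), which keeps everything streaming.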