Concatenate/merge fastq files
4.2 years ago
jmirobla • 0

I would like to merge (concatenate) the reads from two fastq files, record by record, to get the following output:

file 1

@J00148:193:HFG7MBBXY:2:1101:2372:1279 2:N:0:GAAGGAAG+GACGTCAT
AANCTTCCC
+
-<#FFFFFA

file 2

@J00148:193:HFG7MBBXY:2:1101:2372:1279 1:N:0:GAAGGAAG+GACGTCAT
GTTGGGAAAGAATAGGTCTAGAATTTCTAGTTTACTACAGNTTGTTGCTATTTCGNTTNTTTTNTNANTTCGAGAC
+
AAAA7FJJJJ--FFFAJJJJJ<JJJJJJJJFJAFFFJJFJ#<-F---F-<-<JJ<#-<#AAJJ#J#A#JFFA-7A-

output

@J00148:193:HFG7MBBXY:2:1101:2372:1279 :N:0:GAAGGAAG+GACGTCAT
AANCTTCCCGTTGGGAAAGAATAGGTCTAGAATTTCTAGTTTACTACAGNTTGTTGCTATTTCGNTTNTTTTNTNANTTCGAGAC
+
-<#FFFFFAAAAA7FJJJJ--FFFAJJJJJ<JJJJJJJJFJAFFFJJFJ#<-F---F-<-<JJ<#-<#AAJJ#J#A#JFFA-7A-

Can anyone help with that?

Thanks

rna-seq next-gen alignment • 646 views

Technically possible, yes. May I ask why you want to do this, though? The reason might influence how best to do it.


file1 is the fastq file with the UMIs, which were separated from the actual reads in file2. I need to put them back together.


I see, please see my answer below.

4.2 years ago
ATpoint 82k

Using only Unix tools:

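# Linearize each file to one read per row (4 tab-separated columns),
# then paste the rows side by side (8 columns) and merge with awk.
paste -d "\t" \
  <(paste - - - - < file1.fq) \
  <(paste - - - - < file2.fq) \
  | awk 'BEGIN {FS="\t"; OFS="\n"} {gsub(" ","__"); print $1, $2$6, $3, $4$8}' \
  | awk '{gsub("__", " "); gsub("[1-9]:N:0", ":N:0"); print}' > merged.fq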

First, both files are linearized so that each read (four lines) becomes one row of a 4-column tab-separated table. Pasting the two tables side by side gives an 8-column table that is easy to query with awk: it prints the header of the first file, then the merged sequence, then the +, then the merged quality string. Setting OFS to a newline converts each tab-separated row back into newline-separated fastq format. Because the header contains a space, which can interfere with field splitting, I first replace it with a double underscore as a unique placeholder and convert it back to a space at the end. The second awk also removes the read number (1 or 2) from the header.

$ cat merged.fq
@J00148:193:HFG7MBBXY:2:1101:2372:1279 :N:0:GAAGGAAG+GACGTCAT
AANCTTCCCGTTGGGAAAGAATAGGTCTAGAATTTCTAGTTTACTACAGNTTGTTGCTATTTCGNTTNTTTTNTNANTTCGAGAC
+
-<#FFFFFAAAAA7FJJJJ--FFFAJJJJJ<JJJJJJJJFJAFFFJJFJ#<-F---F-<-<JJ<#-<#AAJJ#J#A#JFFA-7A-
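
One caveat: paste pairs records purely by position, so the merge is only valid if both files list the same reads in the same order. Below is a minimal sketch of a sanity check to run before merging. It assumes the read ID is the part of the header before the first space, and it uses the two records from the question as demo input. All `demo_*` file names are hypothetical.

```shell
# Hypothetical demo input: the two records from the question.
printf '%s\n' '@J00148:193:HFG7MBBXY:2:1101:2372:1279 2:N:0:GAAGGAAG+GACGTCAT' \
  'AANCTTCCC' '+' '-<#FFFFFA' > demo_umi.fq
printf '%s\n' '@J00148:193:HFG7MBBXY:2:1101:2372:1279 1:N:0:GAAGGAAG+GACGTCAT' \
  'GTTGGGAAAGAATAGGTCTAGAATTTCTAGTTTACTACAGNTTGTTGCTATTTCGNTTNTTTTNTNANTTCGAGAC' \
  '+' \
  'AAAA7FJJJJ--FFFAJJJJJ<JJJJJJJJFJAFFFJJFJ#<-F---F-<-<JJ<#-<#AAJJ#J#A#JFFA-7A-' \
  > demo_read.fq

# Read IDs: first whitespace-delimited token of every header line (lines 1, 5, 9, ...).
awk 'NR % 4 == 1 {print $1}' demo_umi.fq  > demo_umi.ids
awk 'NR % 4 == 1 {print $1}' demo_read.fq > demo_read.ids

# cmp -s prints nothing and returns non-zero if the two ID lists differ.
if cmp -s demo_umi.ids demo_read.ids; then
  id_check="read IDs match"
else
  id_check="read IDs differ, do not merge by position"
fi
echo "$id_check"
```

If the check fails, the two files need to be brought into the same read order before pasting them together.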

Many thanks! This would also work with gzip-compressed files by just adding a gunzip/gzip command, right?
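
A minimal sketch of how that could look, assuming gzip is installed: decompress to stdout with `gzip -dc`, linearize, merge, and compress the result again. It uses the two records from the question as demo input, and all `demo_*` file names are hypothetical. It uses a named pipe so that it also runs in plain sh. In bash, the `<(...)` form from the answer does the same job. The awk step is a single-pass variant of the merge, not the exact command from the answer.

```shell
# Hypothetical demo input: the two records from the question, gzip-compressed.
printf '%s\n' '@J00148:193:HFG7MBBXY:2:1101:2372:1279 2:N:0:GAAGGAAG+GACGTCAT' \
  'AANCTTCCC' '+' '-<#FFFFFA' | gzip > demo_umi.fq.gz
printf '%s\n' '@J00148:193:HFG7MBBXY:2:1101:2372:1279 1:N:0:GAAGGAAG+GACGTCAT' \
  'GTTGGGAAAGAATAGGTCTAGAATTTCTAGTTTACTACAGNTTGTTGCTATTTCGNTTNTTTTNTNANTTCGAGAC' \
  '+' \
  'AAAA7FJJJJ--FFFAJJJJJ<JJJJJJJJFJAFFFJJFJ#<-F---F-<-<JJ<#-<#AAJJ#J#A#JFFA-7A-' \
  | gzip > demo_read.fq.gz

# Named pipe carrying the linearized UMI reads (4 tab-separated columns per read).
rm -f demo_umi.pipe
mkfifo demo_umi.pipe
gzip -dc demo_umi.fq.gz | paste - - - - > demo_umi.pipe &

# Linearize the reads, paste the UMI columns in front (8 columns per read),
# merge sequence and quality, drop the read number from the header, recompress.
gzip -dc demo_read.fq.gz | paste - - - - \
  | paste demo_umi.pipe - \
  | awk 'BEGIN {FS = "\t"; OFS = "\n"}
         {sub(/ [1-9]:N:0/, " :N:0", $1); print $1, $2 $6, $3, $4 $8}' \
  | gzip > demo_merged.fq.gz

# Make sure the background writer has finished, then clean up the pipe.
wait
rm -f demo_umi.pipe
```

Nothing is written to disk uncompressed, so this should also be usable for large files.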
