How to change fastq reads header for running Trinity on them?
3
0
Entering edit mode
9.0 years ago
seta ★ 1.9k

Dear all,

I have large fastq files (about 6-7 Gb each of them), which their headers are not compatible with trinity program as the following error appeared after running trinity:

Error, pairs.K25.stats is empty. Be sure to check your fastq reads and ensure that the read names are identical except for the /1 or /2 designation

My header left fastq file is:

@D69F08P1:337:C4GGBACXX:2:1101:1378:2161 1:N:0:ATGTCA
GGGGGCAGTGACCAGCTGGACTGAGCTCATAGTATGGAGCTTCAGGCAGCTGCCTTTTCACCTGAAGGCCCCAGATAGGCTTAACCCAGGGGTCTTCTTT
+
BBBFFFFFF<FFFIIIIIIIIIIIIIIIIIIIBFFIIIFIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFFFFFFFFFFFFFFFFFFFFFF77BBFFFF
@D69F08P1:337:C4GGBACXX:2:1101:1497:2202 1:N:0:ATGTCA
TATGGATGGTGATCACTCAGGCTGAAACCCCCAGCAAGGAATCTTTGGATGAGGGCCAGCTGAGATCTCTCTTGGTCGGAGTATGCATCCATGATCATGG

and the right fastq file header as

D69F08P1:337:C4GGBACXX:2:1101:1378:2161 2:N:0:ATGTCA
GTTCCAATCTGTCTCATGTATGGAAAAGAAGACCCCTGGGTTAAGCCTATCTGGGGCCTTCAGGTGAAAAGGCAGCTGCCTGAAGCTCCATACTATGAGC
+
BBBFFFFFFFFFFIIIIIFIIIIIIIIFFIIIIIIIIIIIFIIIIFIIIIIIIIIIIIIIIIIIFFIIFFFFFFFFFFFFFFFFBBFFFFFFFFFFFFFF
@D69F08P1:337:C4GGBACXX:2:1101:1497:2202 2:N:0:ATGTCA
CGAATATAGAGAAAGCCAGCAGACTTGCCAATGTGCCTTCTGGGATGAAGACAATTCCGACGACTTAGTCTCCCAATCTGCCCAGGAATCAGATCAATGG

I think the header (say left reads) should change to @D69F08P1:337:C4GGBACXX:2:1101:1378:2161:N:0:ATGTCA/1 that is acceptable for trinity, am I right?. Could you please share your commands to this end? Thanks so much in advance

trinity RNA-Seq ngs Assembly • 8.2k views
ADD COMMENT
1
Entering edit mode
9.0 years ago

To convert (though not parallelised):

awk '{ if (NR%4==1) { print $1"_"$2"/1" } else { print } }' Read1.fastq | less
awk '{ if (NR%4==1) { print $1"_"$2"/2" } else { print } }' Read2.fastq | less

But you can read this thread from trinity google group.

ADD COMMENT
0
Entering edit mode

To save to file:

awk '{ if (NR%4==1) { print $1"_"$2"/1" } else { print } }' Read1.fastq > rename_Read1.fastq
awk '{ if (NR%4==1) { print $1"_"$2"/2" } else { print } }' Read2.fastq > rename_Read2.fastq
ADD REPLY
0
Entering edit mode

Thanks a lot Geek. It works fine with unzip files, but I didn't know why it doesn't work with gzip file, even I applied "gunzip filename | " before your command! I'm dealing with many large files, any suggested command to solve the problem would be really appreciated.

ADD REPLY
0
Entering edit mode

Did you read the Google thread ? I have used trinity recently on raw fastq from illumina HiSeq1000 and it worked fine without any error regarding /1 or /2. Regarding gz files, use

zcat filename | command | gzip > out.gz
ADD REPLY
0
Entering edit mode
9.0 years ago
h.mon 35k

Are you sure the problem are fastq headers? I have used recent versions of Trinity (versions r20140717 and 2.0.6) with files with the same kind of headers you have, and did not have problems. Did you pre-process your files with some software ignoring read pairs? To quickly test if your files have uneven number of lines (hence reads are unpaired), do:

zcat Read1.fastq.gz | wc -l -
zcat Read2.fastq.gz | wc -l -
ADD COMMENT
0
Entering edit mode

Yeah. If you have done any preprocessing, the reads should be reordered. Otherwise they lose pairing information.

ADD REPLY
0
Entering edit mode

Thanks for your comment. I'm not sure about it, just guess but I checked the output of your commands, they were the same. I'm also using the recent version on a raw fastq file, without any pre-processing.

ADD REPLY
0
Entering edit mode

First header from right fastq file does not start with @ on your post, is it a copy/paste error or your file is actually missing the @?

ADD REPLY
0
Entering edit mode

Sorry, it's missed during the copy, paste of error. It also started with @. Unfortunately, the new error was appeared that I asked in a new post

ADD REPLY
0
Entering edit mode
6.1 years ago

I have single end reads, so just needed to add a '/1' to my IDs that were generated by the samtools bam2fq command.

sed gets confused adding the '/' (forward slash) character to a file.

I did it with:

awk '/^@HISEQ/ {$0=$0"/1"} 1' original.fq > newfile.fq

This command looks for likes beginning in '@HISEQ' and adds a '/1' to the end of that line.

For example:

original fastq header

@HISEQ-MFG:77:HFTM7BCXX:1:1101:1436:2041
CCGCATTCCAAACGGGGCCTA
+
DDDB@EHHHHHIIHIIIHHII

same header after awk '/^@HISEQ/ {$0=$0"/1"} 1' bam2fq.fq > newfile.fq

@HISEQ-MFG:77:HFTM7BCXX:1:1101:1436:2041/1
CCGCATTCCAAACGGGGCCTA
+
DDDB@EHHHHHIIHIIIHHII
ADD COMMENT

Login before adding your answer.

Traffic: 2734 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6