Question

How to change fastq reads header for running Trinity on them?

0

Entering edit mode

10.2 years ago

seta ★ 1.9k

Dear all,

I have large fastq files (about 6-7 Gb each of them), which their headers are not compatible with trinity program as the following error appeared after running trinity:

Error, pairs.K25.stats is empty. Be sure to check your fastq reads and ensure that the read names are identical except for the /1 or /2 designation

My header left fastq file is:

@D69F08P1:337:C4GGBACXX:2:1101:1378:2161 1:N:0:ATGTCA
GGGGGCAGTGACCAGCTGGACTGAGCTCATAGTATGGAGCTTCAGGCAGCTGCCTTTTCACCTGAAGGCCCCAGATAGGCTTAACCCAGGGGTCTTCTTT
+
BBBFFFFFF<FFFIIIIIIIIIIIIIIIIIIIBFFIIIFIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFFFFFFFFFFFFFFFFFFFFFF77BBFFFF
@D69F08P1:337:C4GGBACXX:2:1101:1497:2202 1:N:0:ATGTCA
TATGGATGGTGATCACTCAGGCTGAAACCCCCAGCAAGGAATCTTTGGATGAGGGCCAGCTGAGATCTCTCTTGGTCGGAGTATGCATCCATGATCATGG

and the right fastq file header as

D69F08P1:337:C4GGBACXX:2:1101:1378:2161 2:N:0:ATGTCA
GTTCCAATCTGTCTCATGTATGGAAAAGAAGACCCCTGGGTTAAGCCTATCTGGGGCCTTCAGGTGAAAAGGCAGCTGCCTGAAGCTCCATACTATGAGC
+
BBBFFFFFFFFFFIIIIIFIIIIIIIIFFIIIIIIIIIIIFIIIIFIIIIIIIIIIIIIIIIIIFFIIFFFFFFFFFFFFFFFFBBFFFFFFFFFFFFFF
@D69F08P1:337:C4GGBACXX:2:1101:1497:2202 2:N:0:ATGTCA
CGAATATAGAGAAAGCCAGCAGACTTGCCAATGTGCCTTCTGGGATGAAGACAATTCCGACGACTTAGTCTCCCAATCTGCCCAGGAATCAGATCAATGG

I think the header (say left reads) should change to @D69F08P1:337:C4GGBACXX:2:1101:1378:2161:N:0:ATGTCA/1 that is acceptable for trinity, am I right?. Could you please share your commands to this end? Thanks so much in advance

trinity RNA-Seq ngs Assembly • 9.1k views

ADD COMMENT • link updated 2.4 years ago by Ram 45k • written 10.2 years ago by seta ★ 1.9k

Ram · Answer 1 · 2015-05-10

1

Entering edit mode

10.2 years ago

GouthamAtla 12k

To convert (though not parallelised):

awk '{ if (NR%4==1) { print $1"_"$2"/1" } else { print } }' Read1.fastq | less
awk '{ if (NR%4==1) { print $1"_"$2"/2" } else { print } }' Read2.fastq | less

But you can read this thread from trinity google group.

ADD COMMENT • link 10.2 years ago by GouthamAtla 12k

0

Entering edit mode

To save to file:

awk '{ if (NR%4==1) { print $1"_"$2"/1" } else { print } }' Read1.fastq > rename_Read1.fastq
awk '{ if (NR%4==1) { print $1"_"$2"/2" } else { print } }' Read2.fastq > rename_Read2.fastq

ADD REPLY • link 10.2 years ago by GouthamAtla 12k

0

Entering edit mode

Thanks a lot Geek. It works fine with unzip files, but I didn't know why it doesn't work with gzip file, even I applied "gunzip filename | " before your command! I'm dealing with many large files, any suggested command to solve the problem would be really appreciated.

ADD REPLY • link 10.2 years ago by seta ★ 1.9k

0

Entering edit mode

Did you read the Google thread ? I have used trinity recently on raw fastq from illumina HiSeq1000 and it worked fine without any error regarding /1 or /2. Regarding gz files, use

zcat filename | command | gzip > out.gz

ADD REPLY • link updated 2.4 years ago by Ram 45k • written 10.2 years ago by GouthamAtla 12k

Ram · Answer 2 · 2015-05-10

0

Entering edit mode

10.2 years ago

h.mon 35k

Are you sure the problem are fastq headers? I have used recent versions of Trinity (versions r20140717 and 2.0.6) with files with the same kind of headers you have, and did not have problems. Did you pre-process your files with some software ignoring read pairs? To quickly test if your files have uneven number of lines (hence reads are unpaired), do:

zcat Read1.fastq.gz | wc -l -
zcat Read2.fastq.gz | wc -l -

ADD COMMENT • link updated 2.4 years ago by Ram 45k • written 10.2 years ago by h.mon 35k

0

Entering edit mode

Yeah. If you have done any preprocessing, the reads should be reordered. Otherwise they lose pairing information.

ADD REPLY • link 10.2 years ago by GouthamAtla 12k

0

Entering edit mode

Thanks for your comment. I'm not sure about it, just guess but I checked the output of your commands, they were the same. I'm also using the recent version on a raw fastq file, without any pre-processing.

ADD REPLY • link updated 2.4 years ago by Ram 45k • written 10.2 years ago by seta ★ 1.9k

0

Entering edit mode

First header from right fastq file does not start with @ on your post, is it a copy/paste error or your file is actually missing the @?

ADD REPLY • link 10.2 years ago by h.mon 35k

0

Entering edit mode

Sorry, it's missed during the copy, paste of error. It also started with @. Unfortunately, the new error was appeared that I asked in a new post

ADD REPLY • link updated 2.4 years ago by Ram 45k • written 10.2 years ago by seta ★ 1.9k

score 0 · Answer 3 · 2018-03-21

I have single end reads, so just needed to add a '/1' to my IDs that were generated by the samtools bam2fq command.

sed gets confused adding the '/' (forward slash) character to a file.

I did it with:

awk '/^@HISEQ/ {$0=$0"/1"} 1' original.fq > newfile.fq

This command looks for likes beginning in '@HISEQ' and adds a '/1' to the end of that line.

For example:

original fastq header

@HISEQ-MFG:77:HFTM7BCXX:1:1101:1436:2041
CCGCATTCCAAACGGGGCCTA
+
DDDB@EHHHHHIIHIIIHHII

same header after awk '/^@HISEQ/ {$0=$0"/1"} 1' bam2fq.fq > newfile.fq

@HISEQ-MFG:77:HFTM7BCXX:1:1101:1436:2041/1
CCGCATTCCAAACGGGGCCTA
+
DDDB@EHHHHHIIHIIIHHII