How To Pull Out Qual From Multi Line Fastq?
2
1
Entering edit mode
12.8 years ago
Zhshqzyc ▴ 520

From http://biostar.stackexchange.com/questions/1389/how-to-generate-a-consensus-fasta-sequence-from-sam-tools-pileup We use awk and perl to get fasta by

awk '/^@chr1$/,/^+$/' consensus.fastq | perl -pe "s/@/>/ ; s/\\+//" > chr1.fasta


The sample data

@chr1
nnnnnnnagatagaaataCACGATGCGAGCAATCAAATTTCATAACATCACCATGAGTTT
GGTCCGAAGCATGAGTGTTTACAATGTTTGAAtaCCTTATACAGTTCTTATACATACTTT
ATAAATTATTTCCCaagctgttttgatacactcactaacagaTATTCTATAGAAGGAAAA
+
!!!!!!!????????BBBEEEHKKKKKKKKKKKKNNNNNNQQNNNNNNQWWWW7WZWWWZ
ZZZ]]]>]]]]]]]ZTQQQTQNNKKKKKKHKKHHHEEEEEEEHHHHHHHHH
HKKHHHHHHEEEEEBBBBBBBBBBBB?=B@BBBBBB??BBBBEEEEEEEEEEHKNNNNNQ


Now I want to extract the qual for each chromosome. But the qual score for each chromosome all beginning with "+", how can I extract them?

Thanks.

fastq quality • 5.1k views
0
Entering edit mode

It quality doesn't wrap lines can't you just grep for "+"?

2
Entering edit mode
12.8 years ago

Here is quick and extremely dirty python script to pull out fastq quality not making assumptions about line wrapping:

import sys

inFile = open(sys.argv[1],'r')

qual = ''

seqs = False
quals = False
for line in inFile:
if line[0] == "@":
print qual
quals = False
qual = ''
elif line[0] == "+":
quals = True
else:
if quals:
qual += line.strip()

print qual


Save as yourName.py and run it by: python yourName.py yourFile.fastq > output.quals

edit * There was a mistake in the code where the qual variable wasn't being reset.

edit **

Here is the script to get just one entry:

import sys

inFile = open(sys.argv[1],'r')

qual = ''

seqs = False
quals = False
for line in inFile:
if line[0] == "@":
print qual
sys.exit()
quals = False
qual = ''
elif line[0] == "+":
quals = True
else:
if quals:
qual += line.strip()

print qual


Save it as yourName.py. Use it by: python yourName.py yourFile.fastq "entry name"

0
Entering edit mode

My fastq file is 6GB, I got the quals file is more than 12GB. Is it right?

0
Entering edit mode

0
Entering edit mode

Oops, there was a mistake. I've fixed it.

0
Entering edit mode

Now I got:There are many returns, is it correct?

chr2 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ]]]G]]cfTcffcfcOilifffEciiiififfcccccccfccffil Ullllliorrrrooiiooorueuuuuruxuuuxxrrrroorrrrrolllliilloooororoololoroollllllllllliilllrr QQQQQQQHQQQQQQQQQQQQQQQQTTTTQQQQQHQNNNNNNNNNNNNNKKKKKKKKHHK

NNKKKNTTTTWWWQWZZZZT]ccccifcllllilllooorrooooorlflllooooil

]]]WWWWWTTTTTTTQWZZ]Z]]]]Z]Z]]]]]]]]]`cccfcliioouuuuuuu

{xxuuuuuruxxxx{{xx{xuuurrorlllloooorrrrooooooooouorrloouuuu

0
Entering edit mode

updated: If I only want to get qual for chromosome 1, how to change code?

0
Entering edit mode

I've added the code to just get one entry.

0
Entering edit mode

I've added the scrip for getting just one entry. Make sure you typed the name of the entry correctly with quotes if there are spacing. If you didn't type it correctly, it will just give you the last entry.

0
Entering edit mode

Many many many thanks.

0
Entering edit mode

line[0] can be '+' for one of the qual lines if the phred33 score is 10. So maybe this code may break in some cases.

0 1 2 3 4 01234567890123456789012345678901234567890 !"#\$%&'()*+,-./0123456789:;<=>?@ABCDEFGHI

0
Entering edit mode

line[0] can be '+' for one of the qual lines if the phred33 score is 10. So maybe this code may break in some cases.

0
Entering edit mode

Also @ is ASCII 64 which is 64 - 33 = PHRED Q 31. The quality line can contain and begin with both + and @.

1
Entering edit mode
12.8 years ago
Haluk ▴ 190

fq2fa tool extracts qual scores from fastq file.

Here is the link : http://www.phyloware.com/Phyloware/XSTK.html