Question: seqtk subseq command not pulling full fastq reads
kratnasiri wrote:

When I run seqtk subseq to extract reads, the output is missing almost all of each sequence and its quality string.

$ ./seqtk subseq file.fastq IDs.txt
@NS500126:798:HWTLTAFXX:1:11101:10771:1083:1-1 1:N:0:AATGGTCG+NGACCATT
G
+
A
@NS500126:798:HWTLTAFXX:1:11101:2195:1084:1-1 1:N:0:AATGGTCG+NGACCATT
G
+
A
@NS500126:798:HWTLTAFXX:1:11101:2581:1084:1-1 1:N:0:AATGGTCG+NGACCATT
T
+
A

It pulls only the first base of each read and the first character of its quality string, not the full lines.

Is there a way to fix that?

sequencing next-gen • 1.3k views
modified 21 months ago by Biostar • written 23 months ago by kratnasiri

Judging from the seqtk output, remove the spaces from the sequence headers in the FASTQ file and from the entries in IDs.txt, then try again.

Alternatively, try seqkit:

$ seqkit grep -nf id.txt test.fq
modified 23 months ago • written 23 months ago by cpad0112
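
For reference, the idea of matching reads on the header text before the first space can be sketched in Python. This is only a minimal illustration, not how seqtk or seqkit work internally. The function names (read_ids, parse_fastq, subset_fastq) are hypothetical, the toy reads at the bottom are made up, and the sketch assumes strict 4-line FASTQ records.

```python
from typing import Iterable, Iterator, Set, Tuple

# One FASTQ record: (header, sequence, separator, quality), newlines stripped.
FastqRecord = Tuple[str, str, str, str]


def read_ids(lines: Iterable[str]) -> Set[str]:
    """Collect read IDs, keeping only the first whitespace-delimited token.

    Assumption: anything after the first space (e.g. "1:N:0:...") is a
    comment and is not part of the ID. A leading "@" is dropped if present.
    """
    ids = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        token = fields[0]
        ids.add(token[1:] if token.startswith("@") else token)
    return ids


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from strict 4-line FASTQ input."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected a header starting with '@': {header!r}")
        try:
            seq, sep, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated record: {header!r}") from None
        yield header, seq, sep, qual


def subset_fastq(lines: Iterable[str], wanted: Set[str]) -> Iterator[FastqRecord]:
    """Yield full records whose ID (header up to the first space) is wanted."""
    for record in parse_fastq(lines):
        fields = record[0][1:].split()
        if fields and fields[0] in wanted:
            yield record


# Toy demonstration with made-up reads (not taken from file.fastq).
toy_fastq = [
    "@read1 1:N:0:AATGGTCG\n", "ACGTACGT\n", "+\n", "AAAAEEEE\n",
    "@read2 1:N:0:AATGGTCG\n", "GGGAATTC\n", "+\n", "EEEEAAAA\n",
]
toy_ids = read_ids(["read2 1:N:0:AATGGTCG\n"])
for rec in subset_fastq(toy_fastq, toy_ids):
    print("\n".join(rec))
```

With real data, the same functions could be fed open file handles for IDs.txt and file.fastq instead of the in-memory lists.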

The command you provided worked!! Though I'm not sure why you suggested I remove spaces.

modified 23 months ago • written 23 months ago by kratnasiri

Can you show us the output of the following command?

grep -A3 NS500126:798:HWTLTAFXX:1:11101:10771:1083:1-1 file.fastq
written 23 months ago by WouterDeCoster

It doesn't seem to output anything.

If I look at the first few lines of file.fastq, they look like this:

@NS500126:798:HWTLTAFXX:1:11101:10771:1083 1:N:0:AATGGTCG+NGACCATT
GTTGAGGCAAAGCACCCGCGAAGGCACCAGAGCCCGAGTCATCCCAGCCTTCAGAGCCAATCCTTATCCCGCAGTTACGGATCTAATTTGCCGACTTCCCTTACCTACATTGATCTATCGCCTAGAGACTCTAAACCTTGGAGACCTGCT
+
AAAAAEEAAEEEEAEEEEEEEEEEEAEAEEAEEEEEEEAE/EE/EEEAE6EEAE/EA6EAEEAEEEAEE6</AAEAEEEE/</EEEEEE/EA<E</6E/EEE<AE/E/EAAAEA/EAEAE//EEEEEE/EEA/<E/<E<<E/AAEA6A/A
@NS500126:798:HWTLTAFXX:1:11101:2195:1084 1:N:0:AATGGTCG+NGACCATT
GGGAATTCCAGGTTCATATGGACCATTTCAATCCATAATCCCGACTAAATGAGCATTTCAGTGATTTCCCGCGCCTTTCGGCGTAGGGTACACGCTGCTGCTCACATTGTAGCGCGCGTGCAGACCAGAACATCTAAGGGCATCACGGAC
+
modified 23 months ago • written 23 months ago by kratnasiri

The header in the original post

@NS500126:798:HWTLTAFXX:1:11101:10771:1083:1-1 1:N:0:AATGGTCG+NGACCATT

does not match the one in the example posted above:

@NS500126:798:HWTLTAFXX:1:11101:10771:1083 1:N:0:AATGGTCG+NGACCATT

What gives?

modified 23 months ago • written 23 months ago by genomax
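
The mismatch between the two headers can be checked mechanically. The sketch below shows that they differ only by a trailing ":1-1" on the ID, which has the shape of a "start-end" region suffix. The thread does not confirm where that suffix comes from. One plausible reading, stated here purely as an assumption, is that the extra space-separated field in IDs.txt was interpreted as a coordinate, so only a one-base region of each read was extracted. The helper names (first_token, split_region_suffix) are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches a trailing ":<start>-<end>" on an ID, e.g. "...:1083:1-1".
_REGION_SUFFIX = re.compile(r":(\d+)-(\d+)$")


def first_token(header: str) -> str:
    """Return the read ID: header text up to the first space, without '@'."""
    token = header.split()[0]
    return token[1:] if token.startswith("@") else token


def split_region_suffix(name: str) -> Tuple[str, Optional[Tuple[int, int]]]:
    """Split an ID into (base_id, (start, end)); region is None if absent."""
    match = _REGION_SUFFIX.search(name)
    if match is None:
        return name, None
    return name[:match.start()], (int(match.group(1)), int(match.group(2)))


# Header as printed by seqtk subseq in the original post.
reported = "@NS500126:798:HWTLTAFXX:1:11101:10771:1083:1-1 1:N:0:AATGGTCG+NGACCATT"
# Header as it appears in file.fastq.
in_file = "@NS500126:798:HWTLTAFXX:1:11101:10771:1083 1:N:0:AATGGTCG+NGACCATT"

base, region = split_region_suffix(first_token(reported))
print(base == first_token(in_file))  # True: same read once the suffix is removed
print(region)                        # (1, 1): a single-base span
```

This would also account for the grep in the earlier reply returning nothing, since the ID with the ":1-1" suffix does not occur anywhere in file.fastq.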