Question: seqtk subseq command not pulling full fastq reads
13 months ago, kratnasiri wrote:

When I run seqtk subseq to extract reads, the output is missing almost all of each sequence and quality string.

>>>./seqtk subseq file.fastq IDs.txt
@NS500126:798:HWTLTAFXX:1:11101:10771:1083:1-1 1:N:0:AATGGTCG+NGACCATT
G
+
A
@NS500126:798:HWTLTAFXX:1:11101:2195:1084:1-1 1:N:0:AATGGTCG+NGACCATT
G
+
A
@NS500126:798:HWTLTAFXX:1:11101:2581:1084:1-1 1:N:0:AATGGTCG+NGACCATT
T
+
A

It pulls only the first character of the read and of the quality string, not the full lines.

Is there a way to fix that?


Judging by the seqtk output, remove the spaces from the sequence headers in both the FASTQ file and the ID list (IDs.txt), then try again.

Alternatively, try seqkit:

$ seqkit grep -nf id.txt test.fq
written 13 months ago by cpad0112
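
A note on the seqkit command: `-n` tells `seqkit grep` to match against the full header line instead of only the ID, and `-f` reads the patterns from a file, so an ID list that holds whole headers (spaces included) can still match. The awk sketch here mimics that idea without seqkit. It assumes unwrapped 4-line FASTQ records, and the file names `demo.fq`, `demo_ids.txt`, and `demo_subset.fq` as well as the short reads are made up for illustration.

```shell
# Tiny hypothetical FASTQ with two 4-line records.
printf '%s\n' \
  '@read1 1:N:0:AATGGTCG' 'GTTGAGGC' '+' 'AAAAAEEA' \
  '@read2 1:N:0:AATGGTCG' 'GGGAATTC' '+' 'AAEAEEEE' \
  > demo.fq

# ID list holding one full header line, without the leading '@'.
printf '%s\n' 'read2 1:N:0:AATGGTCG' > demo_ids.txt

# First file: remember every wanted header.
# Second file: on each header line (lines 1, 5, 9, ...) decide whether
# to keep the record, then print lines while the flag is set.
awk 'NR == FNR { want[$0] = 1; next }
     FNR % 4 == 1 { keep = (substr($0, 2) in want) }
     keep' demo_ids.txt demo.fq > demo_subset.fq

cat demo_subset.fq
```

Because the comparison is against the entire header, this only works when the ID list contains the header exactly as it appears in the FASTQ file.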

The seqkit command you provided worked! Though I'm not sure why you suggested removing the spaces.

written 13 months ago by kratnasiri
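
A likely reason for the advice to remove spaces, stated here as an assumption that the thread does not confirm: `seqtk subseq` accepts either a list of names or a BED-style list of regions, so anything after whitespace on an ID line may be read as coordinates. A comment field that starts with a digit (as in `1:N:0:...`) could then be taken as a position, which would be consistent with the one-base output in the question. A minimal sketch of cleaning the list so each line holds only the bare read name; `ids_full.txt`, `ids_clean.txt`, and `subset.fastq` are hypothetical file names.

```shell
# Hypothetical ID list in which each line is a full FASTQ header
# (minus '@'), including the space-separated comment field.
printf '%s\n' \
  'NS500126:798:HWTLTAFXX:1:11101:10771:1083 1:N:0:AATGGTCG+NGACCATT' \
  'NS500126:798:HWTLTAFXX:1:11101:2195:1084 1:N:0:AATGGTCG+NGACCATT' \
  > ids_full.txt

# Keep only the first whitespace-separated field, i.e. the read name.
awk '{ print $1 }' ids_full.txt > ids_clean.txt

cat ids_clean.txt

# Then rerun the original command on the cleaned list (not executed here):
#   ./seqtk subseq file.fastq ids_clean.txt > subset.fastq
```

Only the ID list needs cleaning in this sketch; the FASTQ file itself is left untouched.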

Can you show us the output of the following command?

grep -A3 NS500126:798:HWTLTAFXX:1:11101:10771:1083:1-1 file.fastq
written 13 months ago by WouterDeCoster

That doesn't seem to output anything.

If I look at the first few lines of file.fastq, they look like this:

@NS500126:798:HWTLTAFXX:1:11101:10771:1083 1:N:0:AATGGTCG+NGACCATT
GTTGAGGCAAAGCACCCGCGAAGGCACCAGAGCCCGAGTCATCCCAGCCTTCAGAGCCAATCCTTATCCCGCAGTTACGGATCTAATTTGCCGACTTCCCTTACCTACATTGATCTATCGCCTAGAGACTCTAAACCTTGGAGACCTGCT
+
AAAAAEEAAEEEEAEEEEEEEEEEEAEAEEAEEEEEEEAE/EE/EEEAE6EEAE/EA6EAEEAEEEAEE6</AAEAEEEE/</EEEEEE/EA<E</6E/EEE<AE/E/EAAAEA/EAEAE//EEEEEE/EEA/<E/<E<<E/AAEA6A/A
@NS500126:798:HWTLTAFXX:1:11101:2195:1084 1:N:0:AATGGTCG+NGACCATT
GGGAATTCCAGGTTCATATGGACCATTTCAATCCATAATCCCGACTAAATGAGCATTTCAGTGATTTCCCGCGCCTTTCGGCGTAGGGTACACGCTGCTGCTCACATTGTAGCGCGCGTGCAGACCAGAACATCTAAGGGCATCACGGAC
+
written 13 months ago by kratnasiri

This header from the original post:

@NS500126:798:HWTLTAFXX:1:11101:10771:1083:1-1 1:N:0:AATGGTCG+NGACCATT

does not match the header in the example posted above:

@NS500126:798:HWTLTAFXX:1:11101:10771:1083 1:N:0:AATGGTCG+NGACCATT

What gives?

written 13 months ago by genomax
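
The thread ends without an answer to this question. One possible explanation, offered as an assumption: when `seqtk subseq` extracts a region instead of a whole read, it appears to append the coordinates to the read name as `:start-end`, so the trailing `:1-1` would come from seqtk's output and would not exist in file.fastq. That would also account for the empty grep result. A small sketch that strips such a suffix before searching:

```shell
# Read name as printed in the seqtk output from the original post.
name='NS500126:798:HWTLTAFXX:1:11101:10771:1083:1-1'

# Drop a trailing ":start-end" suffix, if one is present.
base=$(printf '%s\n' "$name" | sed 's/:[0-9][0-9]*-[0-9][0-9]*$//')
echo "$base"

# Search for the bare name as a fixed string (not executed here):
#   grep -F -A3 "@$base" file.fastq
```

A read name that genuinely ends in `:digits-digits` would also be shortened by this pattern, so check the result against the FASTQ file before relying on it.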