Question: AB1 to FASTQ
2.6 years ago by
DanielC130
Canada
DanielC130 wrote:

Dear All,

Could you please share how to convert AB1 files to FASTQ format with a standalone tool? Thanks very much.

conversion ab1 fastq • 4.1k views
modified 10 months ago by renaegeier10 • written 2.6 years ago by DanielC130

While it would be possible to do that by generating fasta from AB1 and then adding fake Q scores to make fake fastq, the question is why you would want to do this.

modified 2.6 years ago by WouterDeCoster44k • written 2.6 years ago by genomax92k
2.3 years ago by
Chronos600
Germany
Chronos600 wrote:

If you wish to batch-convert hundreds of AB1 files, and you are comfortable with the command line and with compiling tools, you can use TraceTuner; see this thread on SEQanswers for how to add .fastq support to it.

Another command-line solution would be to use seqret from the EMBOSS suite:

seqret -sformat abi -osformat fastq -auto -stdout -sequence input.ab1 > output.fastq

Otherwise, graphical molecular biology programs usually handle AB1 with ease, and will allow conversion.
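
For the batch case mentioned above, the seqret call could be wrapped in a small Python script. This is only a sketch: it assumes seqret is installed and on your PATH, that the input files end in .ab1, and that writing each .fastq next to its input is acceptable. The function names are made up for illustration.

```python
import subprocess
from pathlib import Path


def seqret_command(ab1_path):
    """Return the seqret argument list quoted in this answer for one AB1 file."""
    return [
        "seqret",
        "-sformat", "abi",
        "-osformat", "fastq",
        "-auto",
        "-stdout",
        "-sequence", str(ab1_path),
    ]


def convert_directory(directory):
    """Convert every *.ab1 file in `directory` to a .fastq file beside it."""
    for ab1_path in sorted(Path(directory).glob("*.ab1")):
        fastq_path = ab1_path.with_suffix(".fastq")
        with open(fastq_path, "w") as out:
            # seqret prints the FASTQ record to stdout; redirect it into the file.
            subprocess.run(seqret_command(ab1_path), stdout=out, check=True)
```

Calling convert_directory("traces") would then process every AB1 file in a (hypothetical) traces folder and stop at the first file seqret fails on.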

modified 2.3 years ago • written 2.3 years ago by Chronos600
2.3 years ago by
trausch1.5k
Germany
trausch1.5k wrote:

Tracy can also convert to Fasta or Fastq:

tracy basecall -o out.fastq -f fastq input.ab1
written 2.3 years ago by trausch1.5k
10 months ago by
renaegeier10
renaegeier10 wrote:

I realize this is late, but I found an additional way to do this using Biopython, which you have to install prior to the following:

$ python3
>>> from Bio import SeqIO
>>> records = SeqIO.parse("file.ab1", "abi")
>>> count = SeqIO.write(records, "file.fastq", "fastq")
modified 10 months ago by finswimmer14k • written 10 months ago by renaegeier10

Out of curiosity what Q score (fake) does biopython assign to these reads?

written 10 months ago by genomax92k

You can extract the quality scores in BioPython:

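from Bio import SeqIO

# SeqIO.parse returns an iterator of SeqRecord objects read from the trace file
seqHandle = SeqIO.parse("file.ab1", "abi")

for seq in seqHandle:
    seqName = seq.id                                   # record identifier
    seqStr = str(seq.seq)                              # base calls as a plain string
    seqQual = seq.letter_annotations["phred_quality"]  # per-base Phred scores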

You probably also want to do some additional trimming, and I would definitely encourage you to go back and look at the original trace (if you think there might be a mutation).

However, this means that you would have to write out each desired record yourself. For simplicity, you could output a trimmed FASTA file instead. Even so, the information above (name, sequence, and quality scores) is enough to build a FASTQ record:

https://en.wikipedia.org/wiki/FASTQ_format#Format
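
As a rough sketch of what writing a record by hand could look like, the function below builds one four-line FASTQ record from a name, a sequence string, and a list of integer Phred scores, such as the three values pulled out in the loop above. The function name is hypothetical, and the default offset of 33 is an assumption (the Sanger-style encoding); some older Illumina pipelines used 64.

```python
def format_fastq_record(name, sequence, qualities, offset=33):
    """Return one FASTQ record as text: @name, sequence, '+', quality string."""
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have the same length")
    # Each Phred score becomes one ASCII character: chr(score + offset).
    quality_string = "".join(chr(q + offset) for q in qualities)
    return f"@{name}\n{sequence}\n+\n{quality_string}\n"
```

The returned string can be appended to an output file opened in text mode, one call per read.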

modified 10 months ago • written 10 months ago by Charles Warden7.9k

The quality score was actually defined for Sanger reads before Illumina reads. However, Illumina has also changed the quality offset over time.

This may also be a bit like the quality score distribution looking noticeably different for Illumina versus PacBio CCS reads. So, you may need to use your own judgement about what quality scores correspond to nucleotides that should be trimmed.
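
To make the trimming idea concrete, here is a minimal sketch that clips low-quality bases from both ends of a read. The cutoff of 20 is only a placeholder assumption, not a recommendation for Sanger traces, and the function name is made up; pick the threshold after looking at your own traces.

```python
def trim_low_quality_ends(sequence, qualities, min_quality=20):
    """Drop bases from both ends until a base with quality >= min_quality is met."""
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have the same length")
    start = 0
    end = len(sequence)
    # Walk in from the 5' end past low-quality bases.
    while start < end and qualities[start] < min_quality:
        start += 1
    # Walk in from the 3' end past low-quality bases.
    while end > start and qualities[end - 1] < min_quality:
        end -= 1
    return sequence[start:end], qualities[start:end]
```

Note that this only trims the ends; low-quality bases inside the kept region are left alone, so it is still worth checking the original trace.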

modified 10 months ago • written 10 months ago by Charles Warden7.9k