Question: Phred/Phrap pipeline starting with FASTA file of paired-end reads and using a reference sequence
DaniCee (Singapore), 3.6 years ago, wrote:

Hello everyone,

This is my first question here, and I am still quite new to this topic. I need to assemble short reads with Phrap, both guided by a reference sequence and unguided.

I have a FASTA file of 50bp paired-end reads (I also have it in SAM, BAM, and FASTQ formats) that map to a full reference sequence, which I also have in FASTA format. I obtained the read mappings with Bowtie2 and SAMtools.

I explicitly want to use Phrap to obtain a full-length assembly of the reads and compare it to the reference via a pairwise alignment with Needle. I want to do this both with the reference as a guide and without it.

I have been sent the Phred and Phrap programs, but I am quite lost. I have tried Phrap alone with no quality file, but I get many short contigs instead of one long one.

I understand I should follow the whole Phred -> Phd2fasta -> CrossMatch -> Phrap protocol, but I cannot find my way around it. Phred seems to take a chromatogram file as input, and I do not know how to obtain one.

So my question is: how should I follow the Phred/Phrap protocol when my inputs are a FASTA file (or SAM, BAM, or FASTQ) of 50bp reads that map to a reference FASTA file? I want to obtain a contig that spans the full length of the reference, once using the reference as input and once without it.

Thanks a lot!

modified 3.6 years ago by Brian Bushnell • written 3.6 years ago by DaniCee
Brian Bushnell (Walnut Creek, USA), 3.6 years ago, wrote:

50bp reads will not give you a good assembly no matter what you do, unless you are trying to assemble a tiny virus.

You might, possibly, get a better assembly using SPAdes, which is very easy to use. There's no point in using OLC/string graph assemblers on such tiny reads. But unless you are working on a virus (and often even then, as viruses can be hard to assemble), you will not get a 1-contig assembly from 50bp reads. Certainly never for a bacterium. You'd be lucky to get a 1000-contig assembly of a bacterium using 50bp reads.

What kind of organism are you trying to assemble?  And why are you using 50bp reads?

modified 3.6 years ago • written 3.6 years ago by Brian Bushnell

I am just trying to assemble VDJ combinations, not a whole genome; I run Bowtie2 with all of the cell reads against a given combination. My results with Velvet were not bad, but I got a perfect assembly with CodonCode, which uses Phred and Phrap. That is why I want to use Phred and Phrap to automate the process. How should I use them? I will look into SPAdes too.

written 3.6 years ago by DaniCee

How can I run Phred/Phrap with a FASTQ/FASTA/BAM/SAM file as input and with a reference FASTA sequence?

written 3.6 years ago by DaniCee

I am trying to convert my input FASTA/FASTQ file into a chromatogram file (SCF or ABI) using BioPerl, as indicated in this other thread, "C: Converting A Dna Sequence To Abi Or Scf Format", but this approach does not work. Any clue?

written 3.6 years ago by DaniCee
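
A possible way around the chromatogram step, offered as a sketch and not as a tested protocol: Phred is a base caller that derives quality scores from traces, whereas a FASTQ file already carries a Phred-scaled quality score for every base. As far as I understand the Phrap documentation, Phrap takes a FASTA file and looks for qualities in a file with the same name plus `.qual`, so converting FASTQ to that pair of files may be enough. The function names below are hypothetical, and the Phred+33 quality offset (Sanger / Illumina 1.8+) is an assumption you should check against your data.

```python
def fastq_to_fasta_qual(fastq_lines, phred_offset=33):
    """Turn 4-line FASTQ records into FASTA text and matching .qual text.

    phred_offset=33 is an assumption (Sanger / Illumina 1.8+ encoding).
    """
    # Drop line endings and skip blank lines between records.
    lines = [ln.rstrip("\r\n") for ln in fastq_lines if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")

    fasta_records, qual_records = [], []
    for i in range(0, len(lines), 4):
        header, seq, plus, quals = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        if len(seq) != len(quals):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        name = header[1:]
        # Decode each quality character into its integer Phred score.
        scores = [ord(ch) - phred_offset for ch in quals]
        fasta_records.append(f">{name}\n{seq}")
        qual_records.append(f">{name}\n" + " ".join(str(s) for s in scores))

    return "\n".join(fasta_records) + "\n", "\n".join(qual_records) + "\n"


def write_phrap_inputs(fastq_path, fasta_path):
    """Write fasta_path and fasta_path + '.qual' from a FASTQ file."""
    with open(fastq_path) as handle:
        fasta_text, qual_text = fastq_to_fasta_qual(handle)
    with open(fasta_path, "w") as out:
        out.write(fasta_text)
    with open(fasta_path + ".qual", "w") as out:
        out.write(qual_text)
```

For example, `write_phrap_inputs("reads.fastq", "reads.fasta")` would produce `reads.fasta` and `reads.fasta.qual`, after which `phrap reads.fasta` could be run on the pair. This only supplies the quality values; it does not change the limitation of 50bp reads discussed in the answer above.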

Can anyone help with this? Thanks!

written 3.6 years ago by DaniCee