Question: Applications/Methods For Whole Genome Shotgun Assembly?
7.0 years ago by
isa2910 wrote:

(I have almost no experience with bioinformatics or biology in general prior to this summer; please excuse any gross abuses of terminology or general misunderstandings regarding the field)

I'm working with some FASTQ files for a project, about 40 gigabytes of them (17-18 Gb, 70-150 bp per sequence), and I suspect they're the result of shotgun sequencing, because there's no way the genome the files are supposed to represent is that large. If my understanding of shotgun sequencing is correct, this means that there's significant overlap between individual sequences which would allow for the sequences to be reconstructed into larger, contiguous sequences, dramatically reducing the size of the data and making it far easier to work with.
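To make the overlap idea above concrete, here is a toy sketch in Python. It is only an illustration under strong assumptions: error-free reads, a single strand, no repeats, and a handful of sequences. The names `overlap`, `greedy_assemble` and `min_len` are made up for this example, and real assemblers use very different and far more scalable methods.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when no overlap of at least min_len bases exists.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap.

    Toy assumption: reads are error-free and all come from the same strand.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:  # nothing left to merge
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


# Three overlapping 8 bp reads taken from one made-up 16 bp sequence.
reads = ["AGCTTAGG", "ACGTTGCA", "TGCAAGCT"]
print(greedy_assemble(reads))  # ['ACGTTGCAAGCTTAGG']
```

Comparing every pair of reads like this scales quadratically, so it is not something to run on tens of gigabytes of FASTQ; it only shows why overlapping reads can collapse into fewer, longer sequences.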

So far, the only promising lead I've found is an application by the name of ARACHNE, which appears to be exactly what I'm looking for, except that I don't have a sufficiently powerful Linux machine at hand with the correct software installed (although it might be possible to rectify this if no other options present themselves).

Short version: How can I go about turning this giant pile of tiny sequences into a smaller pile of larger sequences?

fasta assembly fastq • 1.5k views
modified 4 months ago by Biostar ♦♦ 20 • written 7.0 years ago by isa2910
7.0 years ago by
stolarek.ir650 wrote:

What technology was used for the sequencing? Are these single-end or paired-end reads? Does a reference genome exist that could be used for alignment and reference-guided assembly?

These are questions you need to find the answers to yourself. Without some understanding of what you have, it's pretty much pointless to try to do anything.
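As a starting point for answering those questions, a rough Python sketch like the one below can summarize a FASTQ file. It assumes plain four-line records (no wrapped sequences) and relies on two common header conventions for mate labels: an older `/1` or `/2` suffix, and the newer Illumina style where the comment after the space starts with `1:` or `2:`. The function names and the `max_reads` cut-off are invented for this example, and files that follow other conventions will simply show no mate labels.

```python
import gzip
from itertools import islice


def open_maybe_gzip(path):
    """Open plain or gzip-compressed text, chosen by file extension."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def mate_label(header):
    """Return "1" or "2" if the header names a mate, else None."""
    name, _, comment = header.partition(" ")
    if name.endswith(("/1", "/2")):      # older style: @read_name/1
        return name[-1]
    if comment[:2] in ("1:", "2:"):      # newer Illumina style: @read_name 1:N:0:...
        return comment[0]
    return None


def summarize_fastq(path, max_reads=100_000):
    """Summarize up to max_reads records, assuming 4 lines per record."""
    count = total = longest = 0
    shortest = None
    mates = set()
    with open_maybe_gzip(path) as handle:
        while count < max_reads:
            record = list(islice(handle, 4))
            if len(record) < 4:          # end of file (or a truncated record)
                break
            header, seq = record[0].strip(), record[1].strip()
            count += 1
            total += len(seq)
            longest = max(longest, len(seq))
            shortest = len(seq) if shortest is None else min(shortest, len(seq))
            label = mate_label(header)
            if label:
                mates.add(label)
    return {"reads": count, "bases": total, "min_len": shortest,
            "max_len": longest, "mate_labels": sorted(mates)}
```

Calling `summarize_fastq("sample_R1.fastq.gz")` (a hypothetical file name) returns a small dictionary. Both labels appearing in one file would suggest interleaved paired-end reads, while a single label per file is what you would expect when each mate has its own file. Dividing the total number of bases by the expected genome size gives a rough idea of sequencing depth.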

Read about sequence assembly (it's not as simple as "the reads overlap, so everything fits together"). And yes, you need some computational power to do the job.
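One common reason overlap alone is not enough (offered here as an illustration, not something spelled out in the answer) is repeated sequence: when the same stretch occurs in two places, the reads do not say which continuation belongs where. The Python sketch below builds a tiny de Bruijn-style graph from k-mers and reports the nodes that have more than one possible successor. The function name `debruijn_branches` and the example sequences are made up, and sequencing errors and reverse complements are ignored.

```python
from collections import defaultdict


def debruijn_branches(reads, k):
    """Map each (k-1)-mer with several possible successors to those successors.

    Every k-mer in the reads is treated as an edge from its first k-1 bases
    to its last k-1 bases. A node with more than one outgoing edge is a point
    where the path through the graph is ambiguous.
    """
    successors = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            successors[kmer[:-1]].add(kmer[1:])
    return {node: sorted(nxt) for node, nxt in successors.items() if len(nxt) > 1}


# No 3-mer is followed by two different bases: the path is unambiguous.
print(debruijn_branches(["ACGTTGCAAGCTTAGG"], k=4))  # {}

# "AGCT" occurs twice, followed once by C and once by G: the graph branches.
print(debruijn_branches(["TAGCTCAGCTG"], k=4))  # {'GCT': ['CTC', 'CTG']}
```

Longer reads, paired-end information and a larger k all help resolve branches like this, which is part of why the read type matters when choosing an assembler.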

written 7.0 years ago by stolarek.ir650

Thanks for the reply. I believe they were sequenced with the Illumina HiSeq platform. I'm not sure what single-end and paired-end reads are, but I'll look into that. Same for the reference genome (I suspect not, though).

I'll continue reading up on sequence assembly, and see if I can convince IT to install the necessary software on one of the more powerful computers we've got.


written 7.0 years ago by isa2910

Here you have some tools used in bioinformatics, presented in an easy way. Go for the assemblers (long or short read; after some reading you will know which is best suited to your type of reads).

written 7.0 years ago by stolarek.ir650