Question: Ways to concatenate individual DNA sequences to form a complete sequence
17 days ago, abdul.karim wrote:

Hi,

I am very new to the field of genomics, so I apologize for the very basic question I am about to ask.

I have raw DNA sequences for many samples. For each sample, the DNA sequence is chopped into fixed-size fragments and stored in FASTQ format.

For instance, the DNA sequence of sample A is chopped into 101562193 fragments, each with a length of 151.

Is there any way I can concatenate the fragments in the right order to reconstruct the whole DNA string?

Or is that not possible?

17 days ago, bernatgel (Barcelona, Spain) wrote:

Hi @abdul.karim

It's not as easy as concatenating them to reconstruct the original DNA sequence. What you have is the result of sequencing a sample with an NGS sequencer (most probably an Illumina one), and each of your fragments is called a read. You should start by mapping the reads to the genome, that is, finding for each read the most probable part of the genome that the sequenced molecule came from. To do that you need a read mapper; BWA is a widely used one worth looking at.
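To make the idea of mapping concrete, here is a toy sketch in Python. It is not how BWA works internally (real mappers use an index and handle gaps, base qualities and both strands); it only shows what "finding the most probable place a read came from" means. The reference string, the reads and the function names are all made up for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, reference):
    """Return (best_position, mismatches) for a read against a reference.

    Tries every offset in the reference and keeps the first one with the
    fewest mismatches. This brute-force scan is only practical for toy data.
    """
    best_pos, best_mm = None, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        mm = hamming(read, reference[pos:pos + len(read)])
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos, best_mm


reference = "ACGTTGCAAGGCTTACCGATGGA"  # made-up toy reference
# Reads arrive in no particular order; the last one carries one mismatch.
reads = ["GCAAGGCT", "TTACCGAT", "ACGTTGCA", "CCGTTGGA"]

for read in reads:
    print(read, map_read(read, reference))
```

Each read gets a position and a mismatch count, which is roughly the information a real mapper reports for every read, only at a vastly larger scale.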

However, I would recommend that you read a few tutorials and seek help from colleagues before starting with that. This will help you get up to speed much faster and avoid the many common mistakes we all made at the beginning.

17 days ago, swbarnes2 (United States) wrote:

"Is there any way I can concatenate the fragments in the right order to reconstruct the whole DNA string?"

Sure, you could concatenate them, but as bernatgel said, this is almost certainly output from an Illumina sequencer, and the reads are unplaced position-wise: nothing in the file says where in the genome each read came from. Simply concatenating them would produce nonsense.
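A small simulation shows why. The sketch below invents a 200-base "genome", draws 50 reads of length 20 from random start positions (a simplified stand-in for what a sequencer does; no errors, one strand only) and glues them together in file order. All names and sizes here are assumptions chosen for illustration.

```python
import random

random.seed(0)  # fixed seed so the toy run is repeatable

genome = "".join(random.choice("ACGT") for _ in range(200))  # made-up sample
read_len = 20
n_reads = 50

# Reads start at random positions and come out in no useful order.
starts = [random.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
reads = [genome[s:s + read_len] for s in starts]

concatenated = "".join(reads)

print("genome length:      ", len(genome))
print("concatenated length:", len(concatenated))
print("equals genome:      ", concatenated == genome)
print("average coverage:   ", len(concatenated) / len(genome))

# Same arithmetic for the numbers in the question: total bases in the file.
total_bases = 101562193 * 151
print("bases in sample A's reads:", total_bases)
```

The concatenation is five times longer than the toy genome because the reads overlap each other many times over. For the file in the question, the reads add up to roughly 15.3 billion bases, and that total reflects sequencing depth rather than the length of the genome.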

If you have a reference genome that is a close match, you could align the reads to it, and make a consensus sequence.
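As a rough sketch of that idea in Python: place each read at its best-matching position on the reference, then take the majority base in every column. This is a toy under strong assumptions (no indels, no sequencing errors, one strand, brute-force placement); the reference, the reads and the function names are invented for illustration, and real pipelines use a mapper plus dedicated consensus or variant-calling tools.

```python
from collections import Counter


def best_position(read, reference):
    """Brute-force placement: first offset with the fewest mismatches."""
    def mismatches(pos):
        return sum(a != b for a, b in zip(read, reference[pos:pos + len(read)]))
    return min(range(len(reference) - len(read) + 1), key=mismatches)


def consensus(reads, reference):
    """Majority base per reference position; 'N' where no read covers it."""
    columns = [Counter() for _ in reference]
    for read in reads:
        pos = best_position(read, reference)
        for offset, base in enumerate(read):
            columns[pos + offset][base] += 1
    return "".join(c.most_common(1)[0][0] if c else "N" for c in columns)


reference = "ACGTTGCAAGGCTTACCGATGGA"  # made-up close relative of the sample
# The toy sample differs from the reference at index 10 (G -> T).
reads = ["ACGTTGCA", "TTGCAAGT", "GCAAGTCT", "AAGTCTTA", "CTTACCGA", "ACCGATGG"]

result = consensus(reads, reference)
print(reference)
print(result)
```

In this toy run the consensus carries the sample's T at index 10 instead of the reference's G, and the final position comes out as N because no read covers it.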

If you have no reference at all, you can try to assemble the reads, which will almost certainly not give you a single resulting sequence, but many, many contigs.
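To illustrate why assembly tends to end in several contigs, here is a minimal greedy overlap sketch in Python. It repeatedly merges the two sequences with the longest suffix-to-prefix overlap and stops when no overlap of at least `min_len` remains. It is only a teaching toy: it ignores sequencing errors, reverse complements and repeats, and it is not the algorithm of any particular assembler. The reads and function names are made up.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that is a prefix of b, or 0 if < min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=4):
    """Merge the best-overlapping pair until no overlap >= min_len is left."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return contigs
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)] + [merged]


# Toy reads from two stretches of one sequence; the stretches share only a
# single base, which is below min_len, so they cannot be joined.
reads = ["TACCGATG", "CAAGGCTT", "ACGTTGCA", "CCGATGGA", "TTGCAAGG"]

contigs = greedy_assemble(reads)
print(contigs)
```

The five reads collapse into two contigs rather than one sequence, because nothing bridges the two stretches with a long enough overlap. Real data has the same problem at every poorly covered or repetitive region.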


Thank you for the help. Would you please explain what a reference genome is? By aligning, do you mean that I compare each read with the reference genome?

Reply written 16 days ago by abdul.karim

What tutorials have you looked at? I bolded key words so you would know what to Google.

Reply written 16 days ago by swbarnes2