Question: Reads and Read Segments in Alignment
1
4.4 years ago by
jrowan10
United States
jrowan10 wrote:

My understanding of SAM files and the format is fairly good, but there are some things I haven't quite grasped.  I'm not sure how obvious you all may find these questions, but they're what have come to mind. 

I'm interested in recovering the original sequenced read after alignment has been done.  I'd like to know which pieces of the read/read segments I need.  How do I know what the full read sequence was that came off the sequencer?  Can I reconstruct it by connecting the read segments in the same template together?  If so, what is the template?  It's not the read, is it?  The SAM format specification doesn't suggest that it is; it says the template is some DNA/RNA fragment.

Here are some questions:

  • What is the difference between the read I get from sequencing and the read segments I see in a SAM file?
  • Intuition tells me that read segments are mapped portions of the larger read, but are they arbitrarily segmented in the SAM presentation?
  • Are segments contiguous?  Can they also be non-contiguous?
  • Can I reconstruct the full read from the multiple read segments?
  • How does template correspond to a sequence read?

I'm very grateful for any clarification I can get on these questions.

sequencing sam alignment • 1.8k views
ADD COMMENTlink modified 4.4 years ago by Renesh1.6k • written 4.4 years ago by jrowan10

You can look into Picard's SamToFastq tool, which converts a SAM/BAM file back into FASTQ.
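For anyone who wants to see roughly what such a conversion involves, here is a minimal Python sketch. It is not Picard's implementation; the function name `sam_line_to_fastq` is made up for illustration, and it assumes plain-text SAM lines whose primary records are not hard-clipped.

```python
# Complement table for reverse-complementing DNA (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def sam_line_to_fastq(line):
    """Return a FASTQ record for one SAM alignment line, or None if it is skipped.

    Hypothetical helper for illustration only.
    """
    fields = line.rstrip("\n").split("\t")
    qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    # Skip secondary (0x100) and supplementary (0x800) records so that
    # each sequenced read is emitted only once.
    if flag & 0x100 or flag & 0x800:
        return None
    # SEQ may be '*' when the sequence is not stored.
    if seq == "*":
        return None
    # Reverse-strand alignments (0x10) store SEQ reverse-complemented
    # and QUAL reversed, so undo both to get the read as sequenced.
    if flag & 0x10:
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]
    # Tag mates so the two reads of a pair stay distinguishable.
    if flag & 0x40:
        qname += "/1"
    elif flag & 0x80:
        qname += "/2"
    return f"@{qname}\n{seq}\n+\n{qual}\n"
```

Calling this on every non-header line of a SAM file and writing out the non-None results gives one FASTQ record per original read, under the assumptions above.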

ADD REPLYlink written 4.4 years ago by geek_y9.8k

That worked for me. Thanks!

ADD REPLYlink written 10 days ago by Rashedul Islam310
2
4.4 years ago by
Renesh1.6k
United States
Renesh1.6k wrote:

A given read from your query file (FASTQ file) can map to multiple locations in the genome. You can check this with the NH tag in the SAM file (an optional field, so it is only present if your aligner writes it). Reads can also overlap each other, since they are sequenced from DNA fragments; you can find such overlaps by comparing the mapping coordinates in the SAM file.
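A rough Python sketch of both checks follows: reading the NH tag, and counting how many alignment records each read has. The helper names `nh_tag` and `count_alignments` are hypothetical, and the counts only reveal multi-mapping if the aligner was run in a mode that reports more than one alignment per read.

```python
from collections import Counter

def nh_tag(fields):
    """Return the NH value from a tab-split SAM line, or None if absent."""
    # Optional TAG:TYPE:VALUE fields start at column 12 (index 11).
    for opt in fields[11:]:
        if opt.startswith("NH:i:"):
            return int(opt[5:])
    return None

def count_alignments(sam_lines):
    """Count reported alignments per (read name, mate number)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:    # unmapped read
            continue
        if flag & 0x800:  # supplementary: part of a split alignment, not another location
            continue
        # Mate number: 1 for first segment, 2 for last, 0 for unpaired/unknown.
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        counts[(fields[0], mate)] += 1
    return counts
```

Any key with a count above 1 is a read for which the aligner reported more than one location.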

The aligner takes only the read sequences from the FASTQ file and maps them to the reference sequence. If you want to use a contiguous sequence (contig), you need to assemble the reads first and then map the contig to the reference. Because a contig is much longer than a read, you need to be careful when choosing and configuring the aligner.

 

ADD COMMENTlink written 4.4 years ago by Renesh1.6k

I think this answers the question if I change your sentence to be: "The aligner takes - one at a time - a single read sequence from the fastq file for mapping to the reference."  I assume that's what you meant.

I'm now curious as to how alignment is presented in the SAM file if I used a contig as opposed to mapping each read.  Does the SAM output change?  I can't see anything in the format specification that says it would.  No flag or anything.

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by jrowan10
1
4.4 years ago by
Renesh1.6k
United States
Renesh1.6k wrote:

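The read segment in the SAM file (column 10, SEQ) is the same as the read sequence in your query file (as output by Bowtie2 and BWA), with two caveats: reads aligned to the reverse strand (FLAG bit 0x10) are stored reverse-complemented, and hard-clipped bases (CIGAR operation H) are omitted from SEQ.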
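One way to check whether column 10 holds the whole read is to look at the CIGAR string in column 6. Below is a small sketch, with `read_lengths` as a made-up helper name: it sums the operations that consume read bases and counts hard-clipped bases (H) separately, since those are absent from SEQ.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def read_lengths(cigar):
    """Return (bases stored in SEQ, original read length) implied by a CIGAR.

    Hypothetical helper; expects a real CIGAR string, not '*'.
    """
    in_seq = 0
    hard_clipped = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "MIS=X":   # operations that consume bases present in SEQ
            in_seq += n
        elif op == "H":     # hard-clipped bases: part of the read, missing from SEQ
            hard_clipped += n
        # D, N and P consume no read bases, so they are ignored here.
    return in_seq, in_seq + hard_clipped
```

If the two numbers are equal, SEQ is the complete read (possibly reverse-complemented); if they differ, the record was hard-clipped and the full sequence has to come from another record of the same read or from the FASTQ file.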

You can construct the full read from multiple segments using any transcriptome/genome assembly tool.

 

ADD COMMENTlink written 4.4 years ago by Renesh1.6k

Thank you for the reply.  Again, please forgive me if these questions are obvious to others.

I can see that these are the read sequences as they appear in a FASTA (or similar) file.  It is possible that a given segment mapped to multiple locations, right?  Isn't it also possible to have several segments that overlap to varying degrees?

With this in mind, how was alignment done?  Does alignment take the full read or sections of the read in aligning?  That is, is alignment performed with each read segment or the full read sequence against the reference?  (Is this something I'll need to crawl through code for?  Bowtie2's source, for instance?)

Edit: Ah, I think you've already given the answer in a somewhat roundabout fashion.  You used the phrase query file, which makes me believe that each sequence therein is a query sequence for alignment.  So, I take it that the read segments presented on column 10 of SAM files were used in alignment (and that alignment did not use the full read sequence as one long, contiguous string).  Is this correct?
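On the template question raised in this thread: in the SAM specification, the FLAG column records where a read (segment) sits within its template, for example as one mate of a read pair. A minimal decoding sketch follows; the function name `describe_segment` and the wording of the returned strings are my own.

```python
def describe_segment(flag):
    """Describe a record's position within its template from the SAM FLAG.

    Hypothetical helper for illustration.
    """
    # 0x1: the template has multiple segments (e.g. paired-end sequencing).
    if not flag & 0x1:
        return "only segment of its template (unpaired read)"
    first = bool(flag & 0x40)  # first segment in the template
    last = bool(flag & 0x80)   # last segment in the template
    if first and last:
        return "inner segment of a multi-segment template"
    if first:
        return "first segment of its template (read 1)"
    if last:
        return "last segment of its template (read 2)"
    return "segment of a multi-segment template, position unknown"

if __name__ == "__main__":
    for flag in (0, 16, 99, 147):
        print(flag, describe_segment(flag))
```

So for ordinary paired-end data the template is the sequenced fragment, and each of its two reads appears as its own segment with its own SEQ.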

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by jrowan10