Question: looking for alignment in IGV
0
gravatar for KVC_bioinfo
12 months ago by
KVC_bioinfo350
Boston
KVC_bioinfo350 wrote:

Hello all,

I am visualizing the alignment (bam) file in IGV. I can see the distribution of reads across all the exons in the area of interest. I want to see if particular reads aligned across all exons or just a few of them. Looking at a huge number of reads it looks difficult to manually look for it. I am not sure how the strategy should be to proceed with this.

Could someone help me here?

Thank you in advance.

igv alignment • 540 views
ADD COMMENTlink modified 12 months ago by Malcolm.Cook900 • written 12 months ago by KVC_bioinfo350

if particular reads aligned across all exons or just a few of them

Not clear why you are interested in particular reads. Can you clarify?

ADD REPLYlink written 12 months ago by genomax59k

For example:

The gene I am interested in looking at has 6 exons. I want to see if read A aligned to exon 1,2,3,4,5,6 or just exon 1,2?

ADD REPLYlink written 12 months ago by KVC_bioinfo350

Are they tiny exons or are you looking for multi-mapping to more than one exon?

ADD REPLYlink modified 12 months ago • written 12 months ago by genomax59k

The exons are around 200,000.

I am interested in look at for one read if part of read mapped to exon 1 and part of read mapped to exon 2 and so on till exon 6.

Basically, I am looking at pattern of alignment across all the exons.

ADD REPLYlink written 12 months ago by KVC_bioinfo350
1

Right click on the alignments and look in the Sort and Group alignments by options to see if one of those help highlight these kind of reads.

ADD REPLYlink written 12 months ago by genomax59k

Thanks! But every exon has almost more than 50 reads matching. It looks every tedious to manually check for 6 exons. Is there any other way I can do it?

ADD REPLYlink written 12 months ago by KVC_bioinfo350
1

You can filter out spliced reads using answers here: Filtering out the spliced reads from bam file.

ADD REPLYlink written 12 months ago by genomax59k

Thanks! I tried to filter out the reads that mapped to the area of interest using samtools and the coordinates of the region. However, in IGV i could see 19 reads aligned in that region but when I extracted those using samtools only 10 wherein the output.

Why would that happen?

ADD REPLYlink written 12 months ago by KVC_bioinfo350

Only 10 are spliced?

ADD REPLYlink written 12 months ago by genomax59k

For region of interest say, 57,000,000-57,500,000

In IGV I can see 19 reads are aligned. However, when I try to extract those reads using

samtools view -h -q 10 out.bam "chrX:57,000,000-57,500,00" > gene.bam

I get only 10 reads.

ADD REPLYlink written 12 months ago by KVC_bioinfo350

I am little confused over the question but would like to help in someway.

  • You need to specify if the reads that were aligned were single end vs. paired end and the size.

    • If you are using illumina paired ends (which are the longest possible in illumina) how come even a paired end read cover 200,000 bp - you mentioned earlier comment tath exons are 200,000 BP. It is possible for paired end reads (max of 300 bp for 2 paired end x 150 bp read) to cover two distant exons but not all the part of exons. In you case its like 6 exons. How is it possible for one read pair to cover such a large number of exons?

    • You should have understood one thing when you pulled only 10 reads overlapping in chrX:57,000,000-57,500,00 ; that means the alignment in this region in very sparse.

If you are using nanopore or pacbio reads than it is a different story. So, please add more detail to your question. Also add some plots from IGV to clarify the issue.

ADD REPLYlink modified 12 months ago • written 12 months ago by kirannbishwa01890

How is it possible for one read pair to cover such a large number of exons?

Reads are supposed to contain spliced segments. Exactly how, only @KNC_bioinfo knows, since we can't see the data.

ADD REPLYlink modified 12 months ago • written 12 months ago by genomax59k

I am sorry about that. I am using nanopore reads.

ADD REPLYlink written 12 months ago by KVC_bioinfo350

Important information. Should have been in the original post. What aligner did you use? Perhaps it does not use right tags for you to be able to pull out spliced reads.

ADD REPLYlink written 12 months ago by genomax59k
0
gravatar for Malcolm.Cook
12 months ago by
Malcolm.Cook900
kansas, usa
Malcolm.Cook900 wrote:

Is sounds like you require IGV to visually indicate which RNASeq reads in a bam file are compatible with which transcript models that they overlap, and to what extent.

This is not a fully specified requirement, as it does not define "compatible" or "extent", and does not address issues such as:

  • alternate splicing yielding overlapping transcript models
  • over-lapping genes

Regardless, I believe IGV can not do this for you.

If you fully specify the requirement, you might be able to pre-compute "compatibility" for each read and add it as a user-defined custom FLAG in the sam/bam file, and then you can use IGV's popup-menu capability to color by the value of this attribute.

If you code in R, you may find the following abstraction and function useful: findCompatibleOverlaps

Depending on your application you will want to read:

ADD COMMENTlink modified 12 months ago • written 12 months ago by Malcolm.Cook900
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1291 users visited in the last hour