I loaded a BAM (RNA-seq) file into IGV but cant see anything!
2
3
Entering edit mode
7.5 years ago
jyalda2 ▴ 30

Hello everyone,

I have RNA-seq data set that I downloaded the BAM file from GEO. I loaded BAM file into IGV but could not see any thing in IGV! I don't know whats wrong! Is my BAM file sorted? or is the created .bai file the correct one and indexed fully?! could some one help me, please?

Steps that I followed:

1. I sorted BAM file using samtools sort file_name.bam file_name.sorted in Bitvise SSH

2. made index using samtools index file name.sorted.bam

I got the .bai file quickly with size about 13.5KB!

Then transferred all sorted,bai file into the same place that I saved BAM file on my PC and tried to visualize with IGV (Hg19)

Best regards

rna-seq IGV RNA-seq BAM Visualization BAM index • 26k views
1
Entering edit mode

Are you sure the BAM contains hg19 alignments? That index seems way too small.

0
Entering edit mode

It is mentioned in the method part of their paper that they mapped using hg19 build.

1
Entering edit mode

Make sure you are zoomed in enough (do you see text that says: zoom in to see alignments?)

0
Entering edit mode

Yes I'm zoomed in. I can see reference genes but nothing for my files! I wanted to upload a screenshot of my work on IGV here but dont know how?!

I wanted to use UCSC but that one also needs BAM + .bai file. In galaxy could not upload it noted that its >2G so cant be uploaded!

0
Entering edit mode

To post an image, upload it to imgur and then use that link here.

Can you please post the output of samtools view -h [bamfile] | head -30?

0
Entering edit mode

Thanks.

Sorry, I don't have background of bioinformatic and linux. so I have to search for each command! can I post the screen shot?

0
Entering edit mode

IGV IMAGE

0
Entering edit mode

That one says "zoom in to see alignments". When in doubt, have a look at the header for the BAM file with samtools view and reindex.

0
Entering edit mode

Image of IGV :

Sorry, I don't have background of bioinformatic and linux. so I have to search for each command! can i post the screen shot?

0
Entering edit mode

Right, so zoom in.

0
Entering edit mode

Zoomed in but still nothing!

0
Entering edit mode

Keep zooming, you're not going to see alignments at that scale. Try 1kb or so.

0
Entering edit mode

I did but no difference!

0
Entering edit mode

Please post the output of samtools view -h [bamfile] | head -30

That will show you the precise coordinates of the first few alignments contained in the BAM. It will also give us a look at the header of the BAM.

Once we have some example alignment coordinates from the BAM, we can then jump to that exact location in IGV and go from there. Posting that information will be a huge help for us.

0
Entering edit mode

Screen shot after running samtools view -h [bamfile] | head -30 :

0
Entering edit mode

Great, thank you.

Now, in IGV, in that text field at the top just to the left of the "home" button, type the following:

chr1:14410


and click "Go"

What do you see?

0
Entering edit mode

Result:

different zoom in:

0
Entering edit mode

Cool! So problem resolved then?

0
Entering edit mode

Thank you for your help :)

what chr1:14410 stand for? I searched but got confused a bit! :P

I want to see the positive and negative control to check whether the data set is good . when I type and search for housekeeping genes or known dis-regulated genes or genes that is mentioned in the related paper (such as SPP1 or FN1 genes),again (the same problem) there is nothing for my BAM file but it can be find on reference gene. How I can search for different genes?

I only can see the reads when I search for chr1:14410, chr1:14401 or chr1:13507 that were in this output

So according this visualization sorting of my BAM file and generating of index was correct?

Is it possible to save this output as .txt file for other chr?

Sorry for asking so many questions :)

0
Entering edit mode

Your explanation was really helpful :) Actually I did this, I tried to find the location of genes and search by location on chr. but non of them worked. I'll try again. Thanks

2
Entering edit mode

You can direct type the name of the gene in IGV.

0
Entering edit mode

And if you want to have an idea of the expression of different genes across several samples, you should use htseq-count to count the number of reads per gene, and then use a tool to assess differential expression like DESeq2 or edgeR. Don't forget that each sample is sequence at different depth so it's dangerous to compare raw coverage without normalization.

19
Entering edit mode
7.5 years ago
Dan D 7.3k

I'm going to summarize how we got to the resolution, and hopefully provide some context to help you dig deeper into your follow-up questions.

For the purposes of quick understanding, it's reasonable to consider a BAM file as a list of sequence alignments, where each alignment is composed of a single line which has exhaustive detail about the nature of the alignment to the reference. For more information, read the SAM specification. It's not long and it will provide a lot of information as to what's going on.

A BAM file is simply the compressed, non-human-readable (binary) form of a SAM file. One of the most common ways to interrogate BAM files is a very powerful command-line program called samtools.

You weren't seeing any alignments at first because you weren't zoomed in far enough. The genome is HUGE compared to the length of a typical Illumina read (seven orders of magnitude larger), so unless you zoom in really far the relative length of an alignment is going to be smaller than a single pixel.

BUT! It's definitely possible that there are large regions of the genome which have poor coverage, especially if the dataset is from exome or RNA-Seq, or is WGS with lousy coverage. So just zooming in randomly, you wouldn't necessarily expect to see alignments appear.

What I asked you to do was use samtools to pull out the first few reads from your sorted BAM file. Since the BAM file was sorted, the alignments start at the beginning of chromosome 1. In the screenshot, I saw that you had some reads around base 14,000 of chromosome 1, so I asked you to jump to that area in IGV.

IGV can also search based on gene names, but remember that gene names differ based on the reference used and the entity which is cataloguing the genes. Another approach would be to use a resource like gene metabase like GeneCards to search for your gene, find a consensus start site, and then search based on that coordinate in IGV. For your SPP1 gene, it looks like the start site is at chromosome 4, base 88,896,802, so if you type chr4:88896802 in IGV it will take you to that region of the genome.

Similarly, you can use samtools to query for alignments in a BAM file. If you wanted to find all of the alignments within the consensus start/end region of SPP1, you could type something like this:

samtools view [bamfile] chr4:88896802-88904563


That will give you all of the alignments in human-readable form. You can pipe these to a text file if you want, or use perl/python/sed/awk to pull out specific details.

0
Entering edit mode

Your explanation was really helpful :) Actually I did this, I tried to find the location of genes and search by location on chr. but non of them worked. I'll try again. Thanks

Traffic: 2180 users visited in the last hour
FAQ
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.