Question: I loaded a BAM (RNA-seq) file into IGV but cant see anything!
1
gravatar for jyalda2
3.0 years ago by
jyalda210
Belgium
jyalda210 wrote:

Hello everyone,

I have RNA-seq data set that I downloaded the BAM file from GEO. I loaded BAM file into IGV but could not see any thing in IGV! I don't know whats wrong!  Is my BAM file sorted? or is the created .bai file the correct one and indexed fully?! could some one help me, please?

 BAM file that I downloaded from GEO is about 5GB, after loading from my local file to remote file in ssh the size is about4.5GB

Steps that I followed:

1. I sorted BAM file using samtools sort file name.bam file name.sorted in Bitvise SSH

sorted file size: about4.5GB

2. made index using samtools index file name.sorted.bam

I got the .bai file quickly with size about 13.5KB!

then transferred  all sorted,bai file into the same place that I saved BAM file on my PC and tried to visualize with IGV (Hg19)

Thank you very much for your help in advance.

Best regards

 

 

ADD COMMENTlink modified 3.0 years ago • written 3.0 years ago by jyalda210
1

Are you sure the BAM contains hg19 alignments? That index seems way too small.

ADD REPLYlink written 3.0 years ago by Dan D6.2k

It is mentioned in the method part of their paper that they mapped using hg19 build.

ADD REPLYlink written 3.0 years ago by jyalda210
1

make sure you are zoomed in enough  (do you see text that says: zoom in to see alignments?)

ADD REPLYlink written 3.0 years ago by Istvan Albert ♦♦ 74k

Yes I'm zoomed in. I can see reference genes but nothing for my files! I wanted to upload a screenshot of my work on IGV here but dont know how?!

I wanted to use UCSC but that one also needs BAM + .bai file. In galaxy could not upload it noted that its >2G so cant be uploaded!

ADD REPLYlink modified 3.0 years ago • written 3.0 years ago by jyalda210

To post an image, upload it to imgur and then use that link here.

Can you please post the output of samtools view -h [bamfile] | head -30 ?

ADD REPLYlink written 3.0 years ago by Dan D6.2k

Thanks.

Sorry, I don't have background of bioinformatic and linux. so I have to search for each command! can i post the screen shot?

ADD REPLYlink modified 3.0 years ago • written 3.0 years ago by jyalda210

IGV IMAGE

ADD REPLYlink modified 3.0 years ago • written 3.0 years ago by jyalda210

That one says "zoom in to see alignments". When in doubt, have a look at the header for the BAM file with samtools view and reindex.

ADD REPLYlink written 3.0 years ago by Devon Ryan73k

Image of IGV : http://i.imgur.com/rCgnE7I.jpg

Sorry, I don't have background of bioinformatic and linux. so I have to search for each command! can i post the screen shot?

ADD REPLYlink modified 3.0 years ago • written 3.0 years ago by jyalda210

Right, so zoom in.

ADD REPLYlink written 3.0 years ago by Devon Ryan73k

Zoomed in but still nothing!  http://i.imgur.com/mRPfEgy.jpg

ADD REPLYlink written 3.0 years ago by jyalda210

Keep zooming, you're not going to see alignments at that scale. Try 1kb or so.

ADD REPLYlink written 3.0 years ago by Devon Ryan73k

I did but no difference!

ADD REPLYlink written 3.0 years ago by jyalda210

Please post the output of samtools view -h [bamfile] | head -30 

That will show you the precise coordinates of the first few alignments contained in the BAM. It will also give us a look at the header of the BAM.

Once we have some example alignment coordinates from the BAM, we can then jump to that exact location in IGV and go from there. Posting that information will be a huge help for us.

ADD REPLYlink written 3.0 years ago by Dan D6.2k

screen shot after running samtools view -h [bamfile] | head -30 : http://i.imgur.com/npAtMmw.jpg

ADD REPLYlink written 3.0 years ago by jyalda210

Great, thank you.

Now, in IGV, in that text field at the top just to the left of the "home" button, type the following:

chr1:14410

and click "Go"

What do you see?

ADD REPLYlink written 3.0 years ago by Dan D6.2k

Result: http://i.imgur.com/L94II4r.jpg

different zoom in:

http://i.imgur.com/r0qFCVx.jpg

http://i.imgur.com/YrjH8hH.jpg

 

ADD REPLYlink written 3.0 years ago by jyalda210

Cool! So problem resolved then?

ADD REPLYlink written 3.0 years ago by Dan D6.2k

Thank you for your help :)

what chr1:14410 stand for? I searched but got confused a bit! :P

I want to see the positive and negative control to check whether the data set is good . when I type and search for housekeeping genes or known dis-regulated genes or genes that is mentioned in the related paper (such as SPP1 or FN1 genes),again (the same problem) there is nothing for my BAM file but it can be find on reference gene. How I can search for different genes?

I only can see the reads when I search for chr1:14410, chr1:14401 or chr1:13507 that were in this output http://i.imgur.com/npAtMmw.jpg. So according this visualization sorting of my BAM file and generating of index was correct?

Is it possible to save this output as .txt file for other chr?

sorry for asking so many questions :)

 

ADD REPLYlink written 3.0 years ago by jyalda210
11
gravatar for Dan D
3.0 years ago by
Dan D6.2k
Tennessee
Dan D6.2k wrote:

I'm going to summarize how we got to the resolution, and hopefully provide some context to help you dig deeper into your follow-up questions.

For the purposes of quick understanding, it's reasonable to consider a BAM file as a list of sequence alignments, where each alignment is composed of a single line which has exhaustive detail about the nature of the alignment to the reference. For more information, read the SAM specification. It's not long and it will provide a lot of information as to what's going on.

A BAM file is simply the compressed, non-human-readable (binary) form of a SAM file. One of the most common ways to interrogate BAM files is a very powerful command-line program called samtools.

You weren't seeing any alignments at first because you weren't zoomed in far enough. The genome is HUGE compared to the length of a typical Illumina read (seven orders of magnitude larger), so unless you zoom in really far the relative length of an alignment is going to be smaller than a single pixel.

BUT! It's definitely possible that there are large regions of the genome which have poor coverage, especially if the dataset is from exome or RNA-Seq, or is WGS with lousy coverage. So just zooming in randomly, you wouldn't necessarily expect to see alignments appear.

What I asked you to do was use samtools to pull out the first few reads from your sorted BAM file. Since the BAM file was sorted, the alignments start at the beginning of chromosome 1. In the screenshot, I saw that you had some reads around base 14,000 of chromosome 1, so I asked you to jump to that area in IGV.

IGV can also search based on gene names, but remember that gene names differ based on the reference used and the entity which is cataloguing the genes. Another approach would be to use a resource like gene metabase like GeneCards to search for your gene, find a consensus start site, and then search based on that coordinate in IGV. For your SPP1 gene, it looks like the start site is at chromosome 4, base 88,896,802, so if you type chr4:88896802 in IGV it will take you to that region of the genome.

Similarly, you can use samtools to query for alignments in a BAM file. If you wanted to find all of the alignments within the consensus start/end region of SPP1, you could type something like this:

samtools view [bamfile] chr4:88896802-88904563

That will give you all of the alignments in human-readable form. You can pipe these to a text file if you want, or use perl/python/sed/awk to pull out specific details.

ADD COMMENTlink modified 3.0 years ago • written 3.0 years ago by Dan D6.2k

Your explanation was really helpful :) Actually I did this, I tried to find the location of genes and search by location on chr. but non of them worked. I'll try again. Thanks

ADD REPLYlink written 3.0 years ago by jyalda210
0
gravatar for jyalda2
3.0 years ago by
jyalda210
Belgium
jyalda210 wrote:

Your explanation was really helpful :) Actually I did this, I tried to find the location of genes and search by location on chr. but non of them worked. I'll try again. Thanks

ADD COMMENTlink written 3.0 years ago by jyalda210
1

You can direct type the name of the gene in IGV. 

ADD REPLYlink written 3.0 years ago by Nicolas Rosewick5.7k

And if you want to have an idea of the expression of different genes across several samples, you should use htseq-count to count the number of reads per gene, and then use a tool to assess differential expression like DESeq2 or edgeR. Don't forget that each sample is sequence at different depth so it's dangerous to compare raw coverage without normalization.

ADD REPLYlink written 3.0 years ago by Nicolas Rosewick5.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 857 users visited in the last hour