Question

Best practice for RNA-Seq aligned sequences data for visualization via IGV

2.3 years ago

TS ▴ 30

Hello,

I have a question regarding the best practice for data format for visualization via IGV:

In my RNA-Seq pipeline I have created .bam and .bai files from .fastq reads via Picard.

Now I'm unsure if I have to convert the .bam and .bai files further for visualization via IGV because I have seen other pipelines converting the .bam and .bai files to .bedgraph and then further to .bigwig.

However, I couldn't find the reason for, or the advantage of, doing this.

Can someone give me the current best practice on visualization of RNA-Seq aligned sequences data via IGV?

Thanks, Thomas

rna bam visualization rna-seq igv • 1.3k views

updated 2.3 years ago by Friederike 8.9k • written 2.3 years ago by TS ▴ 30

No, you don't need to convert them. As long as your BAM files are coordinate-sorted and indexed, they are ready for visualization in IGV. IGV also needs the corresponding genome to be available. If it is a custom genome, it has to be loaded into IGV once at the beginning.

2.3 years ago by GenoMax 142k
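
For reference, coordinate sorting and indexing can be sketched as below. This assumes samtools is installed and on the PATH (the question mentions Picard, so this is a swapped-in tool, not the asker's pipeline); the function names and the `sample.bam` file name are hypothetical.

```python
import subprocess
from pathlib import Path


def sort_and_index_commands(bam_path):
    """Return the samtools commands that coordinate-sort and index a BAM."""
    bam = Path(bam_path)
    # Hypothetical naming scheme: sample.bam -> sample.sorted.bam
    sorted_bam = bam.with_name(bam.stem + ".sorted.bam")
    return [
        # samtools sort orders alignments by coordinate by default
        ["samtools", "sort", "-o", str(sorted_bam), str(bam)],
        # samtools index writes a .bai index next to the sorted file
        ["samtools", "index", str(sorted_bam)],
    ]


def run(commands):
    """Execute each command, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them
    for cmd in sort_and_index_commands("sample.bam"):
        print(" ".join(cmd))
```

Calling `run(sort_and_index_commands("sample.bam"))` would execute both steps; the sorted BAM and its .bai index can then be loaded into IGV directly.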

score 1 · Answer 1 · 2022-01-11

2.3 years ago

liorglic ★ 1.4k

You don't need to convert them, but you could. If you load your indexed BAM files into IGV, you'll see each and every read alignment. If you just want to visualize the coverage along the genome, then converting to bedGraph or bigWig would make sense, as this track will be much lighter than the BAM track. Just do it once, load both, and you'll get the idea.

2.3 years ago by liorglic ★ 1.4k
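
To illustrate why a coverage track is lighter than the alignments themselves, here is a minimal Python sketch that collapses read intervals into bedGraph lines. It assumes each read is one contiguous block with 0-based, half-open coordinates (spliced RNA-Seq reads would need their CIGAR blocks handled separately); the function name and the example reads are made up for illustration.

```python
from collections import Counter


def coverage_bedgraph(chrom, reads):
    """Collapse (start, end) read intervals into bedGraph lines."""
    # +1 where a read starts, -1 where it ends
    delta = Counter()
    for start, end in reads:
        delta[start] += 1
        delta[end] -= 1

    lines = []
    depth = 0
    prev = None
    # Skip positions where starts and ends cancel out, so that
    # neighbouring intervals always differ in depth
    for pos in sorted(p for p, d in delta.items() if d != 0):
        if prev is not None and depth > 0:
            lines.append(f"{chrom}\t{prev}\t{pos}\t{depth}")
        depth += delta[pos]
        prev = pos
    return lines


if __name__ == "__main__":
    reads = [(100, 150), (120, 170), (130, 180), (300, 350)]
    for line in coverage_bedgraph("chr1", reads):
        print(line)
```

However many reads pile up on a region, the output only grows with the number of depth changes, which is why the coverage track stays small while the BAM keeps every alignment.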

Seconding this statement and adding more arguments for and against a conversion to bigWig:

Pros:

- the files are smaller
- you can apply a normalization factor: if you have multiple samples, you may want to normalize the coverage across all of them to account for differences in sequencing depth, so that the height of the coverage "peaks" is more representative of the actual signal

Cons:

- you lose the information about individual splice-site supporting reads
- choosing a sensible bin size (i.e. the length of the window that is used to sum up all the reads covering a given genomic locus) may not be straightforward (and if you go with a bin size of one, the file size advantage shrinks considerably)
- you lose per-base information such as mismatches, quality scores etc.
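
The interplay of bin size and normalization can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it averages per-base depth within each bin (rather than counting reads per bin) and scales by a counts-per-million factor; the function name and the numbers are invented.

```python
def binned_cpm(per_base_depth, bin_size, total_mapped_reads):
    """Average per-base depth in fixed bins and scale to counts per million."""
    # Same factor for every bin of a sample; it differs between samples
    scale = 1_000_000 / total_mapped_reads
    bins = []
    for start in range(0, len(per_base_depth), bin_size):
        window = per_base_depth[start:start + bin_size]
        mean_depth = sum(window) / len(window)
        bins.append((start, start + len(window), mean_depth * scale))
    return bins


if __name__ == "__main__":
    depth = [2, 2, 4, 4, 0, 0, 6, 6]
    # Small bins keep the shape of the signal
    print(binned_cpm(depth, bin_size=2, total_mapped_reads=2_000_000))
    # Large bins flatten it: the gap at positions 4-5 disappears
    print(binned_cpm(depth, bin_size=4, total_mapped_reads=2_000_000))
```

With a bin size of 2 the zero-coverage gap is still visible; with a bin size of 4 both bins end up at the same value, which is the trade-off between file size and resolution described above.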

For more information about going from BAM to bigWig, you can consult the deepTools documentation.

2.3 years ago by Friederike 8.9k
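
As a starting point for the BAM-to-bigWig step, the sketch below assembles a deepTools `bamCoverage` call. It assumes deepTools is installed; the bin size of 10 and CPM normalization are illustrative choices rather than recommendations from this thread, and the helper name and file names are hypothetical.

```python
import subprocess


def bam_coverage_command(bam, bigwig, bin_size=10, normalize="CPM"):
    """Build a deepTools bamCoverage command line as an argument list."""
    return [
        "bamCoverage",
        "--bam", bam,                    # coordinate-sorted, indexed BAM
        "--outFileName", bigwig,
        "--outFileFormat", "bigwig",
        "--binSize", str(bin_size),      # window length in bases
        "--normalizeUsing", normalize,   # e.g. CPM, RPKM, BPM, RPGC or None
    ]


if __name__ == "__main__":
    cmd = bam_coverage_command("sample.sorted.bam", "sample.bw")
    print(" ".join(cmd))
    # To actually run it (requires deepTools):
    # subprocess.run(cmd, check=True)
```

Loading the resulting bigWig next to the original BAM in IGV shows the difference between the two track types at a glance.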