Tool: VisualMSI: detect and visualize microsatellite instability (MSI) from NGS data
13 months ago by
chen1.9k
OpenGene

This is a new OpenGene project: https://github.com/OpenGene/VisualMSI

VisualMSI

VisualMSI is a tool to detect and visualize microsatellite status from NGS data by simulating PCR behavior. VisualMSI extracts the PCR adapters from the reference genome and tries to map them to the sequencing reads. If the adapters are successfully mapped to a read or read pair, the length of the insert enclosed by the adapters is calculated. VisualMSI then computes statistics on the inserted length distribution. This approach is very similar to PCR-based MSI detection, which is usually considered the gold standard for clinical use.

For each MSI target locus, VisualMSI computes the information entropy of its inserted length distribution. The information entropy value is an indicator of MSI status: the higher the entropy, the higher the probability that the locus is unstable.
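To make the entropy idea concrete, here is a minimal sketch that computes the Shannon entropy of a length histogram with awk. It is not VisualMSI's own code: the `length_entropy` function name, the "length count" input format and the base-2 logarithm are all assumptions made for illustration.

```shell
# Shannon entropy (in bits) of a length histogram read from stdin.
# Input: one "length count" pair per line.
# Assumption: base-2 logarithm; VisualMSI itself may use a different base.
length_entropy() {
    awk '
        $2 > 0 { count[$1] += $2; total += $2 }
        END {
            if (total == 0) { print "0.0000"; exit }
            h = 0
            for (len in count) {
                p = count[len] / total
                h -= p * log(p) / log(2)
            }
            printf "%.4f\n", h
        }
    '
}

# A stable-looking locus: almost all reads share one inserted length.
printf '100 98\n99 1\n101 1\n' | length_entropy

# An unstable-looking locus: reads are spread evenly over five lengths.
printf '96 20\n97 20\n98 20\n99 20\n100 20\n' | length_entropy
```

The second histogram gives a much larger value than the first, which matches the rule of thumb above: a wider spread of inserted lengths means higher entropy.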

VisualMSI can run in tumor-normal paired mode or case-only mode; the tumor-normal mode is recommended if the paired normal sample is available. If the paired normal sample is given, VisualMSI evaluates the earth mover's distance (EMD) between the inserted length distributions of the tumor data and the normal data. Since the normal data is usually considered MSI-stable, the EMD value indicates how unstable the tumor data is compared with the normal data. The higher the EMD value, the higher the probability that this MSI locus is unstable.
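For one-dimensional histograms, the EMD can be computed as the sum of absolute differences between the two cumulative distributions. The sketch below does this with awk. It is an illustration only: the `emd_1d` function, the "length count" file format, the normalization to unit mass and the 1 bp ground distance are assumptions, and VisualMSI's exact definition may differ.

```shell
# emd_1d TUMOR_HIST NORMAL_HIST
# Each file holds "length count" lines. Both histograms are normalized to
# sum to 1, and moving mass by 1 bp costs 1 (assumptions, see above).
emd_1d() {
    awk '
        NF < 2 { next }
        {
            len = $1 + 0
            if (FILENAME == ARGV[1]) { t[len] += $2; tsum += $2 }
            else                     { n[len] += $2; nsum += $2 }
            if (!seen || len < min) min = len
            if (!seen || len > max) max = len
            seen = 1
        }
        END {
            if (tsum == 0 || nsum == 0) { print "NA"; exit }
            carry = 0; emd = 0
            # carry is the running difference of the two cumulative distributions
            for (x = min; x < max; x++) {
                carry += t[x] / tsum - n[x] / nsum
                emd += (carry < 0 ? -carry : carry)
            }
            printf "%.4f\n", emd
        }
    ' "$1" "$2"
}

# Toy data: the normal sample has one length, the tumor has half its reads 2 bp shorter.
tmpdir=$(mktemp -d)
printf '100 100\n'       > "$tmpdir/normal.txt"
printf '98 50\n100 50\n' > "$tmpdir/tumor.txt"
emd_1d "$tmpdir/tumor.txt" "$tmpdir/normal.txt"
rm -r "$tmpdir"
```

In the toy data, half of the tumor mass has to move 2 bp to match the normal histogram, so the distance is 0.5 x 2 = 1. Two identical histograms give 0.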

Take a quick glance at the informative report

A quick example

  • Tumor-normal paired mode:

    visualmsi -i tumor.sorted.bam -n normal.sorted.bam -r hg19.fasta -t targets/msi.bed

  • Case-only mode (no paired normal data given):

    visualmsi -i tumor.sorted.bam -r hg19.fasta -t targets/msi.bed

Get visualmsi program

download binary

This binary is only for Linux systems: http://opengene.org/VisualMSI/visualmsi

# this binary was compiled on CentOS, and tested on CentOS/Ubuntu
wget http://opengene.org/VisualMSI/visualmsi
chmod a+x ./visualmsi

or compile from source

# step 1: download and compile htslib from: https://github.com/samtools/htslib
# step 2: get VisualMSI source (you can also use a browser to download from master or releases)
git clone https://github.com/OpenGene/VisualMSI.git

# build
cd VisualMSI
make

# Install
sudo make install

Usage

You should provide the following arguments to run visualmsi:

  • the reference genome FASTA file, specified by -r or --ref=
  • the target setting file, specified by -t or --target=
  • the input BAM file, specified by -i or --in=. If the normal data is available, specify it by -n or --normal=
  • the plain text result is printed directly to STDOUT; you can redirect it to a file using >
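The arguments above can be put together in a small wrapper. This is only a sketch: `run_msi` is a hypothetical helper, not part of VisualMSI. It uses the -i, -n, -r and -t flags described in this post, checks that the inputs are readable and that visualmsi is on the PATH, and redirects the text result to a file.

```shell
# run_msi TUMOR_BAM REF_FASTA TARGET_BED OUT_TXT [NORMAL_BAM]
# Hypothetical wrapper; the file names in the usage lines are placeholders.
run_msi() {
    tumor=$1 ref=$2 targets=$3 out=$4 normal=${5:-}

    # Fail early if any input file is missing or unreadable.
    for f in "$tumor" "$ref" "$targets" ${normal:+"$normal"}; do
        [ -r "$f" ] || { echo "cannot read: $f" >&2; return 1; }
    done

    # A binary that was only downloaded into the current directory is not on
    # the PATH; call it as ./visualmsi or run "sudo make install" first.
    command -v visualmsi >/dev/null 2>&1 ||
        { echo "visualmsi not found on PATH" >&2; return 127; }

    if [ -n "$normal" ]; then
        visualmsi -i "$tumor" -n "$normal" -r "$ref" -t "$targets" > "$out"
    else
        visualmsi -i "$tumor" -r "$ref" -t "$targets" > "$out"
    fi
}

# Usage (tumor-normal paired mode, then case-only mode):
#   run_msi tumor.sorted.bam hg19.fasta targets/msi.bed msi.txt normal.sorted.bam
#   run_msi tumor.sorted.bam hg19.fasta targets/msi.bed msi.txt
```

Only the plain text result goes to the redirected file; the HTML and JSON reports are written to msi.html and msi.json unless -h and -j are given.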

Reference genome

The reference genome should be a single FASTA file containing all chromosomes. This file must not be compressed. For human data, the hg19/GRCh37 or hg38/GRCh38 assembly is typically used; both can be downloaded from the following sites:

  1. hg19/GRCh37: ftp://ftp.ncbi.nlm.nih.gov/sra/reports/Assembly/GRCh37-HG19_Broad_variant/Homo_sapiens_assembly19.fasta
  2. hg38/GRCh38: http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz

Remember to decompress hg38.fa.gz, since gzipped reference files are not currently supported.
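A small sketch of the decompression step follows. `prepare_ref` is a hypothetical helper name: it writes an uncompressed copy next to the gzipped file, keeps the original, and prints the path to pass to -r.

```shell
# prepare_ref GZIPPED_FASTA
# Decompress once, keep the .gz, and print the path of the plain FASTA.
prepare_ref() {
    gz=$1
    fa=${gz%.gz}
    # Skip the work if a non-empty uncompressed copy already exists.
    [ -s "$fa" ] || gunzip -c "$gz" > "$fa" || return 1
    echo "$fa"
}

# Usage:
#   wget http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
#   ref=$(prepare_ref hg38.fa.gz)        # -> hg38.fa
#   visualmsi -i tumor.sorted.bam -r "$ref" -t my_hg38_targets.bed
```

The target file name in the last usage line is a placeholder; the bundled targets/msi.bed uses hg19 positions and should not be combined with an hg38 reference.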

Target file

The target file is a BED file listing the MSI loci. VisualMSI computes the target MSI locus as the center between the start and end positions. To add an MSI target locus at chr:pos, add a row with the values (chr, pos-100, pos+100, name). See the example in targets/msi.bed:

chr4    55588112    55608311    BAT25
chr2    47631460    47651659    BAT26
chr14   23642268    23662467    NR-21
chr11   102183409   102203608   NR-27
chr2    95839262    95859461    NR-24

Please note that this BED file is based on hg19 coordinates.
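The (chr, pos-100, pos+100, name) rule can be sketched as a tiny helper. `bed_row` and its optional half-width argument are hypothetical names introduced here, and the position in the example is simply the rounded midpoint of the BAT25 row above, used for illustration.

```shell
# bed_row CHR POS NAME [HALF_WIDTH]
# Print one tab-separated target row centered on POS (default half width: 100).
bed_row() {
    awk -v chr="$1" -v pos="$2" -v name="$3" -v half="${4:-100}" 'BEGIN {
        start = pos - half
        if (start < 0) start = 0   # do not run past the chromosome start
        printf "%s\t%d\t%d\t%s\n", chr, start, pos + half, name
    }'
}

# Example: a row centered on chr4:55598212.
bed_row chr4 55598212 BAT25
```

Appending the output to a file (for example with >> my_targets.bed) builds a custom target file one locus at a time. The coordinates must match the assembly of the reference and the BAM file.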

Reports

VisualMSI reports results in HTML, JSON and TEXT formats. You can take a look at the following examples:

  1. Sample HTML report: http://opengene.org/VisualMSI/msi.html
  2. Sample JSON report: http://opengene.org/VisualMSI/msi.json
  3. Sample TEXT report: http://opengene.org/VisualMSI/msi.txt

Tumor-normal paired mode


For each MSI locus, the entropy values of tumor and normal data are shown, as well as the earth mover's distance (EMD) value.

Case-only mode


For each MSI locus, only the entropy value of tumor data is shown.

All options

options:
  -i, --in                     input sorted bam/sam file for the case (tumor) sample. STDIN will be read from if it's not specified (string [=-])
  -n, --normal                 input sorted bam/sam file for the paired normal sample (tumor-normal mode). If not specified, VisualMSI will run in case-only mode. (string [=])
  -t, --target                 the bed file (chr, start, end, name) to give the MSI targets (string)
  -r, --ref                    reference fasta file name (should be an uncompressed .fa/.fasta file) (string)

  # options for setting thresholds
  -a, --adapter_len            set the length of the adapter for PCR simulation (5~30). Default 12 means the left and right adapter both have 12 bp. (int [=12])
  -l, --target_inserted_len    set the distance on reference of the two adapters for PCR simulation (20~200). Default 100 means: <left adapter><100 bp inserted><right adapter> (int [=100])
  -d, --depth_req              set the minimum depth requirement for each MSI locus (1~1000). Default 10 means 10 supporting reads/pairs are required. (int [=10])

  # options for specifying the file names of the reports
  -j, --json                   the json format report file name (string [=msi.json])
  -h, --html                   the html format report file name (string [=msi.html])

  # other options
      --debug                  output some debug information to STDERR.
  -?, --help                   print this message

Hi Chen, I tried downloading/saving VisualMSI using the steps given above, but when I test the tool in command line, it says visualmsi: command not found

Could you please help me on this?

TIA

written 7 months ago by sruthi20

Hey. It is mentioned that the target file is based on hg19 coordinates. How can I generate one for hg38 coordinates? TIA

written 7 months ago by sruthi20

You can make your own file if you want to process hg38 bam.

written 7 months ago by chen1.9k