Question: How can we extract the alignment coordinates and gap coordinates in BLAT PSL?
1
gravatar for Rohit
5.4 years ago by
Rohit1.4k
California
Rohit1.4k wrote:

Hello,

I am bringing up the thread regarding the coordinates from BLAT psl-format, previous threads covered regarding in PSL format but I want to include the negative strand the the gap-blocks information too.

 

blat sequence alignment • 1.9k views
ADD COMMENTlink modified 3 months ago by anatoly.developer0 • written 5.4 years ago by Rohit1.4k
0
gravatar for kepbod
5.4 years ago by
kepbod90
China
kepbod90 wrote:

I think if you want to extract the alignment sequences, you could directly add "-out=pslx" when you run blat.

ADD COMMENTlink written 5.4 years ago by kepbod90

Sorry I got back late, there is an extended option to include the sequences or alignments completely from pslx gives the output with the sequences, it does not give you blocks of alignments or missing gaps :(

ADD REPLYlink written 5.4 years ago by Rohit1.4k
0
gravatar for anatoly.developer
3 months ago by
anatoly.developer0 wrote:

I know that's an old question, but anyway: the default .psl output format of BLAT includes columns block count, blockSizes, qStarts and tStarts which contain gaps coordinates.
The following R snippet will help to extract all exact matches from BLAT output into a GRangesList object respecting the gaps:

  library(data.table)
  library(GenomicRanges)
  library(rtracklayer)
  library(stringr)

  # skip first 5 lines of header
  blat_out <- fread("blat_output_file.psl", skip = 5)
  setnames(blat_out, c("V1", "V9", "V10", "V11", "V14", "V16", "V17", "V18", "V19", "V20", "V21"), c("match", "strand", "name", "len", "chr", "start", "end", "blockCount", "blockSizes", "blockStarts", "tStarts"))

  # keep only exact matches
  blat_out <- blat_out[match==len]

  # remove any match on the "alternative" chromosomes
  blat_out <- blat_out[!str_detect(chr, "alt")]
  blat_out <- blat_out[,.(name, strand, chr, start, end, len, blockCount, blockSizes, blockStarts, tStarts)]

  # to make "name" column unique
  blat_out[,name:=paste0(name, "_", 1:nrow(.SD)),by=name]

  blat_out_blocks_granges <- GRangesList(lapply(blat_out$name, function(name_i){
    row <- blat_out[name==name_i]
    block_sizes <- as.integer(unlist(str_split(row$blockSizes, ",")))
    block_sizes <- block_sizes[!is.na(block_sizes)]

    t_starts <- as.integer(unlist(str_split(row$tStarts, ",")))
    t_starts <- t_starts[!is.na(t_starts)]

    result <- data.table(start=t_starts, end=t_starts+block_sizes)
    makeGRangesFromDataFrame(cbind(row[,.(name, strand, chr)], result), keep.extra.columns = T)
  }))

  names(blat_out_blocks_granges) <- blat_out$name

  # and export to a .bed file
  export(blat_out_blocks_granges, con = "output.bed", format = "bed")
ADD COMMENTlink modified 3 months ago • written 3 months ago by anatoly.developer0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1488 users visited in the last hour