Question

Percent identity plot

5.7 years ago

tarek.mohamed ▴ 360

Dear All

What tools are available for plotting the percent identity of sequence reads over a given genetic locus across multiple samples?

I have SAM/BAM files for several samples, and I want to compare and plot percent identity for one gene across all of them.

precent identitiy alignment plot • 2.0k views

updated 5.7 years ago by h.mon 35k • written 5.7 years ago by tarek.mohamed ▴ 360

Do you have a citation for what you are aiming to do? Can you elaborate on what you mean by 'percent identity'?

Using BLAST, you should be able to determine the percentage of a particular gene that is covered by your input reads. For example, blastn can take RNA-seq reads in FASTA format and align them to mRNA transcripts for the purpose of identifying the genes covered by your reads (blastx would instead align the translated reads to protein sequences).

5.7 years ago by Kevin Blighe 87k

tarek.mohamed: You say that you have SAM/BAM files for the samples, so one way to do this would be to generate a consensus sequence across the gene boundaries for each sample and then make dot plots with those sequences. That should give you an idea of percent identity.

This recent package (FlexiDot: highly customizable, ambiguity-aware dotplots) may be of interest.
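
As a rough numeric complement to a dot plot, the sketch below compares two consensus sequences window by window. It assumes the two sequences are already the same length and in the same coordinates (for example, both built against the same reference with no indels applied), and it skips positions where either sequence has an N. The function name `windowed_identity` and its parameters are hypothetical, chosen only for illustration.

```python
def windowed_identity(seq_a, seq_b, window=100, step=50):
    """Return a list of (window_start, percent_identity) tuples.

    Assumes seq_a and seq_b are equal-length strings in the same coordinates.
    Positions where either sequence is 'N' are not counted.
    percent_identity is None for a window with no comparable positions.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    results = []
    last_start = max(len(seq_a) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk_a = seq_a[start:start + window].upper()
        chunk_b = seq_b[start:start + window].upper()
        compared = 0
        matches = 0
        for a, b in zip(chunk_a, chunk_b):
            if a == "N" or b == "N":
                continue  # ambiguous position, skip it
            compared += 1
            if a == b:
                matches += 1
        identity = 100.0 * matches / compared if compared else None
        results.append((start, identity))
    return results
```

The resulting (position, identity) pairs for each pair of samples could then be drawn as a line plot along the gene.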

5.7 years ago by GenoMax 141k

score 0 · Answer 1 · 2018-08-06

reformat.sh (from the BBTools / BBMap package) reports a number of metrics that may be of interest to you:

Histograms for sam files only (requires sam format 1.4 or higher):

ehist=<file>            Errors-per-read histogram.
qahist=<file>           Quality accuracy histogram of error rates versus quality score.
indelhist=<file>        Indel length histogram.
mhist=<file>            Histogram of match, sub, del, and ins rates by read location.
ihist=<file>            Insert size histograms.  Requires paired reads interleaved in sam file.
idhist=<file>           Histogram of read count versus percent identity.
idbins=100              Number idhist bins.  Set to 'auto' to use read length.
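
If you would rather compute per-read identity yourself and restrict it to one gene, here is a minimal sketch in plain Python that works on SAM text lines. It assumes the aligner wrote the optional NM (edit distance) tag, and it defines identity as (alignment columns - NM) / alignment columns, where alignment columns are the M, =, X, I and D operations in the CIGAR string. That definition is an assumption and is not necessarily the one reformat.sh uses for idhist. The function names `read_identity` and `region_identities` are hypothetical.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def read_identity(sam_line):
    """Return (rname, ref_start, ref_end, percent_identity) for one SAM record.

    Returns None for unmapped reads, reads without a CIGAR string,
    or reads lacking an NM tag. Coordinates are 1-based, inclusive.
    """
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return None
    flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
    if flag & 0x4 or cigar == "*":
        return None
    nm = None
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            nm = int(tag[5:])
    if nm is None:
        return None
    lengths = Counter()
    for length, op in CIGAR_RE.findall(cigar):
        lengths[op] += int(length)
    # Alignment columns: matches/mismatches plus inserted and deleted bases.
    columns = sum(lengths[op] for op in "M=XID")
    if columns == 0:
        return None
    # Reference bases consumed by the alignment.
    ref_len = sum(lengths[op] for op in "M=XDN")
    identity = 100.0 * (columns - nm) / columns
    return rname, pos, pos + ref_len - 1, identity


def region_identities(sam_lines, chrom, start, end):
    """Collect percent identity for reads overlapping chrom:start-end."""
    values = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        record = read_identity(line)
        if record is None:
            continue
        rname, r_start, r_end, identity = record
        if rname == chrom and r_start <= end and r_end >= start:
            values.append(identity)
    return values
```

Running `region_identities` on each sample's SAM lines with the gene's coordinates gives one list of identities per sample, which can be compared as histograms or box plots in any plotting library.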