Other than Picard, is there a better way to calculate insert size metrics from a BAM file?
6.1 years ago
pinn ▴ 210

Hi,

I tried Picard and got the insert size distribution. Is there any other tool, or an AWK command, that can extract it directly from a SAM/BAM file?

Thanks!

alignment next-gen assembly genome • 2.5k views

Not sure what kind of metrics you are looking for, but reformat.sh from the BBMap suite has a bunch of options for metrics.

Histogram output parameters:

bhist=<file>            Base composition histogram by position.
qhist=<file>            Quality histogram by position.
qchist=<file>           Count of bases with each quality value.
aqhist=<file>           Histogram of average read quality.
bqhist=<file>           Quality histogram designed for box plots.
lhist=<file>            Read length histogram.
gchist=<file>           Read GC content histogram.
gcbins=100              Number gchist bins.  Set to 'auto' to use read length.
gcplot=f                Add a graphical representation to the gchist.
maxhistlen=6000         Set an upper bound for histogram lengths; higher uses more memory.
                        The default is 6000 for some histograms and 80000 for others.

Histograms for sam files only (requires sam format 1.4 or higher):

ehist=<file>            Errors-per-read histogram.
qahist=<file>           Quality accuracy histogram of error rates versus quality score.
indelhist=<file>        Indel length histogram.
mhist=<file>            Histogram of match, sub, del, and ins rates by read location.
ihist=<file>            Insert size histograms.  Requires paired reads interleaved in sam file.
idhist=<file>           Histogram of read count versus percent identity.
idbins=100              Number idhist bins.  Set to 'auto' to use read length.

I'm not able to generate an insert size histogram.

**##CMD##**
./reformat.sh -Xmx10g in1=/data/SRR_1.fastq in2=/data/SRR_2.fastq  ihist=/data/SRR.sam  out=SRR.hist  
Set insert size histogram output to /data/SRR.sam

Set INTERLEAVED to false
Unspecified format for output SRR.hist; defaulting to fastq.
Input is being processed as paired
Writing interleaved.
Input:                      119852838 reads             17738220024 bases
Output:                     119852838 reads (100.00%)   17738220024 bases (100.00%)

Time:                           136.518 seconds.
Reads Processed:        119m    877.93k reads/sec
Bases Processed:      17738m    129.93m bases/sec

That option requires the input to be in interleaved SAM format (it does not work with raw reads).

If you are interested in getting insert sizes, you can do that in a couple of different ways using BBMap. Those options are noted in this post.


I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in the toolbar when you compose or edit a post.


I was able to retrieve the insert size metrics, but compared with other tools such as Picard it reports a smaller insert size. Picard shows a very good distribution, along with read-pair information.


I don't know about other tools, but since BBMap uses actual alignments (or read overlap), the stats it generates should be accurate.
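
As a tool-independent cross-check, insert sizes can also be tallied straight from the alignments: column 9 (TLEN) of each SAM record holds the observed template length. The sketch below is a minimal illustration, not how Picard or BBMap compute their numbers. The function name `insert_size_histogram`, the choice to count only the first read of each properly paired template, and the sample records are all assumptions made for the example.

```python
from collections import Counter

# SAM FLAG bits used below (values from the SAM specification).
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
FIRST_IN_PAIR = 0x40
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

WANTED = PAIRED | PROPER_PAIR | FIRST_IN_PAIR
UNWANTED = UNMAPPED | MATE_UNMAPPED | SECONDARY | SUPPLEMENTARY


def insert_size_histogram(sam_lines):
    """Count absolute TLEN values, one per properly paired template."""
    hist = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])
        # Assumed filter: keep the first read of proper pairs only,
        # so each template is counted once.
        if (flag & WANTED) != WANTED or (flag & UNWANTED) or tlen == 0:
            continue
        hist[abs(tlen)] += 1
    return hist


# Made-up records for illustration (not real data).
DEMO_RECORDS = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII",
    "r1\t147\tchr1\t300\t60\t4M\t=\t100\t-204\tACGT\tIIII",
    "r2\t83\tchr1\t500\t60\t4M\t=\t200\t-304\tACGT\tIIII",
    "r3\t73\tchr1\t700\t60\t4M\t=\t700\t0\tACGT\tIIII",
]

for size, count in sorted(insert_size_histogram(DEMO_RECORDS).items()):
    print(f"{size}\t{count}")
```

To run this on real data, the same function could be given text records from a file or from `sys.stdin` while piping in the output of `samtools view`. Different filtering choices (duplicates, improper pairs, which mate is counted) shift the resulting distribution, which may be one reason tools disagree.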

6.1 years ago
Tm ★ 1.1k

You can check the insert size distribution using Qualimap, which is very easy to use and takes a sorted BAM/SAM file as input. It gives you complete mapping statistics along with the insert size and its standard deviation. Alternatively, you can use bamstats, which also gives detailed mapping stats.
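
When comparing tools, it can help to reduce each insert size histogram to the same few numbers. The sketch below computes the count, mean, median and standard deviation from a size-to-count mapping. It is an illustration only and does not claim to reproduce Qualimap's or Picard's exact calculations; the function name `summarize_histogram` and the example values are made up.

```python
import math


def summarize_histogram(hist):
    """Return (n, mean, median, std) for a {insert_size: count} mapping.

    Assumptions for this sketch: std is the population standard deviation,
    and for an even number of observations the lower median is reported.
    """
    n = sum(hist.values())
    if n == 0:
        raise ValueError("histogram contains no observations")

    mean = sum(size * count for size, count in hist.items()) / n
    variance = sum(count * (size - mean) ** 2 for size, count in hist.items()) / n

    # Walk the sizes in ascending order until half of the observations are covered.
    median = None
    seen = 0
    for size in sorted(hist):
        seen += hist[size]
        if 2 * seen >= n:
            median = size
            break

    return n, mean, median, math.sqrt(variance)


# Made-up example histogram: insert size -> number of read pairs.
example = {100: 1, 200: 2, 300: 1}
n, mean, median, std = summarize_histogram(example)
print(f"pairs={n} mean={mean:.1f} median={median} std={std:.1f}")
```

Applying the same summary to the histogram from each tool makes it easier to see whether they differ in the centre of the distribution or only in the tails.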


Using bamstats I get mapping quality, read length, edit distances, coverage depth and start positions, but I can't find any insert size metrics.


I suggest you use Qualimap; it gives a graphical representation of the calculated insert sizes.

6.1 years ago
trausch ★ 1.9k

Another option is Alfred. We provide example quality control files for different sequencing assays (DNA-Seq whole-exome, ATAC-Seq, ...) and different sequencing technologies (Illumina, ONT, PacBio) at the companion web application so you can have a look if that's what you are looking for (QC files include the insert size).


I tried bamstats and Alfred.


Thanks, did Alfred work?


Yes, but I only get a good distribution with Picard.
