Question: Extracting coverage per base pair
0
gravatar for Burgenix
2.2 years ago by
Burgenix0
Burgenix0 wrote:

Hi,

I want to find the coverage per bp when using any seed/bait script available (MITObim, NOVOplasty). I find it wierd that even if they are aligning and baiting to a reference, this information is not readily provided.

Any ideas on how to do this?

Thanks !

coverage alignment • 1.3k views
ADD COMMENTlink modified 2.2 years ago by Samarth Kulshrestha200 • written 2.2 years ago by Burgenix0
2
gravatar for Kevin Blighe
2.2 years ago by
Kevin Blighe52k
Kevin Blighe52k wrote:

Assuming you have an aligned BAM file from either of these programs, BEDTools coverage with the -d command line parameter will output per-base coverage.

In addition, I would have expected that these programs gave detail on coverage during the de novo assembly stage of analysis.

ADD COMMENTlink modified 2.2 years ago • written 2.2 years ago by Kevin Blighe52k

Novoplasty doesnt provide this data, and MITObim just outputs the minimum coverage in the logfile at the end of the assembly/alignment, but nothing more. I guess I'll have to find a way to use BEDtools (have no bam files, just fasta -> fasta, sadly

ADD REPLYlink written 2.2 years ago by Burgenix0

So, you just have the assembled genome in FASTA?

I can only assume that the average depth of coverage achieved was output in one of the log files (or maybe even just output to your screen) - all other programs that I've used for assembly output various metrics for coverage.

If you get desperate, you could just re-align your data to your new genome and then use BEDTools to get coverage that way. This is also common practice to see how well-represented your new genome is in relation to the samples that you used to create it.

ADD REPLYlink written 2.2 years ago by Kevin Blighe52k

Thank you so much Kevin, appreciate it ! I wont rip my hair out and just use good old BEDTools to do it. The logfiles arent that great, as they provide only the minimum coverage. MITObim has .wig files, I can maybe use those.

ADD REPLYlink written 2.2 years ago by Burgenix0

Yes, BEDTools is very powerful but, wait, if you have wig files, then you may want to take a look at bwtools. They seem to provide information on coverage. However, you first need to convert the wig files to bigWig (look here: http://genome.ucsc.edu/goldenPath/help/bigWig.html).

ADD REPLYlink written 2.2 years ago by Kevin Blighe52k
2
gravatar for Samarth Kulshrestha
2.2 years ago by
India
Samarth Kulshrestha200 wrote:

If you have BAM files then you can try:

  • samtools depth yourfile.bam

You can also provide your region of interest for per base coverage in BED format. This shall give you per base coverage.

ADD COMMENTlink modified 2.2 years ago • written 2.2 years ago by Samarth Kulshrestha200
2

Or even faster, SamBamba depth. It is multithreaded and offers some nice custom filtering and region/binning options.

ADD REPLYlink written 2.2 years ago by ATpoint26k

Thanks ATPoint - never even knew that existed.

ADD REPLYlink written 2.2 years ago by Kevin Blighe52k
3

You're welcome. SamBamba is really a nice alternative for SAMtools when it comes to standards tasks such as sorting, indexing, flagstatting and filtering. The tools do mostly the same as their SAMtools counterparts but are more efficient in using multicore setups. Also, it indexes every created file on the fly, reducing processing time for larger files a lot.

EDIT: Be sure to specify the number of threads when using SamBamba. When not specified explicitely (-t) it will run on all available threads instead of only one, as samtools does when -@ is not set.

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by ATpoint26k

I have not tried SamBamba (I plan to in the near future!) but I would like to note that as of 1.4 samtools is at least partially multithreaded and substantially faster than prior versions.

Edit: Results of testing -

bushnell@gpint209:/dev/shm$ time samtools view -@ 20 -b mapped.sam > st.bam

real    0m4.779s
user    1m7.448s
sys     0m0.805s
bushnell@gpint209:/dev/shm$ time /global/projectb/sandbox/gaag/bbtools/sambamba/sambamba view -S -f bam mapped.sam > sbb.bam

real    0m6.493s
user    2m45.629s
sys     0m1.160s
bushnell@gpint209:/dev/shm$ time samtools view -@ 20 st.bam > st.sam

real    0m4.521s
user    0m8.411s
sys     0m1.152s
bushnell@gpint209:/dev/shm$ time /global/projectb/sandbox/gaag/bbtools/sambamba/sambamba view -f sam st.bam > sbb.sam

real    0m1.842s
user    0m16.169s
sys     0m1.448s

Very impressive bam -> sam conversion speed!

-rw-rw-r-- 1 bushnell bushnell 1248860045 Oct 10 11:29 mapped.sam
-rw-rw-r-- 1 bushnell bushnell  360334358 Oct 10 11:33 sbb.bam
-rw-rw-r-- 1 bushnell bushnell 1248859507 Oct 10 11:34 sbb.sam
-rw-rw-r-- 1 bushnell bushnell  376423550 Oct 10 11:32 st.bam
-rw-rw-r-- 1 bushnell bushnell 1248859507 Oct 10 11:34 st.sam

Incidentally, SamBamba produced a smaller bam file so probably it was running at higher compression which is why it took longer for sam -> bam. Samtools does not let you adjust the compression level so there's nothing I can do about that. But SamBamba does and it looks like Samtools uses roughly compression level 4. Setting SamBamba at level 4 reduced the user and CPU time to roughly the same as samtools (both slightly higher, but the resulting file was slightly smaller).

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by Brian Bushnell17k

And if I do not have .bam files? Can I calculate it using python or whatever?

ADD REPLYlink written 2.2 years ago by Burgenix0

Which file format do you have, BED?

ADD REPLYlink written 2.2 years ago by ATpoint26k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 708 users visited in the last hour