How to get coverage for an alignment file
2
0
Entering edit mode
8.4 years ago
biotech ▴ 570

hi good morning biostars I got in trouble again accepting a work that I have never performed. seems It's not that difficult so I will propose here my approximation and then you can tell me if what I'm doing is right or we can have better approximation right. I need to get coverage of whole genome from an alignment file. I really don't know how to do that possibly using bedtools. please let me know if there's an alternative to have coverage for the whole genome. thank you very much have a nice day

Bam • 4.1k views
ADD COMMENT
3
Entering edit mode
8.4 years ago
agata88 ▴ 870

You can use bedtools coverage -abam <bam file> -b <bed file/gff file> -d > coverage.txt

Then you will have a coverage for every nucleotide. Because the file will be big for whole genome I suggest to write a python script for calculating of average coverage for whole genome, or coverage per gene etc. I also suggest using bed file from UCSC for whole genes.

Hope it helps,

Best,

Agata

ADD COMMENT
0
Entering edit mode

Hi Agata good morning you are so helpful. In fact I just need to get coverage for each nucleotide I don't need the coverage for each gene for now. From the command you gave me seems I need a GFF file. Do you think we can skip that annotation file?

ADD REPLY
0
Entering edit mode

It is not an annotation file but defined regions. I think that a coverage file for full genome would be something like 2 Gb of data?

ADD REPLY
0
Entering edit mode

Hi thanks for your quick reply. I'm working with small bacteria is about 5 MB so things are going to be tiny. If I remember well the gff file has been produced using bedtools right. Anyway I think I should read in more detail the manual of bedtools.

ADD REPLY
0
Entering edit mode

I think you can write your own bed file including coordinates start and stop of your reference. What bacteria do you have?

ADD REPLY
0
Entering edit mode

We assembled the genome in house so it's not published. Do you know what I need to generate this GFF files for bedtools? How do people generate this file?

ADD REPLY
0
Entering edit mode

I think you can calculate how much nucleotides you have and then just write a bed file included:

1(start) /t XXX (end)

I am not sure it is going to work I have never done something like that but it is worth to try :) I would try that first.

ADD REPLY
0
Entering edit mode

Where are you taking this gff files from? You are so lucky you have them, maybe because you work with a model organism.

ADD REPLY
1
Entering edit mode

You need to write bed/gff file on your own :)

Here is an example how should bed file looks like:

How Can I Make A Bed File?

ADD REPLY
0
Entering edit mode

Have a look a this one, comming from QUAST GenMark tool. May be enough? Thanks https://drive.google.com/file/d/0B8-ZAuZe8jldTDd2RTd2REZxclU/view?usp=sharing

ADD REPLY
1
Entering edit mode

If that file does not work cut columns 1,4,5 and that would convert it to a bed format.

This file will not give you coverage for every nucleotide for the entire genome (just for the intervals defined in your file).

If you need the entire genome I wonder if you can get away with saying

Scaffold_1 1 End_nucleotide_#

which will give you coverage for every nucleotide in the scaffold.

ADD REPLY
0
Entering edit mode

Hi genomax2. Thanks for having a look at my file. I have 31 contigs in my genome. So doing that will do the trick for all the contigs right thanks so much

ADD REPLY
0
Entering edit mode

Hi Agata, do you think we are talking about gff3 or basic gff. If I remember well there are some differences.

ADD REPLY
2
Entering edit mode
8.4 years ago
rlbc ▴ 20

There is also qualimap, which gives you whole genome and per chromosome/contig coverage.

ADD COMMENT

Login before adding your answer.

Traffic: 1457 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6