Question: Finding out which genes are covered the most
0
gravatar for shubhra.bhattacharya
8 weeks ago by
shubhra.bhattacharya120 wrote:

Hey! I have performed an alignment of my reads (paired end) using Bowtie2. I would like to know which region of the genome is most highly covered.

As per my knowledge this can be done using mpileup or Bamcov or visualize the bam file in IGV. But, I want to specifically see which genes are the most highly covered and not just regions of high coverage.

Is there a way bam files can be mapped on gff files or is there any other approach to this.

Looking forward to replies!!

Thanks

coverage genes • 155 views
ADD COMMENTlink modified 8 weeks ago by Pierre Lindenbaum116k • written 8 weeks ago by shubhra.bhattacharya120

You can bin the genome (*ome) and then check of the coverage or use a gtf file to get the coverage. try bedtools coverage. In addition, one can generate histogram of coverage.

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by cpad011210k
5
gravatar for Pierre Lindenbaum
8 weeks ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum116k wrote:

I wrote BamStats05 http://lindenb.github.io/jvarkit/BamStats05.html

$ head genes.bed
1   179655424   179655582   ZORG
1   179656788   179656934   ZORG

$ java -jar  dist/bamstats05.jar -B genes.bed --mincoverage 10  in.bam > out.txt

$ head out.txt
#chrom  start   end gene    sample  length  mincov  maxcov  avg nocoverage.bp   percentcovered
1   179655424   179656934   ZORG    SAMPLE1 304 27  405 216.80921052631578  0   100
ADD COMMENTlink written 8 weeks ago by Pierre Lindenbaum116k

I get the error as below:

input: bed file CP019610.1 44 914 B0W97_00010 CP019610.1 991 1309 B0W97_00015 CP019610.1 1365 1695 B0W97_00020 CP019610.1 1691 2060 B0W97_00025 CP019610.1 2165 2465 B0W97_00030 CP019610.1 2571 3099 B0W97_00035

[SEVERE][BamStats05]bad start/end in BED line : "CP019610.1 (tab) 44 (tab) 914 (tab) B0W97_00010" java.lang.IllegalArgumentException: bad start/end in BED line : "CP019610.1 (tab) 44 (tab) 914 (tab) B0W97_00010" at com.github.lindenb.jvarkit.util.bio.bed.BedLine.<init>(BedLine.java:58) at com.github.lindenb.jvarkit.util.bio.bed.BedLineCodec.decode(BedLineCodec.java:61) at com.github.lindenb.jvarkit.tools.bamstats04.BamStats05.readBedFile(BamStats05.java:134) at com.github.lindenb.jvarkit.tools.bamstats04.BamStats05.doWork(BamStats05.java:329) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:1145) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:1303) at com.github.lindenb.jvarkit.tools.bamstats04.BamStats05.main(BamStats05.java:406) Caused by: java.lang.NumberFormatException: For input string: " 44 " at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Integer.parseInt(Integer.java:569) at java.lang.Integer.parseInt(Integer.java:615) at com.github.lindenb.jvarkit.util.bio.bed.BedLine.<init>(BedLine.java:54) ... 6 more [INFO][Launcher]bamstats05 Exited with failure (-1)

ADD REPLYlink written 8 weeks ago by shubhra.bhattacharya120

CP019610.1 (tab) 44 (tab) 914 (tab) B0W97_00010

va.lang.NumberFormatException: For input string: " 44 " at

I see some whitespaces "(space)(tab)(space)" between each column. There should be only one "(tab)".

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by Pierre Lindenbaum116k

BTW , while I was here I've just added a new option '--merge' for my tool : https://github.com/lindenb/jvarkit/commit/1be40fd3e7aac9dadcfbdb4fb079027ad0b3581a ( with an error, fixed later...)

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by Pierre Lindenbaum116k

works now! Thanks and I will get back to you in case of doubts.

ADD REPLYlink written 8 weeks ago by shubhra.bhattacharya120

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.

Upvote|Bookmark|Accept

ADD REPLYlink written 8 weeks ago by Pierre Lindenbaum116k

Hi! Can you help me with understanding no_coverage column means? It would be really great if you tell what the column header mean for each field. I know it's intuitive but it would be helpful.

ADD REPLYlink written 7 weeks ago by shubhra.bhattacharya120

no_coverage

number of bases in the interval that are below the limit (default <=0)

ADD REPLYlink written 7 weeks ago by Pierre Lindenbaum116k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1158 users visited in the last hour