Nucleotide coverage if >5 reads
1
0
Entering edit mode
6.5 years ago
Vca80553 • 0

Hi Everyone,

I have generated BAM files after mapping my Illumina reads to a reference genome. Now I want to know how much of the reference genome is covered (aligned/mapped) by reads (e.g. 98% of the reference genome is covered by reads). For that I used Bbmap/pileup.sh. It works fast and it is accurate.

However, the 98% I get is the % of nucleotides covered by at least 1 read. It it possible to modify this, and get % of nucleotides that were covered at least with 5 reads?

Thanx,

Sara

alignment sequence pileup • 2.9k views
ADD COMMENT
1
Entering edit mode

Take a look at the pileup.sh command options I have compiled in this thread (search for name pileup) to see if one of those options helps.

ADD REPLY
1
Entering edit mode

Hello Vca80553!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=78704

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLY
0
Entering edit mode

Yes, sorry. I didn't know this. My fault. Will not happen again. Thanx.

ADD REPLY
0
Entering edit mode

For pileup.sh

Output format (tab-delimited):
ID, Avg_fold, Length, Ref_GC, Covered_percent, Covered_bases, Plus_reads, Minus_reads, Read_GC, Median_fold, Std_Dev

Plus_reads:        Number of reads mapped to plus strand
Minus_reads:       Number of reads mapped to minus strand

Perhaps you can add those two numbers and filter on that to get the info you need.

ADD REPLY
0
Entering edit mode

Tagging: Brian Bushnell

ADD REPLY
0
Entering edit mode

Hi,

Thanx for answering.

I am not sure I understand your answer. (I am pretty new with this, sorry)

The output format gives me something like this for each sample:

ID, Avg_fold, Length, Ref_GC, Covered_percent, Covered_bases, Plus_reads, Minus_reads, Read_GC, Median_fold, Std_Dev
S1588, 24909.1657, 7904, 100.0000, 98.7854, 7808, 1105713, 1105713, 6112, 20615.50

The Plus reads and Minus reads give me the total number of mapped reads to the whole reference genome. How can I filter from there?

ADD REPLY
1
Entering edit mode

Sorry. You are right. For some reason I was thinking that the output of pileup.sh is per base.

ADD REPLY
2
Entering edit mode
6.5 years ago

You can use BEDtools genomecov command:

bedtools genomecov -ibam FILE.BAM -d | awk '$4 > 4' | wc -l

Divide the output by the size of the genome to get the percentage.

ADD COMMENT

Login before adding your answer.

Traffic: 3219 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6