BAM File Export Coverage Values from IGV or using PySam/Samtools
2
0
Entering edit mode
6.7 years ago
EOX123 ▴ 10

Would anyone please be able to help me with the following, I am new to sequencing analysis:

Input Data: .bam file with thousands of reads .fasta file with my reference

When I align my .bam file reads to my reference using IGV I get a coverage track displaying the depth of the reads at each locus as a grey bar, and coloured bars with the proportion of each base (A, C, G, T), insertions and deletions, at "low quality" read sites. These are the values I am interested in.

What I would like now, would be to export these values so I can work in Python. How may I achieve this?

Or is there a more direct way of calculating the same values directly in python with say pysam that I have not found yet?

In IGV you can export the counts using IGVtools to .tdf or even .wig, but I am restricted to IGV to read them afterwards, and would prefer a more direct method.

IGV alignment sequencing samtools bam • 4.3k views
ADD COMMENT
0
Entering edit mode

What is your end goal with this?

ADD REPLY
0
Entering edit mode

To have a file that gives me at each position the percent reads for each base (ACTGN), insertion, deletions but I just found out that the IGVtools command line does exactly what I want!

ADD REPLY
0
Entering edit mode

That's not an end goal. What do you actually want to do with that? I imagine your end goal isn't to start at lists of "percent A, percent C, ..." for each position all day.

ADD REPLY
0
Entering edit mode

Could you kindly post your solution here? I also need that values.

ADD REPLY
0
Entering edit mode
6.7 years ago
Zhixue ▴ 10

It seem that pysam can get the coverage values(as well as the proportion of each base (A, C, G, T), insertions and deletions), but it's not the best choice if there are some more convenient python packages.

for example:

SRR3239808.3377 16 chr2 812 60 13S29M * 0 0 CCGCGCCACCACCGCGGACTCCGCTCCCCGGCCGGGGCCGCC E/AEE///A/

This is a single-end read.

After pysam open a bam/sam file, this read1 is a < pysam.libcalignedsegment.AlignedSegment object >

  1. get the read's SEQ .
    read1.query_sequence
    'CCGCGCCACCACCGCGGACTCCGCTCCCCGGCCGGGGCCGCC'

  2. get the reference(which reads mapped)'s seq
    read1.get_reference_sequence()
    'aCaGcCTgtcCaCCgCGGCCGGGGCCGCC'

  3. get the list of aligned read (query) and reference positions
    read1.get_aligned_pairs(with_seq=True)#this seq is the reference seq
    [(0, None, None), (1, None, None), (2, None, None), (3, None, None), (4, None, None), (5, None, None), (6, None, None), (7, None, None), (8, None, None), (9, None, None), (10, None, None), (11, None, None), (12, None, None), (13, 811, 'a'), (14, 812, 'C'), (15, 813, 'a'), (16, 814, 'G'), (17, 815, 'c'), (18, 816, 'C'), (19, 817, 'T'), (20, 818, 'g'), (21, 819, 't'), (22, 820, 'c'), (23, 821, 'C'), (24, 822, 'a'), (25, 823, 'C'), (26, 824, 'C'), (27, 825, 'g'), (28, 826, 'C'), (29, 827, 'G'), (30, 828, 'G'), (31, 829, 'C'), (32, 830, 'C'), (33, 831, 'G'), (34, 832, 'G'), (35, 833, 'G'), (36, 834, 'G'), (37, 835, 'C'), (38, 836, 'C'), (39, 837, 'G'), (40, 838, 'C'), (41, 839, 'C')]

  4. get the mapped reference postions
    read1.get_reference_positions()
    [811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, 822, 823, 824, 825, 826, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 837, 838, 839]

More information in http://pysam.readthedocs.io/en/latest/api.html


So after counting this read, the result may like this:

reference postion 811: ['A':0,'G':1,'C':0,'T':0] #because read1.query_sequence[13] is G
reference postion 812: ['A':0,'G':0,'C':1,'T':0] #because read1.query_sequence[14] is C
......
insert or deletion: from which pos to which pos
you can store with a list of tuple from start to end such as [(5,8),(34,40)]

If nobody has built the python packege to get the coverage values, you can try pysam.

ADD COMMENT
0
Entering edit mode

Thank you for this! I found out that IGVtools command line can do it but this might come to use.

ADD REPLY
0
Entering edit mode
6.4 years ago

Can you post solution i am currently trying to export my data in a similar way, and am unfamiliar with the works of IGVtools

Thanks

ADD COMMENT
1
Entering edit mode

BamDeal can calculate depth of four bases in every locus of reference genome. It is a full featured software for operating bam files. Other functions may facilitate your research. https://github.com/BGI-shenzhen/BamDeal

ADD REPLY
0
Entering edit mode

yes. ./BamDeal-0.21/bin/BamDeal_Linux statistics BasesCount

./BamDeal-0.21/bin/BamDeal_Linux   statistics  BasesCount

        Usage: BasesCount  -List  <bam.list>  -OutPut  <outfix>

                -InList    <str>     Input Bam/Sam File List
                -InFile    <str>     Input Bam/Sam File File[repeat]
                -OutPut    <str>     OutPut File prefix

                -Bed       <str>     Stat Coverage,MeanDepth for bed Regions 
                -MinQ      <int>     Ignore too low mapQ read[10]

                -help                Show this help [hewm2008]

if for [-bed ] file coordinate is start form 1, and this coordinate system will be updated in the next version to make it consistent with samtools starting from 0.

if you want the mapQ==0 to count the depth, you can set the [-MinQ -1 ]

ADD REPLY

Login before adding your answer.

Traffic: 1843 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6