Question: BAM File Export Coverage Values from IGV or using PySam/Samtools
0
gravatar for EOX123
11 months ago by
EOX12310
EOX12310 wrote:

Would anyone please be able to help me with the following, I am new to sequencing analysis:

Input Data: .bam file with thousands of reads .fasta file with my reference

When I align my .bam file reads to my reference using IGV I get a coverage track displaying the depth of the reads at each locus as a grey bar, and coloured bars with the proportion of each base (A, C, G, T), insertions and deletions, at "low quality" read sites. These are the values I am interested in.

What I would like now, would be to export these values so I can work in Python. How may I achieve this?

Or is there a more direct way of calculating the same values directly in python with say pysam that I have not found yet?

In IGV you can export the counts using IGVtools to .tdf or even .wig, but I am restricted to IGV to read them afterwards, and would prefer a more direct method.

ADD COMMENTlink modified 7 months ago by christophersemail400 • written 11 months ago by EOX12310

What is your end goal with this?

ADD REPLYlink written 11 months ago by Devon Ryan88k

To have a file that gives me at each position the percent reads for each base (ACTGN), insertion, deletions but I just found out that the IGVtools command line does exactly what I want!

ADD REPLYlink modified 11 months ago • written 11 months ago by EOX12310

That's not an end goal. What do you actually want to do with that? I imagine your end goal isn't to start at lists of "percent A, percent C, ..." for each position all day.

ADD REPLYlink written 11 months ago by Devon Ryan88k

Could you kindly post your solution here? I also need that values.

ADD REPLYlink written 3 months ago by xzhangxmu0
0
gravatar for Zhixue
11 months ago by
Zhixue10
China/Shanghai/TJ
Zhixue10 wrote:

It seem that pysam can get the coverage values(as well as the proportion of each base (A, C, G, T), insertions and deletions), but it's not the best choice if there are some more convenient python packages.

for example:

SRR3239808.3377 16 chr2 812 60 13S29M * 0 0 CCGCGCCACCACCGCGGACTCCGCTCCCCGGCCGGGGCCGCC E/AEE///A/

This is a single-end read.

After pysam open a bam/sam file, this read1 is a < pysam.libcalignedsegment.AlignedSegment object >

  1. get the read's SEQ .
    read1.query_sequence
    'CCGCGCCACCACCGCGGACTCCGCTCCCCGGCCGGGGCCGCC'

  2. get the reference(which reads mapped)'s seq
    read1.get_reference_sequence()
    'aCaGcCTgtcCaCCgCGGCCGGGGCCGCC'

  3. get the list of aligned read (query) and reference positions
    read1.get_aligned_pairs(with_seq=True)#this seq is the reference seq
    [(0, None, None), (1, None, None), (2, None, None), (3, None, None), (4, None, None), (5, None, None), (6, None, None), (7, None, None), (8, None, None), (9, None, None), (10, None, None), (11, None, None), (12, None, None), (13, 811, 'a'), (14, 812, 'C'), (15, 813, 'a'), (16, 814, 'G'), (17, 815, 'c'), (18, 816, 'C'), (19, 817, 'T'), (20, 818, 'g'), (21, 819, 't'), (22, 820, 'c'), (23, 821, 'C'), (24, 822, 'a'), (25, 823, 'C'), (26, 824, 'C'), (27, 825, 'g'), (28, 826, 'C'), (29, 827, 'G'), (30, 828, 'G'), (31, 829, 'C'), (32, 830, 'C'), (33, 831, 'G'), (34, 832, 'G'), (35, 833, 'G'), (36, 834, 'G'), (37, 835, 'C'), (38, 836, 'C'), (39, 837, 'G'), (40, 838, 'C'), (41, 839, 'C')]

  4. get the mapped reference postions
    read1.get_reference_positions()
    [811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, 822, 823, 824, 825, 826, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 837, 838, 839]

More information in http://pysam.readthedocs.io/en/latest/api.html


So after counting this read, the result may like this:

reference postion 811: ['A':0,'G':1,'C':0,'T':0] #because read1.query_sequence[13] is G
reference postion 812: ['A':0,'G':0,'C':1,'T':0] #because read1.query_sequence[14] is C
......
insert or deletion: from which pos to which pos
you can store with a list of tuple from start to end such as [(5,8),(34,40)]

If nobody has built the python packege to get the coverage values, you can try pysam.

ADD COMMENTlink modified 11 months ago • written 11 months ago by Zhixue10

Thank you for this! I found out that IGVtools command line can do it but this might come to use.

ADD REPLYlink written 11 months ago by EOX12310
0
gravatar for christophersemail40
7 months ago by
christophersemail400 wrote:

Can you post solution i am currently trying to export my data in a similar way, and am unfamiliar with the works of IGVtools

Thanks

ADD COMMENTlink written 7 months ago by christophersemail400
1

BamDeal can calculate depth of four bases in every locus of reference genome. It is a full featured software for operating bam files. Other functions may facilitate your research. https://github.com/BGI-shenzhen/BamDeal

ADD REPLYlink modified 7 days ago • written 7 days ago by jiaobingke20

yes. ./BamDeal-0.21/bin/BamDeal_Linux statistics BasesCount

./BamDeal-0.21/bin/BamDeal_Linux   statistics  BasesCount

        Usage: BasesCount  -List  <bam.list>  -OutPut  <outfix>

                -InList    <str>     Input Bam/Sam File List
                -InFile    <str>     Input Bam/Sam File File[repeat]
                -OutPut    <str>     OutPut File prefix

                -Bed       <str>     Stat Coverage,MeanDepth for bed Regions 
                -MinQ      <int>     Ignore too low mapQ read[10]

                -help                Show this help [hewm2008]

if for [-bed ] file coordinate is start form 1, and this coordinate system will be updated in the next version to make it consistent with samtools starting from 0.

if you want the mapQ==0 to count the depth, you can set the [-MinQ -1 ]

ADD REPLYlink written 7 days ago by hewm200820
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2181 users visited in the last hour