Question: DataFormat: Methylation : qmap bis-seq format To bigwig/wig/bedgraph
0
gravatar for Shicheng Guo
2.4 years ago by
Shicheng Guo7.4k
Shicheng Guo7.4k wrote:

Hi All,

I downloaded the data from GEO (GSE17972). It seems the data format is quite special one: qmap. Anyone have the script to transfer qmap bis-seq format To bigwig/wig/bedgraph.

GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE17972 OR the question is same as: how to understand the qmap bis-seq format?

*Please ignore all the answer below. Since the reads were aligned to YH Genome, rather than GRCh37/38. I suggestion all the user who try to use this data should download the fastq and re-mappint to human genome, so that the result could be comparable with other dataset.*

* Download Finished, Alignment Finished, BW finished *

Okay. After all the guys' help. This problem has been totally solved.

1, this dataset is hg18 based.

2, qmap format has been listed by Devon Ryan..

3, in the qmap file, every base have been shown not only CpG, but every base.

4, therefore, when you calculate methylation level of CpG, you need extract the value for CpG sites.

Thanks.

enter image description here

bigwig qmap • 1.1k views
ADD COMMENTlink modified 2.2 years ago • written 2.4 years ago by Shicheng Guo7.4k
1

ENA appears to have the fastq files in case you want to get the original data.

ADD REPLYlink written 2.4 years ago by genomax63k

Yes. Thanks. I notice it: http://www.ebi.ac.uk/ena/data/view/PRJNA119949 However, I don't want to download the huge fastq data again. I think I can trust the mapping from BGI, right?

ADD REPLYlink written 2.4 years ago by Shicheng Guo7.4k
1

Long as you can understand the format, which you say is special :)

ADD REPLYlink written 2.4 years ago by genomax63k

Okay. let change to "how to understand the qmap bis-seq format". Thanks genomax2! by the way, who is genomax1?

ADD REPLYlink written 2.4 years ago by Shicheng Guo7.4k
2
gravatar for Devon Ryan
2.4 years ago by
Devon Ryan88k
Freiburg, Germany
Devon Ryan88k wrote:

Convert each chromosome to a bedGraph file:

#!/usr/bin/env python
f = open("GSE17972_HUMtg5lib.qmap.XXXXXX.txt") # Change me!
line = f.next().strip()
chrom = line[1:]
of = open("{}.bedGraph".format(chrom), "w")
pos = 0
for line in f:
    cols = line.split()
    M = int(cols[1])
    UM = int(cols[2])
    if M + UM > 0:
        of.write("{}\t{}\t{}\t{}\n".format(chrom, pos, pos + 1, 100. * float(M) / (M + UM)))
    pos += 1
of.close()
f.close()

Concatenate the chromosomes and then convert to a bigWig if you need to.

BTW, either make a coverage track or filter this for sites with at least 5x or 10x coverage.

ADD COMMENTlink modified 2.4 years ago • written 2.4 years ago by Devon Ryan88k

Devon is awesome. I know, I know, I know you can solve all the problems. Thank you so much! @Devon Ryan

ADD REPLYlink written 2.4 years ago by Shicheng Guo7.4k

Hi Devon, Where can I find the detail information of qmap? I mean the meaning for each column? Thanks.

ADD REPLYlink written 2.4 years ago by Shicheng Guo7.4k
1

From the first post on this SeqAnswers thread (pasted here for reference).

"HumMethy.qmap.chr" files:

1)name:
    oligonucleotide name

2)Methy_Uniq:
    Tags mapped uniquely to the reference, suggesting this site to be methylated

3)Unmethy_Uniq:
    Tags mapped uniquely to the reference, suggesting this site to be unmethylated

4)No_methy_info_Uniq:
    Tags mapped uniquely to the reference, not giving any infomation about methylation on this site

5)Mismatch_Uniq:
    Tags mapped uniquely to the reference, mismatching on this site

6)Methy_Rep:
    Tags mapped repeatedly to the reference, suggesting this site to be methylated

7)Unmethy_Rep:
    Tags mapped repeatedly to the reference, suggesting this site to be unmethylated

8)No_methy_info_Rep:
    Tags mapped repeatedly to the reference, not giving any infomation about methylation on this site

9)Mismatch_Rep:
    Tags mapped repeatedly to the reference, mismatching on this site

10)Total_Rep:
    Sum of tags mapped repeatedly to the reference on this site

11)Rep_rate:
    Sum of tags mapped to the reference on this site(Column 2-5 & 10)/ Sum of tags used to the reference on this site(Column 2-9)

12)Qual_Methy_Uniq:
    Quality of column 2

13)Qual_Unmethy_Uniq:
    Quality of column 3
ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by genomax63k

Nice, @genomax2, This question can be closed after you give us this reference! Thanks!

ADD REPLYlink written 2.4 years ago by Shicheng Guo7.4k

I found it in GEO link as the last supplemental file :)

ADD REPLYlink written 2.4 years ago by Devon Ryan88k
0
gravatar for Shicheng Guo
2.4 years ago by
Shicheng Guo7.4k
Shicheng Guo7.4k wrote:

# qmap2bedgraph.pl

#!/usr/bin/perl
# USAGE: perl qmap2bedgraph.pl GSE17972_HUMtg5lib.qmap.chrY.txt.gz > GSE17972_HUMtg5lib.qmap.chrY.bedgraph
use strict;
my $f=shift @ARGV;
my $pos;
my $chr;
open F,$f;
if($f=~/(chr\w+)/){
$chr=$1;
}
while(<F>){
chomp;
my @line=split/\t/;
my $M=$line[1];
my $UM=$line[2];
if(($M+$UM)>=5){
my $mf=sprintf("%.3f",$M/($M+$UM));
my $start=$pos;
my $end=$pos+1;
print "$chr\t$start\t$end\t$mf\n";
}
$pos++;
}
ADD COMMENTlink modified 2.4 years ago • written 2.4 years ago by Shicheng Guo7.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1871 users visited in the last hour