DataFormat: Methylation : qmap bis-seq format To bigwig/wig/bedgraph
2
0
Entering edit mode
7.5 years ago
Shicheng Guo ★ 9.4k

Hi All,

I downloaded the data from GEO (GSE17972). It seems the data format is quite special one: qmap. Anyone have the script to transfer qmap bis-seq format To bigwig/wig/bedgraph.

GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE17972 OR the question is same as: how to understand the qmap bis-seq format?

*Please ignore all the answer below. Since the reads were aligned to YH Genome, rather than GRCh37/38. I suggestion all the user who try to use this data should download the fastq and re-mappint to human genome, so that the result could be comparable with other dataset.*

* Download Finished, Alignment Finished, BW finished *

Okay. After all the guys' help. This problem has been totally solved.

1, this dataset is hg18 based.

2, qmap format has been listed by Devon Ryan..

3, in the qmap file, every base have been shown not only CpG, but every base.

4, therefore, when you calculate methylation level of CpG, you need extract the value for CpG sites.

Thanks.

enter image description here

qmap bigwig • 2.9k views
ADD COMMENT
1
Entering edit mode

ENA appears to have the fastq files in case you want to get the original data.

ADD REPLY
0
Entering edit mode

Yes. Thanks. I notice it: http://www.ebi.ac.uk/ena/data/view/PRJNA119949 However, I don't want to download the huge fastq data again. I think I can trust the mapping from BGI, right?

ADD REPLY
1
Entering edit mode

Long as you can understand the format, which you say is special :)

ADD REPLY
0
Entering edit mode

Okay. let change to "how to understand the qmap bis-seq format". Thanks genomax2! by the way, who is genomax1?

ADD REPLY
2
Entering edit mode
7.5 years ago

Convert each chromosome to a bedGraph file:

#!/usr/bin/env python
f = open("GSE17972_HUMtg5lib.qmap.XXXXXX.txt") # Change me!
line = f.next().strip()
chrom = line[1:]
of = open("{}.bedGraph".format(chrom), "w")
pos = 0
for line in f:
    cols = line.split()
    M = int(cols[1])
    UM = int(cols[2])
    if M + UM > 0:
        of.write("{}\t{}\t{}\t{}\n".format(chrom, pos, pos + 1, 100. * float(M) / (M + UM)))
    pos += 1
of.close()
f.close()

Concatenate the chromosomes and then convert to a bigWig if you need to.

BTW, either make a coverage track or filter this for sites with at least 5x or 10x coverage.

ADD COMMENT
0
Entering edit mode

Devon is awesome. I know, I know, I know you can solve all the problems. Thank you so much! @Devon Ryan

ADD REPLY
0
Entering edit mode

Hi Devon, Where can I find the detail information of qmap? I mean the meaning for each column? Thanks.

ADD REPLY
1
Entering edit mode

From the first post on this SeqAnswers thread (pasted here for reference).

"HumMethy.qmap.chr" files:

1)name:
    oligonucleotide name

2)Methy_Uniq:
    Tags mapped uniquely to the reference, suggesting this site to be methylated

3)Unmethy_Uniq:
    Tags mapped uniquely to the reference, suggesting this site to be unmethylated

4)No_methy_info_Uniq:
    Tags mapped uniquely to the reference, not giving any infomation about methylation on this site

5)Mismatch_Uniq:
    Tags mapped uniquely to the reference, mismatching on this site

6)Methy_Rep:
    Tags mapped repeatedly to the reference, suggesting this site to be methylated

7)Unmethy_Rep:
    Tags mapped repeatedly to the reference, suggesting this site to be unmethylated

8)No_methy_info_Rep:
    Tags mapped repeatedly to the reference, not giving any infomation about methylation on this site

9)Mismatch_Rep:
    Tags mapped repeatedly to the reference, mismatching on this site

10)Total_Rep:
    Sum of tags mapped repeatedly to the reference on this site

11)Rep_rate:
    Sum of tags mapped to the reference on this site(Column 2-5 & 10)/ Sum of tags used to the reference on this site(Column 2-9)

12)Qual_Methy_Uniq:
    Quality of column 2

13)Qual_Unmethy_Uniq:
    Quality of column 3
ADD REPLY
0
Entering edit mode

Nice, @genomax2, This question can be closed after you give us this reference! Thanks!

ADD REPLY
0
Entering edit mode

I found it in GEO link as the last supplemental file :)

ADD REPLY
0
Entering edit mode
7.5 years ago
Shicheng Guo ★ 9.4k

# qmap2bedgraph.pl

#!/usr/bin/perl
# USAGE: perl qmap2bedgraph.pl GSE17972_HUMtg5lib.qmap.chrY.txt.gz > GSE17972_HUMtg5lib.qmap.chrY.bedgraph
use strict;
my $f=shift @ARGV;
my $pos;
my $chr;
open F,$f;
if($f=~/(chr\w+)/){
$chr=$1;
}
while(<F>){
chomp;
my @line=split/\t/;
my $M=$line[1];
my $UM=$line[2];
if(($M+$UM)>=5){
my $mf=sprintf("%.3f",$M/($M+$UM));
my $start=$pos;
my $end=$pos+1;
print "$chr\t$start\t$end\t$mf\n";
}
$pos++;
}
ADD COMMENT

Login before adding your answer.

Traffic: 1567 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6