hg19 features -> mm10
6.5 years ago

Hi guys,

I have been having several problems working out how to get some of the files needed to run a specific program. The program is called Methy-pipe (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.010036). It is actually a very cool program that does your whole bisulfite data analysis in one go :) It trims the adaptors, aligns to the reference, calculates the DMRs, finds the hyper- and hypomethylated regions, and produces a pretty good visualization afterwards.

It is quite a complex program. I used it last year with human data, so my reference was hg19. Now I need to run it for mouse (mm10), but the program requires some specific files to compute correctly. I have downloaded most of them, but I am stuck on three: the TSS.win.bed file, the whole-genome k-mer frequency counts, and the FASTA files with the genome on the Watson strand (+) and the Crick strand (-).
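For the strand files, here is a minimal sketch, assuming the Crick (-) file is simply the reverse complement of each Watson (+) record. The exact layout Methy-pipe expects is not stated in this thread, so check it against the bundled hg19 files. The function names `reverse_complement`, `read_fasta` and `crick_records` are hypothetical helpers, not part of Methy-pipe.

```python
# Complement table for upper- and lower-case bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def crick_records(lines):
    """Yield (header, reverse-complemented sequence) for each FASTA record.

    Assumption: the minus-strand file is the plain reverse complement.
    """
    for header, seq in read_fasta(lines):
        yield header, reverse_complement(seq)
```

With a real genome you would pass an open file handle for the mm10 FASTA instead of a list of lines, and write each record out as it is produced so that whole chromosomes are handled one at a time.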

I have already tried several methods (R, the UCSC Table Browser, BioMart) and none seems to work. I am testing first with the hg19 genome, because the program came with all of these files already made, and I am not getting the same results as in the bundled files.

e.g.

TSS.win.bed file from the program:

chr6 26919672 26919872 NR_026775:26924772:-5100 0 +
chr6 26919872 26920072 NR_026775:26924772:-4900 0 +
chr6 26920072 26920272 NR_026775:26924772:-4700 0 +
chr6 26920272 26920472 NR_026775:26924772:-4500 0 +
chr6 26920472 26920672 NR_026775:26924772:-4300 0 +
chr6 26920672 26920872 NR_026775:26924772:-4100 0 +
chr6 26920872 26921072 NR_026775:26924772:-3900 0 +
chr6 26921072 26921272 NR_026775:26924772:-3700 0 +
chr6 26921272 26921472 NR_026775:26924772:-3500 0 +
chr6 26921472 26921672 NR_026775:26924772:-3300 0 +
chr6 26921672 26921872 NR_026775:26924772:-3100 0 +
chr6 26921872 26922072 NR_026775:26924772:-2900 0 +
chr6 26922072 26922272 NR_026775:26924772:-2700 0 +
chr6 26922272 26922472 NR_026775:26924772:-2500 0 +
chr6 26922472 26922672 NR_026775:26924772:-2300 0 +
chr6 26922672 26922872 NR_026775:26924772:-2100 0 +
chr6 26922872 26923072 NR_026775:26924772:-1900 0 +
chr6 26923072 26923272 NR_026775:26924772:-1700 0 +
chr6 26923272 26923472 NR_026775:26924772:-1500 0 +
chr6 26923472 26923672 NR_026775:26924772:-1300 0 +
chr6 26923672 26923872 NR_026775:26924772:-1100 0 +
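The rows above follow a regular pattern: the name field reads `transcript:TSS:offset`, each window is 200 bp wide, the start equals TSS + offset, and the end equals start + 200. A rough sketch that reproduces these plus-strand rows follows. The function name `tss_windows` and the default offset range (-5100 to -1100, which is only the excerpt shown) are assumptions, and how the real file treats minus-strand transcripts cannot be inferred from this excerpt.

```python
def tss_windows(chrom, tss, name, strand="+",
                first_offset=-5100, last_offset=-1100, width=200):
    """Return BED6 lines for fixed-width windows placed relative to a TSS.

    Assumptions (taken only from the excerpt in the post):
      * window start = tss + offset, window end = start + width
      * name column  = "<name>:<tss>:<offset>"
      * score column = 0
    The offset range is a placeholder; the full file may extend further.
    """
    lines = []
    for offset in range(first_offset, last_offset + 1, width):
        start = tss + offset
        end = start + width
        fields = [chrom, str(start), str(end),
                  f"{name}:{tss}:{offset}", "0", strand]
        lines.append("\t".join(fields))
    return lines


# Example: rebuild the windows shown for NR_026775 on chr6.
rows = tss_windows("chr6", 26924772, "NR_026775")
```

Comparing the generated rows with the bundled hg19 file for a few transcripts on each strand would be a quick way to confirm or correct these assumptions before building the mm10 version.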

The k-mer frequency file from the program (the genome size matches mine, but the frequencies differ in every case):

size 3095742485
A 844880379
C 585029306
G 585373256
T 846109263
AAA 70538173
AAC 41628587
AAG 57035135
AAT 71277349
ACA 53042554
ACC 33256687
ACG 7182538
ACT 46000071
AGA 57842499
AGC 39996114
AGG 50787391
AGT 46060123
ATA 53424099
ATC 38180411
ATG 52549180
ATT 71376587
CAA 54095945
CAC 39538055
CAG 57954925
CAT 52547486
CCA 52722892
CCC 29071626
CCG 7901554
CCT 50843149
CGA 6310082
CGC 6560356
CGG 7901582
CGT 7200192
CTA 36871692
CTC 44803901
CTG 57998015
CTT 57147487
GAA 56379286
GAC 27010029
GAG 44804773
GAT 38222786
GCA 41190369
GCC 34054354
GCG 6565608
GCT 40010084
GGA 44181981
GGC 34038352
GGG 29101353
GGT 33295405
GTA 32466936
GTC 27047217
GTG 39631571
GTT 41795532
TAA 59465824
TAC 32450157
TAG 36920632
TAT 53476338
TCA 56035616
TCC 44156555
TCG 6321970
TCT 57931470
TGA 56052796
TGC 41226134
TGG 52826775
TGT 53228668
TTA 59556531
TTC 56450564
TTG 54312483
TTT 70772691

My k-mer counts from the R package SomaticSignatures v2.8.4:

A            C            G            T
844762494.4  584866863.9  585902389.7  845531786.4

AAA          AAC       AAG          AAT         ACA          ACC          ACG          ACT          ....
109888332.7  41811098  56867551.15  71223128.2  57569355.97  33308950.84  7185837.456  46005209.92  ....
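One thing worth noting is that the program's file holds whole-number counts, while the values above are fractional, which suggests they may be scaled estimates rather than exact counts. A minimal sketch of exact counting follows, assuming overlapping windows on the forward strand only, upper-casing soft-masked bases, and skipping any k-mer that contains a non-ACGT character. Whether Methy-pipe counts this way is an assumption to test against the hg19 file, and `count_kmers` and `format_table` are hypothetical names.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count overlapping k-mers on the given strand.

    Assumptions: lower-case (soft-masked) bases are counted like
    upper-case ones, and k-mers containing N or any other non-ACGT
    character are skipped.
    """
    counts = Counter()
    seq = seq.upper()
    valid = set("ACGT")
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if valid.issuperset(kmer):
            counts[kmer] += 1
    return counts


def format_table(seq):
    """Format size, 1-mer and 3-mer counts as 'name<TAB>count' lines.

    Assumption: 'size' is the full sequence length, including N.
    """
    rows = [("size", len(seq))]
    for k in (1, 3):
        rows.extend(sorted(count_kmers(seq, k).items()))
    return "\n".join(f"{name}\t{n}" for name, n in rows)


# Tiny example sequence; a real run would loop over each chromosome
# and add the per-chromosome Counters together.
table = format_table("AAAC")
```

Running this on a single small hg19 chromosome and comparing a few values would show quickly whether the counting rule (strand, masking, handling of N) matches the bundled file.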

I would be very glad of any help. If you have any questions, please ask :)

Thanks in advance.

Andreia

mm10 kmer watson.strand tsswindows • 1.9k views
6.5 years ago

You could get the transcription start sites from Ensembl using its Perl API. BioMart won't do it on a genome scale. You can also download the mouse genome from Ensembl's FTP site. As for the k-mers file, it seems to me that it's a matter of reformatting to match the format expected by the program.
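As an alternative to the Perl API (this is a different route, not what is described above), the TSSs can also be derived from a GTF annotation file. A minimal sketch, assuming a standard 9-column GTF with `transcript` feature lines and a `transcript_id` attribute: the TSS is the start coordinate for plus-strand transcripts and the end coordinate for minus-strand ones. The function name `tss_from_gtf` is hypothetical.

```python
def tss_from_gtf(lines):
    """Yield (chrom, tss, strand, transcript_id) for each transcript line.

    GTF coordinates are 1-based and inclusive, so the returned TSS is
    1-based too. Converting to the convention used in TSS.win.bed is
    left open, since that convention is not stated in the thread.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # The TSS is the 5' end of the transcript on its own strand.
        tss = start if strand == "+" else end
        transcript_id = None
        for attr in fields[8].split(";"):
            attr = attr.strip()
            if attr.startswith("transcript_id "):
                transcript_id = attr.split(" ", 1)[1].strip('"')
        yield chrom, tss, strand, transcript_id


# Two made-up annotation lines, for illustration only.
example = [
    '1\tsrc\ttranscript\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    '1\tsrc\ttranscript\t100\t500\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
]
tss_list = list(tss_from_gtf(example))
```

Note that the bundled hg19 file uses RefSeq-style names such as NR_026775, so an annotation with RefSeq identifiers may be needed to reproduce it exactly.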


Thanks for your quick reply.

I will try getting the TSSs using the Ensembl Perl API.

For the k-mers, it is more than a formatting issue. The numbers don't match very well. I read that there are several algorithms for calculating k-mers, so I wonder whether I am using the right one.

Thanks again.
