Question: What's the best way to map CAGE-seq data to the genome?
6.0 years ago by
dustar1986310
USA
dustar1986310 wrote:

Hi,

I've just downloaded several mouse CAGE-seq datasets from the FANTOM5 database.

I tried mapping those rdna.fa files to mm10 using Bowtie2 with default settings, and found a rather low mapping rate of around 40%, with the majority of reads hitting more than one locus.

I'm totally new to CAGE-seq data, so please forgive me if I'm asking something silly.

1. When I looked into the files, the reads seemed to have been trimmed already. Is this mapping rate normal?

2. Given the short length of each read, it's reasonable for reads to hit multiple genomic locations. But won't that produce false positives when measuring which transcripts are 'really' expressed?

3. Are there any specific parameters I should use in Bowtie2?

bowtie cage-seq • 2.7k views
modified 5.2 years ago by Vandelnokk10 • written 6.0 years ago by dustar1986310

Hi! This could also be a stupid question, but where did you manage to download the .bam files for the FANTOM5 data? I can only find BED files so far (also in the CAGEr package). Thanks in advance.

written 5.2 years ago by Vandelnokk10

I guess here: http://fantom.gsc.riken.jp/5/datafiles/latest/basic/ ?

written 5.2 years ago by Vandelnokk10
6.0 years ago by
Floris Brenk910
USA
Floris Brenk910 wrote:

Hi Dadi,

What exactly do you mean by rDNA.fa files? rDNA normally stands for ribosomal DNA... I would recommend downloading the .bam files or just the ctss files.

  1. Normally CAGE reads are trimmed the same way as normal sequencing reads, based on sequence quality (default: q=20). When aligning reads to the human genome we normally get a mapping rate of at least 80%, up to 95%, so I would expect a similar range for the mouse genome.
  2. Yes, and by default these multimappers are thrown out. Calculations have been made for this, and only a very small percentage of reads is lost, because the reads are still at least 27 nt long. There are indeed some problems, with pseudogenes for example, so these are normally excluded from the analysis.
  3. We just use bwa with default parameters, and it works pretty well.
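
To check numbers like the ones above on your own alignments, here is a minimal Python sketch (not FANTOM5's pipeline) that tallies the mapping rate and the share of reads that look uniquely placed, straight from SAM text. It relies on the standard SAM FLAG bits; treating a low MAPQ as "multimapper" is only a heuristic, and the cutoff `min_mapq=10`, the function names and the demo records are all assumptions made up for illustration.

```python
from collections import Counter

UNMAPPED = 0x4         # SAM FLAG bit: read is unmapped
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def classify(sam_line, min_mapq=10):
    """Label one SAM record as 'unmapped', 'multi' or 'unique'.

    Returns None for secondary/supplementary records so that each read
    is counted once. The MAPQ cutoff is an assumed heuristic, not a
    value taken from the thread.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    mapq = int(fields[4])
    if flag & (SECONDARY | SUPPLEMENTARY):
        return None
    if flag & UNMAPPED:
        return "unmapped"
    return "unique" if mapq >= min_mapq else "multi"


def summarize(sam_lines, min_mapq=10):
    """Return total reads, mapping rate, unique rate and raw counts."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        label = classify(line, min_mapq)
        if label is not None:
            counts[label] += 1
    total = sum(counts.values())
    mapped = counts["unique"] + counts["multi"]
    return {
        "total": total,
        "mapping_rate": mapped / total if total else 0.0,
        "unique_rate": counts["unique"] / total if total else 0.0,
        "counts": dict(counts),
    }


# Made-up 27 nt records, for illustration only.
SEQ, QUAL = "A" * 27, "I" * 27
DEMO_SAM = [
    "@HD\tVN:1.6",
    f"r1\t0\tchr1\t100\t37\t27M\t*\t0\t0\t{SEQ}\t{QUAL}",
    f"r1\t256\tchr5\t900\t0\t27M\t*\t0\t0\t{SEQ}\t{QUAL}",  # secondary, skipped
    f"r2\t16\tchr2\t500\t0\t27M\t*\t0\t0\t{SEQ}\t{QUAL}",   # MAPQ 0: ambiguous
    f"r3\t4\t*\t0\t0\t*\t*\t0\t0\t{SEQ}\t{QUAL}",           # unmapped
    f"r4\t0\tchr3\t42\t30\t27M\t*\t0\t0\t{SEQ}\t{QUAL}",
]

stats = summarize(DEMO_SAM)
print(stats)
```

On a real file you would feed the lines from `samtools view -h` into `summarize` instead of the demo list.
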
modified 4 months ago by RamRS27k • written 6.0 years ago by Floris Brenk910

Thanks for your detailed explanation, Floris. That is extremely helpful for me. I think I really should download the .bam files (mm9) and re-map the reads to mm10.
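
For anyone wanting to do the same, the first step of re-mapping is getting the reads back out of the mm9 alignments in their original orientation. The sketch below shows the idea on SAM text: reads stored on the reverse strand (FLAG bit 0x10) are reverse-complemented and their qualities reversed before being written as FASTQ. The function names and demo records are hypothetical, and in practice `samtools fastq` does this job; this is only meant to show what happens under the hood.

```python
REVERSE = 0x10         # SAM FLAG bit: read aligned to the reverse strand
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_to_reads(sam_lines):
    """Yield (name, seq, qual) in the orientation the read was sequenced.

    Secondary and supplementary records are skipped so each read is
    emitted once. Records without a stored sequence ('*') are skipped.
    A '*' quality field is passed through unchanged, which would not be
    valid FASTQ; real data may need extra handling there.
    """
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if flag & (SECONDARY | SUPPLEMENTARY) or seq == "*":
            continue
        if flag & REVERSE:
            # SAM stores reverse-strand reads as the reference-strand
            # sequence, so undo that before re-mapping.
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        yield name, seq, qual


def to_fastq_text(reads):
    """Format (name, seq, qual) tuples as four-line FASTQ records."""
    return "".join(f"@{n}\n{s}\n+\n{q}\n" for n, s, q in reads)


# Made-up records, for illustration only.
DEMO_SAM = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t10\t37\t5M\t*\t0\t0\tACGTT\tIIIHG",
    "r2\t16\tchr1\t50\t37\t5M\t*\t0\t0\tAACCG\tABCDE",
]

fastq_text = to_fastq_text(sam_to_reads(DEMO_SAM))
print(fastq_text, end="")
```

The resulting FASTQ can then be aligned to an mm10 index with whichever aligner you prefer.
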

written 6.0 years ago by dustar1986310
6.0 years ago by
dustar1986310
USA
dustar1986310 wrote:

Worked it out. FANTOM5 uses its own aligner, Delve: http://fantom.gsc.riken.jp/5/suppl/delve/delve.tgz

modified 4 months ago by RamRS27k • written 6.0 years ago by dustar1986310

I am trying to run Delve as well, but I always get a segmentation fault. Were you able to run Delve?

written 18 months ago by sinhashruti0