Question: mm9 RNA-seq reference genome
mikysyc2016 (30) wrote:

Hi all

I want to align my RNA-seq reads to the mm9 cDNA reference. How do I download this FASTA file?

Thanks in advance, Yachen

modified 7 months ago by Kevin Blighe (41k) • written 7 months ago by mikysyc2016 (30)
wget ftp://ftp.ensembl.org/pub/release-67/fasta/mus_musculus/dna/Mus_musculus.NCBIM37.67.dna_rm.toplevel.fa.gz
modified 7 months ago • written 7 months ago by Kirill260

Please explain what the file is, i.e., source, etc. Don't just paste a random link. Thanks.

written 7 months ago by Kevin Blighe (41k)

Keep in mind that alignments should be performed against a genome rather than the cDNA reference. Do you want to do standard RNA-seq alignments?

written 7 months ago by ATpoint (15k)

I want to do standard RNA-seq analysis. The FTP site has DNA FASTA and cDNA FASTA files, but they are not labelled mm9 or mm10. Based on that, I think I need the cDNA reference for RNA-seq analysis, and that the genome reference is for DNA-seq analysis (like ChIP-seq). Did I misunderstand? Thanks!

written 7 months ago by mikysyc2016 (30)

RNA-seq:

  • TopHat / TopHat2 / HISAT / HISAT2 = reference genome
  • Kallisto / Salmon = reference cDNA transcriptome
written 7 months ago by Kevin Blighe (41k)

Kevin Blighe (41k), Guy's Hospital, London, wrote:

Edit: as ATpoint mentions, you will require either a reference genome FASTA or reference cDNA FASTA depending on what you are planning to do.

You can obtain this from GENCODE: https://www.gencodegenes.org/mouse_releases/reference_releases.html

The latest releases for each build are shown on that page.

  • NCBIM37 (NCBI Build 37, sometimes informally written GRCm37) = mm9
  • GRCm38 = mm10

Other releases can be accessed via the drop-downs / tabs. You can download both the GTF and the corresponding FASTA files.
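
File names on the Ensembl FTP site carry the assembly label rather than the UCSC build name, which is why nothing there says mm9 or mm10 (the file name posted earlier in this thread contains NCBIM37, for instance). A minimal shell sketch of the lookup; the function name `ucsc_to_assembly` is made up for illustration, and only the two builds discussed here are covered:

```shell
# Hypothetical helper: map a UCSC build name to the assembly label
# that appears in Ensembl file names.
ucsc_to_assembly() {
    case "$1" in
        mm9)  echo "NCBIM37" ;;
        mm10) echo "GRCm38" ;;
        *)    echo "unknown build: $1" >&2; return 1 ;;
    esac
}

ucsc_to_assembly mm9   # prints NCBIM37
```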

Kevin

modified 7 months ago • written 7 months ago by Kevin Blighe (41k)

Thank you. Can I use them for standard RNA-seq analysis? What is the difference between the genome FASTA (which I thought was for DNA-seq) and the reference cDNA FASTA? Which one should I use? And what is the difference between downloading the GTF and the FASTA? Thanks in advance.

written 7 months ago by mikysyc2016 (30)

Not always the case. Take a look at my post above. Text also here:

RNA-seq:

  • TopHat / TopHat2 / HISAT / HISAT2 = reference genome
  • Kallisto / Salmon = reference cDNA transcriptome

------------------------------------------------------

The GTF contains extra information about the transcripts that are stored in the cDNA FASTA; mainly, it contains the genomic co-ordinates of UTRs, exons, etc.

The cDNA FASTA contains the transcribed mRNA sequences. Of the two files, only the cDNA FASTA can be used as an alignment target because it contains the actual sequence; the GTF holds co-ordinates and annotation only.

An example:

From cDNA FASTA:

grep -e "Brca1" -A5 gencode.vM18.transcripts.fa

>ENSMUST00000191198.1|ENSMUSG00000017146.12|OTTMUSG00000002870.3|OTTMUST00000119752.2|RP23-328K2.8-007|Brca1|531|protein_coding|
ACAGAGGGTCTCAAGCCCCCCTTGAGACACGCGCTTAACCTCAGTCAGGAGAAAGTAGAA
ATGGAAGACAGTGAACTTGATACTCAGTATTTGCAGAATACATTTCAAGTTTCAAAGCGT
CAGTCATTTGCTTTATTTTCAAAACCTAGAAGTCCCCAAAAGGACTGTGCTCACTCTGTG
CCCTCAAAGGAACTGAGTCCAAAGGTGACAGCTAAAGGTAAACAAAAAGAACGTCAGGGA
CAGGAAGAATTTGAAATCAGTCACGTACAAGCAGTTGCGGCCACAGTGGGCTTACCTGTG

From GTF:

grep -e "Brca1" gencode.vM18.annotation.gtf

chr11   HAVANA  transcript  101532083   101551582   .   -   .   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  exon    101551526   101551582   .   -   .   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 1; exon_id "ENSMUSE00001328968.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  exon    101539981   101540034   .   -   .   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 2; exon_id "ENSMUSE00001218040.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  exon    101535528   101535605   .   -   .   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 3; exon_id "ENSMUSE00001241451.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  CDS 101535528   101535598   .   -   0   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 3; exon_id "ENSMUSE00001241451.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  start_codon 101535596   101535598   .   -   0   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 3; exon_id "ENSMUSE00001241451.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  exon    101533939   101534027   .   -   .   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 4; exon_id "ENSMUSE00000113052.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";


et cetera
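
To pull only the co-ordinates out of GTF lines like the ones above, awk on the tab-separated columns is usually enough. A small sketch, assuming a GTF in the standard nine-column tab-separated layout; the function name `exon_coords` is an illustration, not part of any tool:

```shell
# Print chromosome, start, end and strand for every exon of one gene.
# $1 = gene name (as in the gene_name attribute), $2 = GTF file.
exon_coords() {
    awk -F '\t' -v pat="gene_name \"$1\";" \
        '$3 == "exon" && index($9, pat) { print $1, $4, $5, $7 }' "$2"
}

# Usage with the file from the example above:
# exon_coords Brca1 gencode.vM18.annotation.gtf
```

Matching on the full `gene_name "...";` attribute avoids picking up other genes whose names merely contain the same string, which a plain grep would also return.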
modified 7 months ago • written 7 months ago by Kevin Blighe (41k)

Thank you! That makes sense. I plan to use Kallisto to align the reads, since I can run it on my personal computer. Can I use the files at the link you provided? Are they the reference genome or the reference cDNA transcriptome?

written 7 months ago by mikysyc2016 (30)

At the link that I provided, you will find reference cDNA transcriptome FASTA files. You should use these for Kallisto.
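
For reference, a sketch of what the Kallisto steps could look like with the GENCODE transcripts FASTA used in the example earlier in this thread. Note that vM18 is a GRCm38 (mm10) release, so for mm9 you would substitute the FASTA from the matching older release. The index name, output directory, and read file names are placeholders, and the `run` wrapper is a made-up helper that only echoes each command unless DRY_RUN is set to 0, so the block can be read through without Kallisto installed:

```shell
# Placeholders (assumptions): adjust to your own files.
TRANSCRIPTS=gencode.vM18.transcripts.fa   # cDNA FASTA, not the genome
INDEX=mouse_transcripts.idx
DRY_RUN=${DRY_RUN:-1}

# Echo the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# 1. Build the index from the transcriptome FASTA.
run kallisto index -i "$INDEX" "$TRANSCRIPTS"

# 2. Quantify paired-end reads against that index.
run kallisto quant -i "$INDEX" -o quant_sample1 \
    sample1_R1.fastq.gz sample1_R2.fastq.gz
```

For single-end reads, `kallisto quant` additionally needs `--single` together with `-l` and `-s` (the estimated mean and standard deviation of the fragment length).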

written 7 months ago by Kevin Blighe (41k)

I see. Thanks a lot!

written 7 months ago by mikysyc2016 (30)