Question: mm9 RNA-seq reference genome
16 months ago, mikysyc2016 wrote:

Hi all

I want to align my RNA-seq reads to the mm9 cDNA reference. How do I download this FASTA file?

Thanks in advance, Yachen

modified 16 months ago by Kevin Blighe • written 16 months ago by mikysyc2016
wget ftp://ftp.ensembl.org/pub/release-67/fasta/mus_musculus/dna/Mus_musculus.NCBIM37.67.dna_rm.toplevel.fa.gz
modified 16 months ago • written 16 months ago by Kirill270

Please explain what the file is, i.e., source, etc. Don't just paste a random link. Thanks.

written 16 months ago by Kevin Blighe

Keep in mind that alignments should be performed against a genome rather than the cDNA reference. Do you want to do standard RNA-seq alignments?

written 16 months ago by ATpoint

I want to do standard RNA-seq analysis. The FTP site has DNA FASTA and cDNA FASTA files, but they are not labelled mm9 or mm10. Based on that, I think I need the cDNA reference for RNA-seq analysis, and that the genome reference is for DNA-seq analysis (like ChIP-seq). Did I misunderstand? Thanks!

written 16 months ago by mikysyc2016

RNA-seq:

  • TopHat / TopHat2 / HISAT / HISAT2 = reference genome
  • Kallisto / Salmon = reference cDNA transcriptome
written 16 months ago by Kevin Blighe
16 months ago, Kevin Blighe wrote:

Edit: as ATpoint mentions, you will require either a reference genome FASTA or reference cDNA FASTA depending on what you are planning to do.

You can obtain this from GENCODE: https://www.gencodegenes.org/mouse_releases/reference_releases.html

The latest releases for each build are shown on that page.

  • NCBIM37 (NCBI Build 37; sometimes written GRCm37) = mm9
  • GRCm38 = mm10

Other releases can be accessed via the drop-downs / tabs. You can download both GTF and corresponding FASTA files.

Kevin

modified 16 months ago • written 16 months ago by Kevin Blighe

Thank you. Can I use them for standard RNA-seq analysis? What is the difference between the genome FASTA (I think the genome FASTA is for DNA-seq) and the reference cDNA FASTA? Which one can I use? And what is the difference between downloading the GTF and the FASTA? Thanks in advance.

written 16 months ago by mikysyc2016

Not always the case. Take a look at my post above. Text also here:

RNA-seq:

  • TopHat / TopHat2 / HISAT / HISAT2 = reference genome
  • Kallisto / Salmon = reference cDNA transcriptome

------------------------------------------------------

The GTF contains extra information about the transcripts that are stored in the cDNA FASTA; mainly, it contains the genomic co-ordinates of UTRs, exons, etc.

The cDNA FASTA contains the transcribed mRNA sequences. Of these two files, only the cDNA FASTA can be used for alignment, because it contains the actual sequence.

An example:

From cDNA FASTA:

grep -e "Brca1" -A5 gencode.vM18.transcripts.fa

>ENSMUST00000191198.1|ENSMUSG00000017146.12|OTTMUSG00000002870.3|OTTMUST00000119752.2|RP23-328K2.8-007|Brca1|531|protein_coding|
ACAGAGGGTCTCAAGCCCCCCTTGAGACACGCGCTTAACCTCAGTCAGGAGAAAGTAGAA
ATGGAAGACAGTGAACTTGATACTCAGTATTTGCAGAATACATTTCAAGTTTCAAAGCGT
CAGTCATTTGCTTTATTTTCAAAACCTAGAAGTCCCCAAAAGGACTGTGCTCACTCTGTG
CCCTCAAAGGAACTGAGTCCAAAGGTGACAGCTAAAGGTAAACAAAAAGAACGTCAGGGA
CAGGAAGAATTTGAAATCAGTCACGTACAAGCAGTTGCGGCCACAGTGGGCTTACCTGTG
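
A note on the grep above: it matches "Brca1" anywhere on a header line, so it would also return any record whose header merely contains that string, and -A5 prints only the first five sequence lines. A minimal sketch of an exact match on the gene-name field, assuming the "|"-delimited header layout shown above (gene name in field 6). The toy records, their IDs, and the gene name "Brca1x" are made up for illustration:

```shell
#!/bin/sh
# Toy FASTA in the GENCODE-style header layout (fields separated by "|",
# gene name in field 6). IDs and sequences here are invented.
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>ENSMUST00000000001.1|ENSMUSG00000000001.1|-|-|Brca1-201|Brca1|12|protein_coding|
ACAGAGGGTCTC
>ENSMUST00000000002.1|ENSMUSG00000000002.1|-|-|Brca1x-201|Brca1x|12|lncRNA|
TTTTGGGGCCCC
EOF

# On each header line, decide whether to keep the record; sequence lines
# inherit the decision made for their header, so whole records are printed.
brca1_records=$(awk -F'|' '/^>/ { keep = ($6 == "Brca1") } keep' "$fasta")
printf '%s\n' "$brca1_records"

rm -f "$fasta"
```

On a real file, the same awk command would be pointed at gencode.vM18.transcripts.fa instead of the temporary file.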

From GTF:

grep -e "Brca1" gencode.vM18.annotation.gtf

chr11   HAVANA  transcript  101532083   101551582   .   -   .   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  exon    101551526   101551582   .   -   .   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 1; exon_id "ENSMUSE00001328968.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  exon    101539981   101540034   .   -   .   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 2; exon_id "ENSMUSE00001218040.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  exon    101535528   101535605   .   -   .   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 3; exon_id "ENSMUSE00001241451.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  CDS 101535528   101535598   .   -   0   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 3; exon_id "ENSMUSE00001241451.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  start_codon 101535596   101535598   .   -   0   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 3; exon_id "ENSMUSE00001241451.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  exon    101533939   101534027   .   -   .   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 4; exon_id "ENSMUSE00000113052.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
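
Because each GTF record ties a transcript_id to a gene_name, the GTF can also be used to build a transcript-to-gene table, which is handy when transcript-level estimates from a cDNA-based tool need to be summarised per gene. A hedged sketch, assuming a tab-delimited GTF with the attribute layout shown above; the two toy lines are abridged copies of the records above:

```shell
#!/bin/sh
# Toy GTF: one transcript line and one exon line, abridged from the output above.
gtf=$(mktemp)
printf 'chr11\tHAVANA\ttranscript\t101532083\t101551582\t.\t-\t.\tgene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_name "Brca1";\n' > "$gtf"
printf 'chr11\tHAVANA\texon\t101551526\t101551582\t.\t-\t.\tgene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_name "Brca1"; exon_number 1;\n' >> "$gtf"

# For every "transcript" feature, split the attribute column (field 9) on ";"
# and pick out transcript_id and gene_name, stripping the surrounding quotes.
tx2gene=$(awk -F'\t' '$3 == "transcript" {
    tx = ""; gene = ""
    n = split($9, attrs, "; *")
    for (i = 1; i <= n; i++) {
        split(attrs[i], kv, " ")
        gsub(/"/, "", kv[2])
        if (kv[1] == "transcript_id") tx = kv[2]
        if (kv[1] == "gene_name") gene = kv[2]
    }
    print tx "\t" gene
}' "$gtf")
printf '%s\n' "$tx2gene"

rm -f "$gtf"
```

Only the transcript line contributes a row; exon, CDS and start_codon lines are skipped, so each transcript appears once.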


et cetera
modified 16 months ago • written 16 months ago by Kevin Blighe

Thank you! That makes sense. I plan to use kallisto to align the reads, which I can run on my personal computer. Can I use the link you provided? Are those files the reference genome or the reference cDNA transcriptome?

written 16 months ago by mikysyc2016

At the link that I provided, you will find reference cDNA transcriptome FASTA files. You should use these for Kallisto.
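
As a rough sketch of what that looks like in practice: kallisto first builds an index from the cDNA FASTA and then quantifies reads against that index. All file names below are placeholders (assumptions, not files named in this thread), and the run helper is a hypothetical wrapper that only prints the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Placeholder paths (assumptions): substitute the cDNA FASTA you downloaded
# and your own FASTQ files.
TRANSCRIPTS="transcripts.fa.gz"
INDEX="mm9_transcripts.idx"
R1="sample_R1.fastq.gz"
R2="sample_R2.fastq.gz"
OUTDIR="kallisto_out"

# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: build the index from the cDNA (transcriptome) FASTA.
run kallisto index -i "$INDEX" "$TRANSCRIPTS"

# Step 2: quantify paired-end reads against that index.
run kallisto quant -i "$INDEX" -o "$OUTDIR" "$R1" "$R2"
```

For single-end reads, kallisto quant additionally needs --single together with -l (mean fragment length) and -s (its standard deviation).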

written 16 months ago by Kevin Blighe

I see. Thanks a lot!

written 16 months ago by mikysyc2016