Question

mm9 RNA-seq reference genome

5.6 years ago

mikysyc2016 ▴ 120

Hi all

I want to align my RNA-seq reads to the mm9 cDNA reference. How do I download this FASTA file?

Thanks in advance,
Yachen

alignment sequencing RNA-seq gene • 2.9k views

updated 13 months ago by Ram 43k • written 5.6 years ago by mikysyc2016 ▴ 120

wget ftp://ftp.ensembl.org/pub/release-67/fasta/mus_musculus/dna/Mus_musculus.NCBIM37.67.dna_rm.toplevel.fa.gz

5.6 years ago by Kirill Tsyganov ▴ 370

Please explain what the file is, i.e., source, etc. Don't just paste a random link. Thanks.

5.6 years ago by Kevin Blighe 87k

Keep in mind that standard RNA-seq alignments should be performed against a genome rather than the cDNA reference. Do you want to do standard RNA-seq alignments?

5.6 years ago by ATpoint 81k

I want to do standard RNA-seq analysis. The FTP site has DNA FASTA and cDNA FASTA files, but they are not labelled mm9 or mm10. Based on that, I think I need the cDNA reference for RNA-seq analysis, and that the genome reference is for DNA-seq analysis (like ChIP-seq). Did I misunderstand? Thanks!

5.6 years ago by mikysyc2016 ▴ 120

RNA-seq:

TopHat / TopHat2 / HISAT / HISAT2 = reference genome
Kallisto / Salmon = reference cDNA transcriptome
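
To make the split above concrete, here is a minimal sketch that looks up which kind of reference a given tool expects. The function name `reference_for` is made up for illustration and is not part of any of these programs; it only encodes the two lines above.

```shell
#!/usr/bin/env bash
# reference_for is a hypothetical helper, not part of any of these tools.
# It returns the kind of reference FASTA each tool expects.
reference_for() {
  # Lower-case the tool name so "HISAT2" and "hisat2" are treated the same.
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    tophat|tophat2|hisat|hisat2) echo "genome" ;;
    kallisto|salmon)             echo "transcriptome" ;;
    *)                           echo "unknown" ;;
  esac
}

# Print a small tool-to-reference table.
for tool in HISAT2 Kallisto Salmon; do
  printf '%s\t%s\n' "$tool" "$(reference_for "$tool")"
done
```

Any tool not listed falls through to "unknown", so check that tool's own documentation in that case.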

5.6 years ago by Kevin Blighe 87k

score 2 · Answer 1 · 2018-09-13

5.6 years ago

Kevin Blighe 87k

Edit: as ATpoint mentions, you will require either a reference genome FASTA or reference cDNA FASTA depending on what you are planning to do.

You can obtain this from GENCODE: https://www.gencodegenes.org/mouse_releases/reference_releases.html

The latest releases for each build are shown on that page.

NCBIM37 (NCBI Build 37 / MGSCv37) = mm9
GRCm38 = mm10

Other releases can be accessed via the drop-downs / tabs. You can download both the GTF and the corresponding FASTA files.
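
If you prefer the command line to the web page, something like the following sketch could fetch both files. The base URL and the directory layout are my assumption about how the GENCODE mouse FTP area is organised, so check them against the release page before relying on them. The file names follow the gencode.vM18.* pattern used elsewhere in this thread; note that M18 is a GRCm38 (mm10) release, so pick the release that matches your build. `gencode_url` and `DOWNLOAD` are names made up for this example.

```shell
#!/usr/bin/env bash
# Assumed layout of the GENCODE mouse FTP area; verify against the release page.
GENCODE_BASE="https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse"

# gencode_url (hypothetical helper): build the URL of one file in a release.
#   $1 = release name, e.g. M18
#   $2 = file suffix, e.g. transcripts.fa.gz
gencode_url() {
  local release="$1" suffix="$2"
  echo "${GENCODE_BASE}/release_${release}/gencode.v${release}.${suffix}"
}

RELEASE="M18"   # a GRCm38 (mm10) release; change to the release for your build

for suffix in transcripts.fa.gz annotation.gtf.gz; do
  url="$(gencode_url "$RELEASE" "$suffix")"
  echo "$url"
  # Nothing is downloaded unless DOWNLOAD=1 is set in the environment.
  if [ "${DOWNLOAD:-0}" = "1" ]; then
    wget -c "$url"
  fi
done
```

Run it once as-is to inspect the URLs, then again with `DOWNLOAD=1` to fetch the files.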

Kevin

5.6 years ago by Kevin Blighe 87k

Thank you. Can I use them for standard RNA-seq analysis? What is the difference between the genome FASTA (I think the genome FASTA is for DNA-seq) and the reference cDNA FASTA? Which one can I use? And what is the difference between downloading the GTF and the FASTA? Thanks in advance.

5.6 years ago by mikysyc2016 ▴ 120

Not always the case. Take a look at my post above. Text also here:

RNA-seq:

TopHat / TopHat2 / HISAT / HISAT2 = reference genome
Kallisto / Salmon = reference cDNA transcriptome

------------------------------------------------------

The GTF contains extra information about the transcripts whose sequences are stored in the cDNA FASTA; mainly, it contains the genomic co-ordinates of UTRs, exons, etc.

The cDNA FASTA contains the transcribed mRNA sequences. Of these two files, only the cDNA FASTA can be used as an alignment target, because it contains the actual sequence.

An example:

From cDNA FASTA:

grep -e "Brca1" -A5 gencode.vM18.transcripts.fa

>ENSMUST00000191198.1|ENSMUSG00000017146.12|OTTMUSG00000002870.3|OTTMUST00000119752.2|RP23-328K2.8-007|Brca1|531|protein_coding|
ACAGAGGGTCTCAAGCCCCCCTTGAGACACGCGCTTAACCTCAGTCAGGAGAAAGTAGAA
ATGGAAGACAGTGAACTTGATACTCAGTATTTGCAGAATACATTTCAAGTTTCAAAGCGT
CAGTCATTTGCTTTATTTTCAAAACCTAGAAGTCCCCAAAAGGACTGTGCTCACTCTGTG
CCCTCAAAGGAACTGAGTCCAAAGGTGACAGCTAAAGGTAAACAAAAAGAACGTCAGGGA
CAGGAAGAATTTGAAATCAGTCACGTACAAGCAGTTGCGGCCACAGTGGGCTTACCTGTG
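
The header line of each FASTA record is "|"-delimited, so the transcript ID, gene ID and gene symbol can be pulled out with awk. This is only a sketch: the field positions (1, 2 and 6) are read off the Brca1 header shown above, and `tx2gene` is a name I made up for the example.

```shell
#!/usr/bin/env bash
# tx2gene (hypothetical helper): print transcript ID, gene ID and gene symbol
# from GENCODE-style FASTA headers. Fields are '|'-separated:
#   1 = transcript ID, 2 = gene ID, 6 = gene symbol.
# Reads the files given as arguments, or standard input if none are given.
tx2gene() {
  awk -F'|' '/^>/ { sub(/^>/, "", $1); print $1 "\t" $2 "\t" $6 }' "$@"
}

# Demo on the Brca1 record shown above (header plus its first sequence line).
tx2gene <<'EOF'
>ENSMUST00000191198.1|ENSMUSG00000017146.12|OTTMUSG00000002870.3|OTTMUST00000119752.2|RP23-328K2.8-007|Brca1|531|protein_coding|
ACAGAGGGTCTCAAGCCCCCCTTGAGACACGCGCTTAACCTCAGTCAGGAGAAAGTAGAA
EOF
```

On a real file you would call it as `tx2gene gencode.vM18.transcripts.fa`. A table like this is handy later for summarising transcript-level results to gene level.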

From GTF:

grep -e "Brca1" gencode.vM18.annotation.gtf

chr11   HAVANA  transcript  101532083   101551582   .   -   .   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  exon    101551526   101551582   .   -   .   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 1; exon_id "ENSMUSE00001328968.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  exon    101539981   101540034   .   -   .   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 2; exon_id "ENSMUSE00001218040.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  exon    101535528   101535605   .   -   .   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 3; exon_id "ENSMUSE00001241451.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  CDS 101535528   101535598   .   -   0   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 3; exon_id "ENSMUSE00001241451.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  start_codon 101535596   101535598   .   -   0   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 3; exon_id "ENSMUSE00001241451.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  exon    101533939   101534027   .   -   .   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 4; exon_id "ENSMUSE00000113052.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";


et cetera
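
As a further illustration of what the GTF adds, the sketch below lists the exons of one transcript and sums their lengths. It assumes a standard tab-separated GTF (feature type in column 3, start and end in columns 4 and 5, attributes in column 9); `exon_lengths` is a made-up name, and the demo input is a cut-down copy of the first two exon lines above with most attributes removed.

```shell
#!/usr/bin/env bash
# exon_lengths (hypothetical helper): list the exons of one transcript and
# their lengths, then the summed exon length.
#   $1   = transcript ID, e.g. ENSMUST00000190862.1
#   rest = GTF file(s); standard input is read if none are given
exon_lengths() {
  local tx="$1"
  shift
  awk -F'\t' -v tx="$tx" '
    # Keep exon rows whose attribute column names this exact transcript.
    $3 == "exon" && index($9, "transcript_id \"" tx "\"") {
      len = $5 - $4 + 1          # GTF coordinates are 1-based and inclusive
      total += len
      print $1 ":" $4 "-" $5 "\t" len
    }
    END { print "total\t" total + 0 }
  ' "$@"
}

# Demo: the first two Brca1 exons from the GTF excerpt above.
printf 'chr11\tHAVANA\texon\t101551526\t101551582\t.\t-\t.\tgene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1";\nchr11\tHAVANA\texon\t101539981\t101540034\t.\t-\t.\tgene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1";\n' |
  exon_lengths ENSMUST00000190862.1
```

On the full annotation you would run `exon_lengths ENSMUST00000190862.1 gencode.vM18.annotation.gtf`.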

5.6 years ago by Kevin Blighe 87k

Thank you! That makes sense. I plan to use kallisto to align the reads, since I can run it on my personal computer. Can I use the link you provided (are those files a reference genome or a reference cDNA transcriptome)?

5.6 years ago by mikysyc2016 ▴ 120

At the link that I provided, you will find reference cDNA transcriptome FASTA files. You should use these for Kallisto.
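
A rough sketch of the two Kallisto steps (build the index once, then quantify each sample) might look like this. All file and directory names are placeholders, the sample is assumed to be paired-end, and the `run` wrapper with its `DRY_RUN` switch is something I added so that the commands are only printed until you are ready to run them.

```shell
#!/usr/bin/env bash
# Placeholder file names; replace them with your own FASTA and FASTQ files.
TRANSCRIPTS_FA="gencode.vM18.transcripts.fa.gz"
INDEX="gencode.vM18.idx"

# run (hypothetical wrapper): print the command when DRY_RUN=1 (the default
# here); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# 1. Build the kallisto index from the transcriptome FASTA (needed only once).
run kallisto index -i "$INDEX" "$TRANSCRIPTS_FA"

# 2. Quantify one paired-end sample; -b sets the number of bootstrap samples.
run kallisto quant -i "$INDEX" -o sample1_out -b 100 \
  sample1_R1.fastq.gz sample1_R2.fastq.gz
```

For single-end reads, `kallisto quant` additionally needs `--single` together with the estimated fragment length (`-l`) and its standard deviation (`-s`).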