Sample information for Ensembl RNA-seq data (primates and other species)
4.2 years ago
Yu.K • 0

Hi everyone,

Thank you for reading. I would like to find information about the samples behind the RNA-seq data in Ensembl.

I want to analyze RNA-seq data across primates, from the chimpanzee to New World monkeys such as the marmoset, using the following datasets from Ensembl:

ftp://ftp.ensembl.org/pub/release-10...ro_3.0/rnaseq/

ftp://ftp.ensembl.org/pub/release-10...5486v1/rnaseq/

However, I could not find any information about the samples, such as the lab of origin or the experimental conditions. I also could not find the repository holding the original sequence data (e.g. FASTQ files).

Could you point me to the file or web page in Ensembl that describes these samples?

Tags: RNA-Seq, Ensembl, primate

Here is the link to the chimpanzee data as an example.

It is difficult to figure out where this data comes from, since the README file contains no usable information. I think you should email the Ensembl Helpdesk and ask them.


Thank you for your reply, and sorry for my slow response. After seeing your message, I sent my question to the Helpdesk and received the reply that is posted below.

4.2 years ago
Ben Moore ★ 2.4k

Hi Yu K,

I've pasted my answer from the Ensembl Helpdesk for others to see:

The RNA-seq data used in the Ensembl gene annotation process, which is available through the Ensembl FTP links you pointed to, was downloaded from the European Nucleotide Archive (ENA): https://www.ebi.ac.uk/ena/browser/home

For example: https://www.ebi.ac.uk/ena/browser/text-search?query=chimpanzee%20pituitary

In ENA, the FASTQ files for each Run are linked to a 'Study accession', a 'Sample accession' and an 'Experiment accession', which together describe the experimental conditions and data source.
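If you already have a run accession, a minimal sketch for pulling its linked Study, Sample and Experiment accessions in one go might look like the following. It assumes the ENA Portal API "filereport" endpoint with `result=read_run`; the field list and the function names `ena_filereport_url` and `ena_run_metadata` are my own illustrative choices, not something the Helpdesk provided.

```shell
#!/usr/bin/env bash
# Sketch: look up the Study, Sample and Experiment accessions linked to an
# ENA read run. Assumes the ENA Portal API "filereport" endpoint; the
# function names here are hypothetical.

# Fields to request for each run (ENA "read_run" result type).
ENA_FIELDS="run_accession,study_accession,sample_accession,experiment_accession,scientific_name,fastq_ftp"

# Print the filereport URL for one accession (run, experiment, sample or study).
ena_filereport_url() {
  local accession="$1"
  printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=%s&format=tsv\n' \
    "$accession" "$ENA_FIELDS"
}

# Download the tab-separated report (needs curl and network access).
ena_run_metadata() {
  curl -fsS "$(ena_filereport_url "$1")"
}
```

Usage would be something like `ena_run_metadata RUN_ACCESSION`, where RUN_ACCESSION is a placeholder for an SRR/ERR/DRR ID. The returned TSV has one row per run, and the study and sample accessions can then be opened in the ENA browser to read the experimental details.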

More information about the source of the RNA-seq data available for each species on the FTP site, and used in the gene annotation process, can be found by clicking on 'More Information and Statistics' on the species homepage:

[1] https://www.ensembl.org/Pan_troglodytes/Info/Annotation
[2] https://www.ensembl.org/info/genome/genebuild/Primate_clade_gene_annotation.pdf


@Ben: Please make this information available in the rnaseq folder and/or the README file on the Ensembl FTP site, with a note that this data was used for annotation. It does not seem appropriate for cross-species comparisons or differential expression (DE) analysis.


Hi,

The .bam files available through the Ensembl FTP site are not always equivalent to the original FASTQ files. You can convert them, but you won't find them in ENA as a single file or ID. There are a couple of reasons for this:

  • If a sample has far too many reads, we don't keep all of them.
  • We merge different sample IDs under the same sample name; for example, a liver sample might combine sample_id1 + sample_id2 + sample_id3... from different projects. The .bam files are therefore not suitable for differential gene expression analysis, at the very least because RPKM values would be miscalculated and the biological origin of the reads is mixed (they may come from different labs, different animals, or different time points for each animal).
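To see how many runs were merged into one .bam file, and how many records each contributes after downsampling, a sketch along these lines could help. It assumes, as the Helpdesk command further down implies, that read names start with the run accession followed by a dot (e.g. "SRRxxxxxx.123"); `reads_per_run` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: count how many alignment records in a merged BAM come from each run.
# Assumes read names look like "<RUN_ACCESSION>.<read number>".
# Reads SAM text on stdin, e.g.
#   samtools view ASSEMBLY_ID.ENA.merged.1.bam | reads_per_run
reads_per_run() {
  awk -F'\t' '
    /^@/ { next }                           # skip header lines if present
    { split($1, name, "."); n[name[1]]++ }  # run accession = text before first dot
    END { for (run in n) print run "\t" n[run] }
  ' | sort -k1,1
}
```

This counts SAM records, not distinct reads: paired-end mates are counted separately, and secondary or supplementary alignments add extra records unless you filter them first with `samtools view -F 0x900`. Comparing these counts with the read counts listed in ENA is one way to see which runs were subsampled.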

We use these data to find and annotate new genes; we don't perform DE analysis with them and would not necessarily recommend doing so. These files are useful for getting an idea of a gene's expression, or for visually checking the RNA-seq data that was used to annotate a particular gene. We plan to update the README based on your feedback.

However, if you wish to access exactly the same data, you can extract the SRR IDs from the .bam files with the following command:

samtools view ASSEMBLY_ID.ENA.merged.1.bam | awk -F'.' '{print $1}' | sort | uniq > SRR_used_.txt

Then use ENA to find those samples.
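A slightly more defensive variant of the Helpdesk command splits only the read name (SAM column 1) instead of the whole record, so a read name without a dot cannot drag the rest of the line into the output. This is a sketch under the same assumption that read names begin with "<RUN_ACCESSION>."; the helper names `run_accessions` and `ena_view_urls` are hypothetical, and the ENA browser URL pattern is assumed to be /ena/browser/view/<accession>.

```shell
#!/usr/bin/env bash
# Sketch: list the unique ENA run accessions in a merged Ensembl BAM.
# Assumes read names start with "<RUN_ACCESSION>." and reads SAM on stdin.
run_accessions() {
  # Skip header lines, keep the text before the first dot of the read name.
  awk -F'\t' '!/^@/ { split($1, name, "."); print name[1] }' | sort -u
}

# Turn a list of accessions (one per line on stdin) into ENA browser links.
ena_view_urls() {
  while IFS= read -r acc; do
    if [ -n "$acc" ]; then
      printf 'https://www.ebi.ac.uk/ena/browser/view/%s\n' "$acc"
    fi
  done
}

# Example (requires samtools; ASSEMBLY_ID is a placeholder):
#   samtools view ASSEMBLY_ID.ENA.merged.1.bam | run_accessions > SRR_used_.txt
#   ena_view_urls < SRR_used_.txt
```

`sort -u` replaces `sort | uniq` with the same result in one step.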


Thank you for your answer, and sorry for my slow response. I now understand where the RNA-seq data on the Ensembl FTP site comes from. However, I could not find information about some tissue samples, for example for bonobo. If possible, could you tell me about them?

Thank you for your help.


After posting the question above, I found the appropriate data in ENA by searching for 'Pan paniscus'.
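The same species search can also be scripted. This is a sketch, not a tested recipe: it assumes the ENA Portal API "search" endpoint, the `tax_eq()` query function, the `library_strategy="RNA-Seq"` filter, and NCBI taxon ID 9597 for Pan paniscus (please double-check the taxon ID for other species). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: search ENA for RNA-seq runs from one species via the ENA Portal API.
# Assumptions: "search" endpoint with result=read_run, the tax_eq() query
# function, and NCBI taxon ID 9597 for Pan paniscus (bonobo).

# Build the ENA advanced-search query for RNA-seq runs of one taxon.
ena_rnaseq_query() {
  local taxon_id="$1"
  printf 'tax_eq(%s) AND library_strategy="RNA-Seq"\n' "$taxon_id"
}

# Fetch matching runs as TSV (needs curl and network access).
# -G with --data-urlencode appends the URL-encoded parameters to the URL.
ena_rnaseq_runs() {
  curl -fsS -G "https://www.ebi.ac.uk/ena/portal/api/search" \
    --data-urlencode "result=read_run" \
    --data-urlencode "query=$(ena_rnaseq_query "$1")" \
    --data-urlencode "fields=run_accession,study_accession,sample_accession,experiment_accession" \
    --data-urlencode "format=tsv"
}

# Example:
#   ena_rnaseq_runs 9597 > bonobo_rnaseq_runs.tsv
```

`tax_eq` matches only the exact taxon; the ENA query language also has `tax_tree()` for including subspecies.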


Please accept (green check mark) @Ben's response to provide closure to this thread.
