Question: Ensembl gene annotation GRCh37.87
0
6 weeks ago by
F3.4k
Iran
F3.4k wrote:

Hi,

I need to quantify gene expression with Salmon, so I need the Ensembl gene annotation GRCh37.87, probably in FASTA format.

I tried ftp://ftp.ensembl.org/pub/grch37/current/fasta/homo_sapiens/ but it is not working.

Do you know where I can download such a file for Salmon?

Thank you

rna-seq salmon ensembl • 186 views
modified 6 weeks ago by genomax67k • written 6 weeks ago by F3.4k
2

The Ensembl gene annotation for GRCh37.87 would be in GTF or GFF3, not FASTA.

edit: ftp://ftp.ensembl.org/pub/grch37/release-87/

modified 6 weeks ago • written 6 weeks ago by jean.elbers1.1k

Sorry, but the Salmon manual says it should be in FASTA:

https://salmon.readthedocs.io/en/latest/salmon.html

written 6 weeks ago by F3.4k

If Salmon needs the FASTA sequences for the transcripts, then you can follow these steps (http://ccb.jhu.edu/software/stringtie/gff.shtml#gffread_ex), using the GRCh37 reference from GENCODE and the GTF file, possibly from ftp://ftp.ensembl.org/pub/grch37/release-87/gtf/homo_sapiens/Homo_sapiens.GRCh37.87.chr.gtf.gz. This assumes your BAM files are alignments to the main assembly and do not include the alternative haplotypes or patches.

modified 6 weeks ago • written 6 weeks ago by jean.elbers1.1k

I have BAM files from STAR alignments against GRCh37_g1k, and now I need to quantify raw counts. I did that with featureCounts, but I got a lot of strange features, so I decided to use Salmon. Thank you anyway, but I don't know why I cannot open these links.

So now I thought of converting my BAM files to FASTQ.

When I ran this Salmon command, I got this error:

salmon quant -t gencode.v30lift37.transcripts.fa -l A -a file.bam -o salmon_quant

If you have access to the genome FASTA and GTF used for alignment
consider generating a transcriptome fasta using a command like:
gffread -w output.fa -g genome.fa genome.gtf
you can find the gffread utility at (http://ccb.jhu.edu/software/stringtie/gff.shtml)

Finally, I used gffread, but I am getting this error:

[fi1d18@cyan01 fi1d18]$ /temp/hgig/fi1d18/gffread-0.11.2.Linux_x86_64/gffread -w transcripts.fa -g hs37d5.fa gencode.v30lift37.annotation.gff3

Warning: couldn't find fasta record for 'chr1'!
Error: no genomic sequence available (check -g option!).
[fi1d18@cyan01 fi1d18]$

[fi1d18@cyan01 fi1d18]$ /temp/hgig/fi1d18/gffread-0.11.2.Linux_x86_64/gffread -w transcripts.fa -g hs37d5.fa gencode.v30lift37.annotation.gtf

Warning: couldn't find fasta record for 'chr1'!
Error: no genomic sequence available (check -g option!).
[fi1d18@cyan01 fi1d18]$
modified 6 weeks ago • written 6 weeks ago by F3.4k
1

Have you tried wget ftp://ftp.ensembl.org/pub/grch37/release-87/gtf/homo_sapiens/Homo_sapiens.GRCh37.87.chr.gtf.gz ?

Perhaps the chromosomes in your genome FASTA are named in the format >1 for chr1 instead of the GENCODE >chr1 format?
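
One way to check for the naming mismatch suggested above is to compare the sequence names in the genome FASTA with those in column 1 of the annotation. The following is only a sketch: the function names are made up for illustration, both files are assumed to be uncompressed, and the renaming step assumes that the only differences are the chr prefix and chrM versus MT, which you should verify against your own files.

```shell
# Print the sequence names of a FASTA file (header text up to the first whitespace).
fasta_names() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort -u
}

# Print the sequence names used in column 1 of a GTF/GFF3 file, skipping comment lines.
annotation_names() {
    awk -F '\t' '!/^#/ && NF > 1 { print $1 }' "$1" | sort -u
}

# Print annotation sequence names that have no record in the FASTA.
# $1 = genome FASTA, $2 = GTF/GFF3
missing_in_fasta() {
    tmp_fa=$(mktemp) && tmp_gtf=$(mktemp) || return 1
    fasta_names "$1" > "$tmp_fa"
    annotation_names "$2" > "$tmp_gtf"
    comm -13 "$tmp_fa" "$tmp_gtf"   # lines found only in the annotation list
    rm -f "$tmp_fa" "$tmp_gtf"
}

# Rewrite column 1 from chr-prefixed names to unprefixed names (chr1 -> 1).
# Assumption: chrM should become MT; comment lines are passed through unchanged.
to_ensembl_names() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { print; next }
         { if ($1 == "chrM") $1 = "MT"; else sub(/^chr/, "", $1); print }' "$1"
}
```

For example, missing_in_fasta hs37d5.fa gencode.v30lift37.annotation.gtf would list chr1 and the other chr-prefixed names if the FASTA uses 1, 2, and so on. If that is the case, to_ensembl_names gencode.v30lift37.annotation.gtf > renamed.gtf produces a GTF (renamed.gtf is a placeholder name) that can be checked again with missing_in_fasta before running gffread.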

modified 6 weeks ago • written 6 weeks ago by jean.elbers1.1k
2

I just tried ftp://ftp.ensembl.org/pub/grch37/current/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh37.cdna.all.fa.gz, it worked for me. This is the file that I think you need for Salmon.
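
As a rough sketch of how a transcriptome FASTA like this is typically used with Salmon in its mapping-based mode: build an index from the FASTA, then quantify FASTQ reads against it. The wrapper function names, the index directory, and the FASTQ file names are placeholders; the sketch assumes paired-end reads and uses -l A so that Salmon infers the library type.

```shell
# Build a Salmon index from a transcriptome FASTA (for example the Ensembl cDNA file).
# $1 = transcriptome FASTA, $2 = output index directory
build_salmon_index() {
    salmon index -t "$1" -i "$2"
}

# Quantify paired-end reads against that index.
# $1 = index directory, $2 = read 1 FASTQ, $3 = read 2 FASTQ, $4 = output directory
quantify_paired_reads() {
    salmon quant -i "$1" -l A -1 "$2" -2 "$3" -o "$4"
}
```

For example: build_salmon_index Homo_sapiens.GRCh37.cdna.all.fa.gz grch37_cdna_index, followed by quantify_paired_reads grch37_cdna_index sample_1.fastq.gz sample_2.fastq.gz salmon_quant.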

written 6 weeks ago by jean.elbers1.1k
3
6 weeks ago by
genomax67k
United States
genomax67k wrote:

Use the GRCh37 FASTA from GENCODE. If you need annotations, those are available there as well.

That said, for new analyses you should stick with the current release unless you are trying to reproduce past analyses.

modified 6 weeks ago • written 6 weeks ago by genomax67k

Thank you. My BAM files are from GRCh37, so I guess I need that.

written 6 weeks ago by F3.4k

Sorry @genomax,

Where can I find the matching genome FASTA and GTF files for GRCh37_1k (1000 Genomes)?

I googled it. For the genome FASTA there were 3 phases, but I did not find any corresponding GTF.

written 6 weeks ago by F3.4k