Question: the latest human reference genome fasta file
4.4 years ago by hana (170), Malaysia
hana wrote:

Hi all,

I would like to download the latest human reference genome (GRCh38) in FASTA and GTF format for my RNA-seq analysis. I would like to know which database is the beast, GenBank version 21 or Ensembl?

Where can I get the whole-genome FASTA file for the Ensembl version?

Does the link below contain this file?

ftp://ftp.ensembl.org/pub/release-77/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

Is there any difference between GenBank's and Ensembl's alignment and annotation output?

Thanks in advance.

rna-seq • 16k views
modified 4.4 years ago by Emily_Ensembl (17k) • written 4.4 years ago by hana (170)

You can download it from the UCSC database: http://hgdownload.cse.ucsc.edu/downloads.html#human

written 4.4 years ago by iraun (3.5k)

 

4.4 years ago by Manvendra Singh (2.0k), Berlin, Germany
Manvendra Singh wrote:

database is the beast?????

Yes, it's the one from Ensembl.

 

You can download it from here, the same way you previously downloaded hg19 from UCSC.

Whole-genome FASTA:

http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/

GTFs from:

http://genome.ucsc.edu/cgi-bin/hgTables?

 

written 4.4 years ago by Manvendra Singh (2.0k)

Which file contains the whole genome?

hg38.2bit?

Thank you.

written 4.4 years ago by hana (170)

Just directly download the fasta file. There's no need to deal with 2bit.
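
Once a FASTA file has been downloaded, one way to check that it really holds the whole genome is to list its sequence names. The sketch below is only an illustration, not something taken from UCSC or Ensembl: the function name `fasta_sequence_names` and the file name in the usage note are made up, and it assumes an ordinary FASTA file, optionally gzip-compressed.

```python
import gzip


def fasta_sequence_names(path):
    """Return the first word of every '>' header line in a FASTA file.

    Files whose name ends in .gz are read through gzip; anything else
    is opened as plain text.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    names = []
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                # A bare '>' header has no name; record it as an empty string.
                names.append(parts[0] if parts else "")
    return names
```

Calling `fasta_sequence_names("genome.fa.gz")` (a hypothetical file name) should return one entry per chromosome or scaffold. On a full human genome this reads the whole file line by line, so it can take a little while.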

written 4.4 years ago by Devon Ryan (88k)

Exactly, hana was asking about .2bit, so I wrote that he can convert it as well.

written 4.4 years ago by Manvendra Singh (2.0k)

Sorry @Devon, where can I find the FASTA file for each individual chromosome of hs37d5.fa?

written 8 hours ago by F (3.4k)

https://www.gencodegenes.org/human/release_5.html assuming you are referring to release 5. All releases can be found on this page at GENCODE.

modified 8 hours ago • written 8 hours ago by genomax (64k)

 
Yes, you need to convert it to FASTA.

You can get the utility program twoBitToFa from here:

http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/

Once you have downloaded it, you must first change its permissions so that it can be executed as a program.

Then execute it from a terminal, without arguments, to see the options:

$ /path/to/twoBitToFa

twoBitToFa - Convert all or part of .2bit file to fasta
usage:
twoBitToFa input.2bit output.fa
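
The permission change and the conversion call described above could also be scripted. This is a minimal sketch under some assumptions: the helper names `make_executable` and `two_bit_to_fa_command` are invented for illustration, the paths are placeholders, and the only twoBitToFa invocation used is the `input.2bit output.fa` form from the usage text.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for user, group and others (like `chmod a+x`)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def two_bit_to_fa_command(tool, two_bit, fasta):
    """Build the argument list for `twoBitToFa input.2bit output.fa`."""
    return [tool, two_bit, fasta]


# Example use (placeholder paths, not run here):
#   import subprocess
#   make_executable("/path/to/twoBitToFa")
#   cmd = two_bit_to_fa_command("/path/to/twoBitToFa", "hg38.2bit", "hg38.fa")
#   subprocess.run(cmd, check=True)
```

Passing the command as a list to `subprocess.run` avoids shell quoting problems if any of the paths contain spaces.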
modified 4.4 years ago • written 4.4 years ago by Manvendra Singh (2.0k)

Thank you very much.

written 4.4 years ago by hana (170)

Hi,

I have already downloaded the FASTA and GTF files for hg38 from the UCSC database and run TopHat.

But I have a problem running Cufflinks. I got the error below:

cufflinks  --GTF   genome.gtf   -o   /home/ra/cufflinks    /home/ra/accepted_hits.bam

[20:55:32] Loading reference annotation.
Error parsing strand (1) from GFF line:
uc001aaa.3    chr1    +    11873    14409    11873    11873    3    11873,12612,13220,    12227,12721,14409,        uc001aaa.3

Can you please tell me what it means and how I can solve it?

Thank you.

written 4.4 years ago by hana (170)

That's not a GTF file. You have to explicitly set the output format to GTF, otherwise you'll get all of the table columns as is.
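
To see why the line in the error message is rejected: a GTF line has nine tab-separated fields, with the strand (`+`, `-` or `.`) in field 7. In the line quoted in the error, the strand sits in field 3 and field 7 holds `11873`, which would explain the "Error parsing strand (1)" message. The check below is a rough sketch of that idea, not what Cufflinks itself does. The function name `gtf_problems` is made up, the first example row is the line from the error (assuming its columns were tab-separated), and the second row is a hand-written illustration of GTF layout rather than real Table Browser output.

```python
def gtf_problems(line):
    """Return a list of reasons why a line does not look like GTF.

    An empty list means the basic layout checks passed.
    """
    fields = line.rstrip("\n").split("\t")
    problems = []
    if len(fields) != 9:
        problems.append(
            f"expected 9 tab-separated fields, found {len(fields)}"
        )
    if len(fields) >= 5 and not (fields[3].isdigit() and fields[4].isdigit()):
        problems.append("fields 4 and 5 (start, end) should be integers")
    if len(fields) >= 7 and fields[6] not in {"+", "-", "."}:
        problems.append(
            f"field 7 (strand) is {fields[6]!r}, expected '+', '-' or '.'"
        )
    return problems


# The line from the error message: raw table columns, not GTF.
TABLE_ROW = (
    "uc001aaa.3\tchr1\t+\t11873\t14409\t11873\t11873\t3\t"
    "11873,12612,13220,\t12227,12721,14409,\t\tuc001aaa.3"
)

# Illustrative GTF-style line (hand-written, values are assumptions).
GTF_ROW = (
    "chr1\tknownGene\texon\t11874\t12227\t.\t+\t.\t"
    'gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";'
)

for row in (TABLE_ROW, GTF_ROW):
    print(gtf_problems(row) or "looks like GTF")
```

Running a check like this on the first few lines of genome.gtf before starting Cufflinks can catch a wrong export format early.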

written 4.4 years ago by Devon Ryan (88k)