Since there are many ways to define a "gene" (CDS, transcript, exons only),
there would be many different possible FASTA files.
You can try Ensembl BioMart with the following query, which gives you the nucleotide sequences of protein-coding regions with the Ensembl gene ID as the FASTA header: ...biomart link
You might have the wrong database. The NCBI Gene database doesn't seem to store any sequences; they are stored in the Nucleotide database instead. Depending on what you want to do, Ensembl may also be more useful.
What you are looking for are actually the RefSeq datasets.
All the Entrez Gene data is derived directly from the RefSeq database.
So, depending on whether you are looking for genomic, CDS or protein data, you should look for the relevant RefSeq data.
You can go to the relevant database interface, for example "protein", and limit the query to the relevant RefSeq data. Then download everything in FASTA format from the upper left menu.
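For anyone who prefers to script this rather than click through the web interface, here is a minimal sketch of the same idea using E-utilities. It only builds the request URLs. The helper names (refseq_search_url, fasta_batch_url) are made up for illustration, the organism is just an example, and the srcdb_refseq[PROP] filter is assumed to match the RefSeq limit offered in the interface.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def refseq_search_url(db, organism):
    """Build an ESearch URL restricted to RefSeq records for one organism.

    usehistory=y asks NCBI to keep the result set on its history server,
    so it can be downloaded in batches afterwards.
    """
    term = f'srcdb_refseq[PROP] AND "{organism}"[Organism]'
    params = {"db": db, "term": term, "usehistory": "y", "retmax": 0}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def fasta_batch_url(db, webenv, query_key, retstart, retmax=500):
    """Build an EFetch URL for one FASTA batch of a stored search.

    webenv and query_key are the values returned by the ESearch call.
    """
    params = {
        "db": db,
        "WebEnv": webenv,
        "query_key": query_key,
        "retstart": retstart,
        "retmax": retmax,
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    print(refseq_search_url("protein", "Homo sapiens"))
```

The ESearch response contains the WebEnv and QueryKey values to pass to fasta_batch_url, increasing retstart until the reported count is reached.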
I don't want to use Ensembl. NCBI defines start and end coordinates on contigs/chromosomes in the Gene database; I just want to find a way to retrieve the corresponding sequences, that's all.
How about clicking on "FASTA"/"GenBank" in the "Genomic regions, transcripts, and products" then? ;-)
There are not enough of us on the team to do that ~42,000 times.
Then please update your question.
OK, I overlooked the "all".
Well, you will find that the coordinates are based on the Ensembl annotation. I showed you a way; if you don't want to use it, that's your problem.
Thank you all for your help. Anyway, they associate each gene symbol with coordinates (the definitions are arguable, I'm OK with that, but it looks like some kind of RefSeq mRNA clustering plus human curation, and that's exactly what I'm looking for). Since they define this set of coordinates, I don't understand why they don't just provide a FASTA file as for any other dataset. I managed to write a script that parses the annotation field of the summary file and downloads the sequences with eUtils. Thanks again for your help.
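For later readers, here is a rough sketch of what such a script could look like. It is not the original script. It assumes the annotation field looks like "Chromosome 17, NC_000017.11 (43044295..43125483, complement)", and the function names (parse_annotation, efetch_url, fetch_fasta) are made up for illustration. The EFetch parameters seq_start, seq_stop and strand select the sub-sequence, with strand=2 meaning the reverse complement.

```python
import re
import time
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Assumed annotation format: "<accession> (<start>..<stop>[, complement])"
ANNOTATION_RE = re.compile(
    r"(?P<acc>[A-Z]{2}_\d+(?:\.\d+)?)\s*"
    r"\((?P<start>\d+)\.\.(?P<stop>\d+)(?P<comp>,\s*complement)?\)"
)


def parse_annotation(text):
    """Return (accession, start, stop, strand), or None if nothing matches."""
    match = ANNOTATION_RE.search(text)
    if match is None:
        return None
    strand = 2 if match.group("comp") else 1  # EFetch: 1 = plus, 2 = minus
    return (
        match.group("acc"),
        int(match.group("start")),
        int(match.group("stop")),
        strand,
    )


def efetch_url(acc, start, stop, strand):
    """Build the EFetch URL for one genomic interval in FASTA format."""
    params = {
        "db": "nuccore",
        "id": acc,
        "seq_start": start,
        "seq_stop": stop,
        "strand": strand,
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


def fetch_fasta(annotation, delay=0.4):
    """Download the FASTA record for one annotation string (network call)."""
    parsed = parse_annotation(annotation)
    if parsed is None:
        raise ValueError(f"unrecognised annotation: {annotation!r}")
    time.sleep(delay)  # stay under NCBI's limit of 3 requests per second
    with urlopen(efetch_url(*parsed)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    example = "Chromosome 17, NC_000017.11 (43044295..43125483, complement)"
    print(efetch_url(*parse_annotation(example)))
```

With ~42,000 genes, one request per gene at this rate takes several hours, so it is worth writing results to disk as you go and skipping genes that were already downloaded.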