Quick question: are there any .bed (or similar) files that span all coding regions in hg38 coordinates? I need to analyze some WGS samples.
Thank you very much in advance for any help!
wget -O - "http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_35/gencode.v35.annotation.gtf.gz" |\
gunzip -c | grep 'transcript_type "protein_coding"' |\
awk '($3=="exon") {printf("%s\t%s\t%s\n",$1,int($4)-1,$5);}' |\
sort -T . -t $'\t' -k1,1 -k2,2n | bedtools merge
?
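One caveat about the pipeline above: `exon` records of protein-coding transcripts also cover the UTRs, so the resulting BED is somewhat wider than the strictly coding sequence. If only the coding part is wanted, a variant that keeps the `CDS` records instead might look like the sketch below. The function name `gtf_cds_to_bed` is made up for illustration, and the small awk merge step is a stand-in for `bedtools merge` (it joins overlapping and directly adjacent intervals), so treat it as a starting point rather than a drop-in replacement.

```shell
# Hypothetical helper: read an uncompressed GENCODE GTF on stdin and
# write merged CDS intervals of protein-coding transcripts as 3-column BED.
gtf_cds_to_bed() {
  awk -F '\t' -v OFS='\t' '
    # Skip "#" header lines; keep CDS records of protein-coding transcripts.
    !/^#/ && $3 == "CDS" && $9 ~ /transcript_type "protein_coding"/ {
      # GTF is 1-based inclusive, BED is 0-based half-open: shift the start only.
      print $1, $4 - 1, $5
    }' |
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n |
  awk -F '\t' -v OFS='\t' '
    # Same chromosome and touching/overlapping the current interval: extend it.
    NR > 1 && $1 == cur_chr && $2 + 0 <= cur_end {
      if ($3 + 0 > cur_end) cur_end = $3 + 0
      next
    }
    # Otherwise flush the finished interval and start a new one.
    NR > 1 { print cur_chr, cur_start, cur_end }
    { cur_chr = $1; cur_start = $2 + 0; cur_end = $3 + 0 }
    END { if (NR > 0) print cur_chr, cur_start, cur_end }'
}

# Possible usage (downloads the annotation, so not run here):
#   wget -O - "http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_35/gencode.v35.annotation.gtf.gz" |
#     gunzip -c | gtf_cds_to_bed > coding.bed
```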
For future bioinformaticians who land on this page: please note that release 35 is no longer the latest release, so you may want to update the link to the gtf.gz file in the script above. You can find the latest release by inspecting: http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/
For example, the current release (as of May 2022) is release 40, so the link is: http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_40/gencode.v40.annotation.gtf.gz
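To avoid editing the link by hand each time, the release number can be kept in one place. The sketch below assumes the URL pattern seen for releases 35 and 40 holds for the release you pick; `gencode_gtf_url` is a made-up helper name, and it is worth checking the directory listing before relying on it.

```shell
# Hypothetical helper: print the GENCODE human annotation GTF URL for a release number.
# Assumes the release_N/gencode.vN.annotation.gtf.gz naming used by releases 35 and 40.
gencode_gtf_url() {
  printf 'http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_%s/gencode.v%s.annotation.gtf.gz\n' "$1" "$1"
}

# Possible usage (downloads the annotation, so not run here):
#   wget -O - "$(gencode_gtf_url 40)" | gunzip -c | head
```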
By the way, the coordinates are on the human reference genome hg38 (i.e. GRCh38).
You could download the hg38 GTF file from GENCODE and extract relevant columns and records from it.
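As a rough illustration of "extract relevant columns and records", the sketch below keeps the `CDS` records and writes them as 6-column BED with the gene name in the name column. The function name `gtf_cds_records` is hypothetical, the score column is simply set to 0, and it assumes every record carries a `gene_name "..."` attribute as GENCODE GTF files do; records without one are labeled `.`.

```shell
# Hypothetical helper: read an uncompressed GTF on stdin and write one BED6 line
# per CDS record: chrom, start (0-based), end, gene name, score (fixed 0), strand.
gtf_cds_records() {
  awk -F '\t' -v OFS='\t' '
    !/^#/ && $3 == "CDS" {
      name = "."
      # Pull the value out of: gene_name "VALUE";
      # The prefix gene_name " is 11 characters and the closing quote is 1 more.
      if (match($9, /gene_name "[^"]+"/))
        name = substr($9, RSTART + 11, RLENGTH - 12)
      # GTF start is 1-based; BED start is 0-based.
      print $1, $4 - 1, $5, name, 0, $7
    }'
}

# Possible usage on a file downloaded from GENCODE:
#   gunzip -c gencode.v40.annotation.gtf.gz | gtf_cds_records > cds_records.bed
```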