Question: all coding regions .bed file hg38 Whole Genome Sequencing
12 weeks ago, cocchi.e89 wrote:

Quick question: is there a .bed (or similar) file that spans all coding regions in hg38 coordinates? I need to analyze some WGS samples.

Thank you very much in advance for any help!

bed coding wgs • 299 views

You could download the hg38 GTF file from GENCODE and extract relevant columns and records from it.

written 12 weeks ago by _r_am
12 weeks ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:
wget -O - "http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_35/gencode.v35.annotation.gtf.gz" |\
gunzip -c | grep 'transcript_type "protein_coding"' |\
awk '($3=="exon") {printf("%s\t%s\t%s\n",$1,int($4)-1,$5);}' |\
sort -T . -k1,1 -k2,2n | bedtools merge

?


Thanks so much. I was wondering: which regions are filtered out between line 2 and line 3 of the command (that is, regions flagged as protein_coding that are not exons)?

written 12 weeks ago by cocchi.e89

You should download and take a look at the GTF file. Explore it so you understand what the above command does with real data.

written 12 weeks ago by _r_am

I did, and I ran the command one stage at a time (de-piped); mine was more of a theoretical question.

written 12 weeks ago by cocchi.e89

Right, then what are the unique values you see in $3 when transcript_type is protein_coding? That should tell you what non-exonic regions in protein coding transcripts are.
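One way to run that check is sketched below. The function name `count_feature_types` is made up for illustration; it assumes an uncompressed GENCODE-style GTF on standard input, with the feature type in column 3 and the attributes in column 9. It prints each feature type seen on protein_coding transcript records together with its count.

```shell
# Hypothetical helper (not part of the original answer): tally the values of
# column 3 for records whose attributes say transcript_type "protein_coding".
# Reads an uncompressed GTF on stdin, writes "feature<TAB>count" lines.
count_feature_types() {
  awk -F '\t' '
      # skip header/comment lines and keep only protein_coding records
      $0 !~ /^#/ && $9 ~ /transcript_type "protein_coding"/ { n[$3]++ }
      END { for (t in n) printf("%s\t%d\n", t, n[t]) }
  ' | LC_ALL=C sort
}

# Possible usage with the file from the answer above:
#   gunzip -c gencode.v35.annotation.gtf.gz | count_feature_types
```

Anything in that list other than exon is what the `$3=="exon"` test drops; these are mostly records that describe the same transcripts at a different granularity, not extra genomic regions.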

written 12 weeks ago by _r_am

Don't the exon entries contain non-coding UTRs?
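If UTRs should be excluded, one option is to select CDS records instead of exon records. The sketch below is a variation on the command above, not the original answer: the function name `gtf_cds_to_bed` is hypothetical, and the last awk stage is a stand-in for `bedtools merge` (merging overlapping and book-ended intervals) so that the example runs without bedtools installed.

```shell
# Hypothetical helper: read an uncompressed GTF on stdin and write merged
# CDS intervals of protein_coding transcripts as 0-based, half-open BED.
gtf_cds_to_bed() {
  awk -F '\t' '
      # keep CDS records only; GTF starts are 1-based, BED starts are 0-based
      $3 == "CDS" && $9 ~ /transcript_type "protein_coding"/ {
          printf("%s\t%d\t%d\n", $1, $4 - 1, $5)
      }
  ' |
  LC_ALL=C sort -k1,1 -k2,2n |
  awk -F '\t' '
      # merge sorted intervals that overlap or touch on the same chromosome
      NR == 1 { chr = $1; start = $2 + 0; end = $3 + 0; next }
      $1 == chr && $2 + 0 <= end { if ($3 + 0 > end) end = $3 + 0; next }
      { printf("%s\t%d\t%d\n", chr, start, end)
        chr = $1; start = $2 + 0; end = $3 + 0 }
      END { if (NR > 0) printf("%s\t%d\t%d\n", chr, start, end) }
  '
}

# Possible usage with the file from the answer above:
#   gunzip -c gencode.v35.annotation.gtf.gz | gtf_cds_to_bed > cds.hg38.bed
```

With bedtools available, the final awk stage can be swapped back for `bedtools merge`, as in the original command.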

written 12 weeks ago by igor