Question: 1000 genomes coding SNPs not in exonic regions?
spiral01 wrote:

Hi, I am trying to identify the exon that each of my synonymous or missense SNPs in the 1000 Genomes data belongs to. I am using the GENCODE GTF files found here: https://www.gencodegenes.org/human/ and extracting all exons.

I then use bedtools to identify which exon each SNP falls in. It appears that the coordinates of many of my SNPs are not within any exon. What I would like to know is whether, and how, synonymous or missense SNPs can fall in intronic regions.

written 8 months ago by spiral01

Why are you comparing to the GTF? There are tools designed to do exactly what you need.

written 8 months ago by Emily_Ensembl

I need to obtain the exon that each SNP lies in, as well as its start and end coordinates (because my ultimate goal is to find the length of the specific exon that each SNP lies in). The available GENCODE annotation of the 1000 Genomes variants provides the exon number within the gene, but not the exon ID or its start and end coordinates.

written 8 months ago by spiral01

Simply get the gencode annotation for hg19, extract exons, and use bedtools intersect where -a is the SNPs and -b is the exon.gtf. Use option -wb to return the entire interval of the matching exon. From there you can cut or awk out what you need.
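For readers who want to see the logic behind the intersect step, here is a minimal pure-Python sketch of the same idea: collect exon intervals from GTF lines, then report every exon that contains a given SNP position, along with its start, end and length. The function names and the sample lines are made up for illustration. It assumes the standard nine-column GTF layout, an `exon_id` attribute on exon rows, and that chromosome names follow the same convention in both inputs.

```python
from collections import defaultdict


def parse_gtf_exons(lines):
    """Collect (start, end, exon_id) per chromosome from GTF lines.

    GTF coordinates are 1-based and inclusive on both ends.
    """
    exons = defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            continue  # header / comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep exon features only
        chrom, start, end, attrs = fields[0], int(fields[3]), int(fields[4]), fields[8]
        exon_id = None
        for attr in attrs.split(";"):
            attr = attr.strip()
            if attr.startswith("exon_id "):
                exon_id = attr.split(" ", 1)[1].strip('"')
        exons[chrom].append((start, end, exon_id))
    for chrom in exons:
        exons[chrom].sort(key=lambda e: (e[0], e[1]))
    return exons


def exons_containing(exons, chrom, pos):
    """Return (exon_id, start, end, length) for each exon containing 1-based pos."""
    hits = []
    for start, end, exon_id in exons.get(chrom, []):
        if start > pos:
            break  # list is sorted by start, so no later exon can contain pos
        if end >= pos:
            hits.append((exon_id, start, end, end - start + 1))
    return hits


# Hypothetical GTF lines, for illustration only.
sample_gtf = [
    '##description: example',
    'chr1\tHAVANA\texon\t100\t200\t.\t+\t.\tgene_id "G1"; exon_number 1; exon_id "E1";',
    'chr1\tHAVANA\texon\t300\t450\t.\t+\t.\tgene_id "G1"; exon_number 2; exon_id "E2";',
    'chr1\tHAVANA\tgene\t100\t450\t.\t+\t.\tgene_id "G1";',
]

exon_index = parse_gtf_exons(sample_gtf)
print(exons_containing(exon_index, "chr1", 150))
print(exons_containing(exon_index, "chr1", 250))
```

A linear scan per SNP is only reasonable for small inputs; for whole-genome data bedtools intersect, as described above, is the practical choice.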

written 8 months ago by ATpoint

Can you confirm that the reference genomes are the same, so hg19 vs hg19 or hg38 vs hg38?

written 8 months ago by ATpoint

Hi, thanks for the response. Yes I can confirm that the ref genomes are the same, hg38.

written 8 months ago by spiral01

Where are you getting your 1K genome SNPs from?

modified 8 months ago • written 8 months ago by i.sudbery

I am getting the data with GENCODE annotations here: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/functional_annotation/

written 8 months ago by spiral01

Could these be artifacts from a liftOver operation, perhaps?

written 8 months ago by RamRS
i.sudbery wrote:

How are you filtering the synonymous and non-synonymous SNPs from all the 1000 Genomes SNPs?

written 8 months ago by i.sudbery

The 1000 Genomes data is available with consequence annotations here: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/functional_annotation/. I then simply select the variants that have a missense or synonymous consequence annotation.
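As a rough illustration of that filtering step, here is a minimal sketch. The exact layout of the annotation files is not shown in this thread, so the code assumes a VEP-style `CSQ=` entry in the VCF INFO column, with annotations separated by commas, fields by `|` and multiple consequence terms by `&`. The function name and the sample INFO strings are hypothetical.

```python
# Consequence terms of interest (Sequence Ontology names).
WANTED = {"missense_variant", "synonymous_variant"}


def coding_consequences(info, wanted=WANTED):
    """Return the wanted consequence terms found in a VCF INFO string.

    Assumes a VEP-style entry: CSQ=field|field|term&term|...,field|...
    """
    found = set()
    for entry in info.split(";"):
        if not entry.startswith("CSQ="):
            continue
        for annotation in entry[len("CSQ="):].split(","):
            for field in annotation.split("|"):
                for term in field.split("&"):
                    if term in wanted:
                        found.add(term)
    return found


# Hypothetical INFO strings, for illustration only.
print(coding_consequences("AC=1;CSQ=A|ENSG01|ENST01|missense_variant&splice_region_variant|3/10"))
print(coding_consequences("AC=2;CSQ=T|ENSG02|ENST02|intron_variant"))
```

A variant is kept when the returned set is non-empty. Because the check is per variant rather than per transcript, a SNP can be retained on the strength of one transcript and still sit in an intron of another.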

written 8 months ago by spiral01

Those files are all on GRCh37. That's why it's not matching the GRCh38 GTF.
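A quick way to catch this kind of mismatch before intersecting is to look for an assembly name in the header lines of both files. The sketch below is a simple heuristic, not a guarantee: it assumes the build is mentioned somewhere in the `#`-prefixed header, and the alias table, function name and sample header lines are all illustrative.

```python
# Map names that may appear in a header to a canonical build label.
BUILD_ALIASES = {
    "grch37": "GRCh37",
    "hg19": "GRCh37",
    "hs37d5": "GRCh37",
    "grch38": "GRCh38",
    "hg38": "GRCh38",
}


def builds_in_header(lines):
    """Return the set of genome builds named in the leading '#' header lines."""
    found = set()
    for line in lines:
        if not line.startswith("#"):
            break  # header is over once the first data line appears
        lowered = line.lower()
        for alias, build in BUILD_ALIASES.items():
            if alias in lowered:
                found.add(build)
    return found


# Hypothetical header lines, for illustration only.
vcf_header = [
    "##fileformat=VCFv4.1",
    "##reference=hs37d5.fa.gz",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t12345\t.\tA\tG\t.\tPASS\tAC=1",
]
gtf_header = [
    "##description: annotation of the human genome (GRCh38)",
    'chr1\tHAVANA\tgene\t100\t450\t.\t+\t.\tgene_id "G1";',
]

vcf_builds = builds_in_header(vcf_header)
gtf_builds = builds_in_header(gtf_header)
if vcf_builds and gtf_builds and vcf_builds.isdisjoint(gtf_builds):
    print("Build mismatch:", sorted(vcf_builds), "vs", sorted(gtf_builds))
```

If either header names no build, the check says nothing either way, and the release notes for the files are the place to confirm.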

written 8 months ago by Emily_Ensembl

Gah such a rookie error. Thank you!

written 8 months ago by spiral01