3 months ago
FL512

Dear all,

I am new to WGS/WES analysis. I already have VCF files from WGS, and I would like to focus on SNPs located in non-coding regions such as promoters, enhancers, and 5'/3' UTRs, or, if possible, on all non-coding regions.

1. Is there any database, reference file (txt, BED, or anything else), or script available to extract the SNPs in non-coding regions from my VCF files?
2. If someone has already asked this and it was solved, please let me know where the post is.

I do not want to use a biased approach, such as focusing on the 5 kb to 20 kb upstream of genes of interest. Rather, I want to do a global analysis, which is why I am struggling...

Extract the non-UTR exons (that is, the coding exons) in BED format, then use bedtools complement to get everything in the genome that is not covered by those exons; that is the non-coding part. Intersect that file with your VCF to get your non-coding variants; bedtools intersect is probably what you want.
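To make the logic of this answer concrete, here is a minimal plain-Python sketch of what the complement and intersect steps compute. It is only an illustration; for real data the bedtools commands themselves are the better choice. The function names, the chromosome-size dictionary, and the toy records are assumptions made up for the example.

```python
from bisect import bisect_right


def complement(intervals, chrom_sizes):
    """Return the regions of each chromosome NOT covered by `intervals`.

    intervals: iterable of (chrom, start, end) in BED coordinates
               (0-based, half-open).
    chrom_sizes: dict mapping chromosome name to its length.
    """
    by_chrom = {chrom: [] for chrom in chrom_sizes}
    for chrom, start, end in intervals:
        if chrom in by_chrom:
            by_chrom[chrom].append((start, end))

    result = {}
    for chrom, size in chrom_sizes.items():
        gaps = []
        cursor = 0  # first position not yet covered by an exon
        for start, end in sorted(by_chrom[chrom]):
            if start > cursor:
                gaps.append((cursor, start))
            cursor = max(cursor, end)  # also handles overlapping exons
        if cursor < size:
            gaps.append((cursor, size))
        result[chrom] = gaps
    return result


def is_noncoding(chrom, pos, noncoding):
    """True if the 1-based VCF position `pos` lies in a non-coding interval."""
    gaps = noncoding.get(chrom, [])
    zero_based = pos - 1  # VCF POS is 1-based, BED is 0-based
    # Index of the last interval whose start is <= the position.
    i = bisect_right(gaps, (zero_based, float("inf"))) - 1
    return i >= 0 and gaps[i][0] <= zero_based < gaps[i][1]


def filter_vcf_lines(lines, noncoding):
    """Keep header lines and records whose POS falls in a non-coding interval."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        chrom, pos = line.split("\t")[:2]
        if is_noncoding(chrom, int(pos), noncoding):
            kept.append(line)
    return kept


# Toy data (hypothetical): one 1000 bp chromosome with two coding exons.
coding_exons = [("chr1", 100, 200), ("chr1", 500, 600)]
noncoding = complement(coding_exons, {"chr1": 1000})

vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t50\t.\tA\tG\t.\t.\t.",   # upstream of the first exon
    "chr1\t150\t.\tC\tT\t.\t.\t.",  # inside the first exon
    "chr1\t300\t.\tG\tA\t.\t.\t.",  # between the two exons
]
noncoding_records = filter_vcf_lines(vcf, noncoding)
```

This sketch only tests the POS of each record, so a deletion that starts in a non-coding interval and runs into an exon would still be kept. Whether that is acceptable depends on the analysis.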

Adding to this answer: you can get the non-coding regions from your reference GTF file, and then use bedtools intersect to pull out the variants in those regions.

Thank you for letting me know. By the way, how can I extract the non-coding regions from my reference file? I have been struggling with this for a couple of days. I thought I had to download the non-UTR exons from the UCSC Genome Browser. Anyway, thank you for your help!

Hello, a GTF/GFF file has a column called 'type' (the third column), which you can filter for the feature type you need. In the GFF file specification, you will find 'intron' listed under mRNA. I hope this helps.
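As a rough illustration of filtering on the type column, here is a minimal Python sketch that pulls features of one type out of GTF-style lines and converts them to BED coordinates. It assumes the usual nine tab-separated columns with the feature type in the third one; the function name and the example rows are made up for the illustration. Many annotation files do not contain explicit intron rows, in which case introns have to be derived from the exons instead.

```python
def gtf_to_bed(lines, feature_type):
    """Yield (chrom, start, end, strand) for rows whose type column matches.

    GTF/GFF coordinates are 1-based and inclusive; BED is 0-based and
    half-open, so only the start is shifted by one.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed annotation row
        chrom, _source, ftype, start, end, _score, strand = fields[:7]
        if ftype == feature_type:
            yield chrom, int(start) - 1, int(end), strand


# Hypothetical example rows (not taken from a real annotation).
gtf = [
    "#!genome-build example",
    "chr1\tdemo\tgene\t101\t600\t.\t+\t.\tgene_id \"G1\";",
    "chr1\tdemo\texon\t101\t200\t.\t+\t.\tgene_id \"G1\";",
    "chr1\tdemo\tCDS\t121\t200\t.\t+\t0\tgene_id \"G1\";",
    "chr1\tdemo\texon\t501\t600\t.\t+\t.\tgene_id \"G1\";",
]
cds_bed = list(gtf_to_bed(gtf, "CDS"))
exon_bed = list(gtf_to_bed(gtf, "exon"))
```

The CDS intervals produced this way can serve as the coding-exon BED file that gets complemented and intersected with the VCF.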

UPDATE: for anyone in the same situation, this is how I downloaded the regions of interest: Bed File With Introns Only

Thank you very much. That is also what I was thinking of, but I did not know how to do it because of my lack of knowledge and experience. I will keep you updated. Thank you again.