GWAS results: base pair annotation
1
0
Entering edit mode
2.6 years ago

Hello,

I have a question regarding the annotation of GWAS results.

Here are the sample files

File1 Below is File1 with the HG38id column name that contains chromosome number and base pair position.

HG38id
13:21325916
5:128327887
19:44906745
1:88820990
17:44981703

File2 Below is File2 with the HG38id column name that contains chromosome number and base pair position.

HG38id
13:21325516
5:128327100
19:44906111
1:88820200
17:44981903

I would like to know the ways to check the nearest base pair position between File1 and File2. For instance, 500 base pairs downstream and upstream.

The output for above given File1 and File2 with 500 base pair downstream and the upstream is

13:21325916
17:44981703 

because these two base pair positions are nearby 500 bases in both files.

What is the way to find the nearest base pair positions (for instance, 500 or 200 base pairs downstream and upstream) in large files?

Thanks in advance

Sincerely,
Hema

GWAS Annotation results • 794 views
ADD COMMENT
1
Entering edit mode
2.6 years ago

You could convert your files into bed files using awk

awk -F ":" ' {print $1 "\t" $2 "\t" $2}' your_file.txt >your_files.bed

Run that for both files to make two bed-files. Then you can use bedtools. There are various bedtools commands which are useful for your problem.

For example, bedtools closest will give you the closest interval in the first file for each interval in the second file, here's their example:

$ cat a.bed
chr1  10  20  a1  1 -

$ cat b.bed
chr1  7   8   b1  1 -
chr1  15  25  b2  2 +

$ bedtools closest -a a.bed -b b.bed
chr1  10  20  a1  1 - chr1  15  25  b2  2 +

For fine-grained control, you could also expand all your SNPs by a certain bp amount and then see whether they intersect using bedtools slop:

bedtools slop -i your_file.bed -b 5000  > your_expanded_regions.bed # expand the SNP feature to be 5000 bp in both directions
bedtools intersect -a your_SNPs.bed -b your_expanded_regions.bed # will report all SNPs in your_SNPs.bed that overlap with the new 10,000 bp windows in your_expanded_regions.bed
ADD COMMENT

Login before adding your answer.

Traffic: 3918 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6