Question: How To Map Chromosome Coordinates To Ncbi Refseq
1
gravatar for pld
6.5 years ago by
pld4.8k
United States
pld4.8k wrote:

I recently used BLAT to generate a large number of alignments but the input files were on the chromosome level, not individual genes. I realize BLAT and filtering the subject by taxonomical ID is an option, but BLAT seems to be a stricter option. The alignments are 60bp with no gaps. What would be the best way to find the gene associated with each of these regions, if it exists?

refseq genbank • 2.4k views
ADD COMMENTlink modified 6.5 years ago by Alex Reynolds29k • written 6.5 years ago by pld4.8k

If you have the coordinates of the alignment maybe Intersectbed could help.

ADD REPLYlink written 6.5 years ago by AndreiR240
2
gravatar for Duarte Molha
6.5 years ago by
Duarte Molha200
Oxford, UK
Duarte Molha200 wrote:

As @AndreiR as said...

Download a bed file of all the genes of interest (from the ucsc table browser you can download ready-made bed files)

then make a bed file from your alignments:

then install bedtools and:

intersectBed -a alignments.bed -b refseq.bed -wo > intersections.txt

This should give you all the info you need.

ADD COMMENTlink written 6.5 years ago by Duarte Molha200
0
gravatar for Alex Reynolds
6.5 years ago by
Alex Reynolds29k
Seattle, WA USA
Alex Reynolds29k wrote:

One way to do this is to use BEDOPS psl2bed to convert UCSC BLAT output to UCSC BED, then use BEDOPS bedmap to list BED-formatted gene annotations associated with the BED coordinates of your BLAT search results.

For example, let's say you are working with human data and you want GENCODE v16 annotations. Here's how you would put them into sorted BED format with BEDOPS gtf2bed:

$ wget -O - ftp://ftp.sanger.ac.uk/pub/gencode/release_16/gencode.v16.annotation.gtf.gz \
    | gunzip -c - \
    | gtf2bed - \
    | grep -w gene - \
    > gencode.v16.genes.bed

Then, you could build a list of unique gene IDs that map to each BLAT search result with the following bedmap command. Let's assume your BLAT results are in a file called searchResults.psl:

$ psl2bed < searchResults.psl \
    | bedmap --echo --echo-map-id-uniq - gencode.v16.genes.bed \
    > answer.viaBedmap.bed

The file answer.viaBedmap.bed is a sorted BED file with results in the following layout:

[ BLAT result 1 ] | [ gene 1-1 ] ; [ gene 1-2 ] ; ... ; [ gene 1-i ]
...
[ BLAT result N ] | [ gene N-1 ] ; [ gene N-2 ] ; ... ; [ gene N-j ]

Each [ BLAT result n ] is a BED element from the BLAT result. The pipe character delimits this element from a semi-colon-delimited list of unique GENCODE gene IDs (gene_name fields from the original GTF) that map to the BLAT element.

If you want the entire gene element, replace --echo-map-id-uniq with --echo-map. There are other --echo-map-* operators available; the documentation explains their usage.

The default mapping criterion is one or more bases of overlap between the BLAT and GENCODE elements. You can change this criteria to be more stringent with the appropriate bedmap option.

If you don't need to know how the gene IDs associate with each BLAT result — you just want the gene IDs that overlap the BLAT result population — a second way to do this is to instead use BEDOPS bedops --element-of:

$ psl2bed < searchResults.psl \
    | bedops --element-of -1 gencode.v16.genes.bed - \
    > answer.viaBedops.bed

This will just return BED-formatted GENCODE gene elements that overlap the BLAT coordinates by one or more bases. You don't get the per-element association information with a bedops-based approach, but it will run faster than bedmap.

Which tool to use between bedmap and bedops will likely depend on what question you are trying to answer.

Also, the answer.*.bed files calculated with either approach are sorted, which means that they are ready to use with other downstream BEDOPS-based analyses. This means that you can do more fast and memory efficient set or mapping operations with this result and BEDOPS tools, and no further sorting is necessary.

ADD COMMENTlink modified 12 days ago by RamRS24k • written 6.5 years ago by Alex Reynolds29k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1897 users visited in the last hour