So I'm attempting to get some information for hg19 exons:
I need chr, start, end, strand, and gene name.
I was able to get this via the UCSC Table Browser -> RefSeq -> refGene -> "selected fields from primary and related tables".
I chose the following fields:
chrom, strand, exonStarts, exonEnds, name2
This gives me exactly what I need, except that each row contains multiple exonStarts and exonEnds values, so I can't use the output in the programs I typically run (bedtools etc.).
I know it's possible to split these start and end sites into separate rows using something like awk, but after spending a bit of time trying to figure it out (see "Obtaining Exon Lengths"), I can't seem to do it.
I was hoping somebody could tell me how to separate these multiple exon starts and ends into different rows and remove duplicate start and end sites (for instance, the first two rows have similar exon start and end sites).
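To make the goal concrete, here is a minimal sketch of the split in Python rather than awk. It assumes the Table Browser output is tab-separated with the five columns in the order listed above (chrom, strand, exonStarts, exonEnds, name2), that header lines start with "#", and that the exon lists are comma-separated with a trailing comma. The function names `split_exons` and `to_bed6`, the placeholder score of 0, and the example rows (gene "GENE_A") are all made up for illustration. Duplicates are only dropped when chrom, start, end, gene, and strand all match exactly.

```python
def split_exons(lines):
    """Yield one (chrom, start, end, gene, strand) tuple per unique exon.

    Assumes each data line has five tab-separated columns:
    chrom, strand, exonStarts, exonEnds, name2.
    """
    seen = set()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and header lines (assumed to start with "#").
        if not line or line.startswith("#"):
            continue
        chrom, strand, starts, ends, gene = line.split("\t")
        # The exon lists end with a trailing comma, so drop empty pieces.
        start_list = [s for s in starts.split(",") if s]
        end_list = [e for e in ends.split(",") if e]
        for start, end in zip(start_list, end_list):
            exon = (chrom, int(start), int(end), gene, strand)
            # Emit each exact exon only once, keeping first-seen order.
            if exon not in seen:
                seen.add(exon)
                yield exon


def to_bed6(exon):
    """Format an exon as a BED6 line; the score column is a placeholder 0."""
    chrom, start, end, gene, strand = exon
    return f"{chrom}\t{start}\t{end}\t{gene}\t0\t{strand}"


# Hypothetical rows, not real hg19 annotations.
rows = [
    "#chrom\tstrand\texonStarts\texonEnds\tname2",
    "chr1\t+\t100,300,\t200,400,\tGENE_A",
    "chr1\t+\t100,500,\t200,600,\tGENE_A",
]

for exon in split_exons(rows):
    print(to_bed6(exon))
```

For a real file, `split_exons(open("refGene_fields.txt"))` would work the same way (the filename is hypothetical). The columns are reordered into BED6 so that strand lands in column 6, which is where bedtools looks for it; sorting the result by chromosome and start before running bedtools is usually a good idea.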