I have been working with TCGA cancer data, examining expression (RNAseqV2) and methylation (Illumina 450k) data. Now I want to look at sequencing data, but I'm a bit lost as to what sort of information is available through TCGA. I want to check whether there are nonsense mutations between positions 2000 and 2500 across all cancer types available in TCGA. What sort of resources/workflow should I expect?
If you are asking whether you can search for nonsense mutations between amino acids 2000 and 2500 of a particular gene in the TCGA data sets, I would suggest using the Mutation Annotation Format (MAF) files. These have already been through somatic variant calling, so you won't have to deal with the sequencing data directly. You can extract the rows for your gene of interest across all the MAFs with Perl, awk, grep, etc. Then, in the amino_acid_change column, look for protein positions between 2000 and 2500 (a value like p.R2100*), and use the trv_type column to keep only nonsense mutations. Column names vary between MAF versions: in the standard MAF specification the mutation type is in Variant_Classification (value Nonsense_Mutation), and GDC MAFs give the protein change in HGVSp_Short.
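A minimal Python sketch of that filtering step, assuming tab-delimited MAF files whose comment lines start with `#`. The default column names (Hugo_Symbol, Variant_Classification = Nonsense_Mutation, HGVSp_Short) follow the standard/GDC MAF layout; for older center MAFs, pass amino_acid_change, trv_type and whatever value your files actually use. The helper names `protein_position`, `nonsense_in_range` and `scan_cohorts` are hypothetical, and the protein position is taken as the first integer in the protein-change string, which is a simplifying assumption.

```python
import csv
import io
import re
from typing import Dict, Iterator, List, Optional, TextIO


def protein_position(change: str) -> Optional[int]:
    """Return the first integer in a protein-change string, e.g. 'p.R2100*' -> 2100."""
    match = re.search(r"(\d+)", change)
    return int(match.group(1)) if match else None


def nonsense_in_range(
    maf: TextIO,
    gene: str,
    start: int = 2000,
    end: int = 2500,
    gene_col: str = "Hugo_Symbol",
    class_col: str = "Variant_Classification",
    class_value: str = "Nonsense_Mutation",
    protein_col: str = "HGVSp_Short",
) -> Iterator[dict]:
    """Yield MAF rows for nonsense mutations in `gene` at protein positions start..end (inclusive)."""
    # Skip '#' comment/version lines; the first remaining line is the header.
    body = (line for line in maf if not line.startswith("#"))
    for row in csv.DictReader(body, delimiter="\t"):
        if row.get(gene_col) != gene or row.get(class_col) != class_value:
            continue
        pos = protein_position(row.get(protein_col) or "")
        if pos is not None and start <= pos <= end:
            yield row


def scan_cohorts(paths: Dict[str, str], gene: str, **kwargs) -> Dict[str, List[dict]]:
    """Run the filter over one MAF per cohort; `paths` maps a cohort label to a file path."""
    hits: Dict[str, List[dict]] = {}
    for cohort, path in paths.items():
        with open(path, newline="") as fh:
            hits[cohort] = list(nonsense_in_range(fh, gene, **kwargs))
    return hits


if __name__ == "__main__":
    # Toy rows with a hypothetical gene name, only to illustrate the filter.
    toy = io.StringIO(
        "#version 2.4\n"
        "Hugo_Symbol\tVariant_Classification\tHGVSp_Short\n"
        "GENE_A\tNonsense_Mutation\tp.R2318*\n"
        "GENE_A\tMissense_Mutation\tp.D2312V\n"
        "GENE_A\tNonsense_Mutation\tp.Q1782*\n"
        "GENE_B\tNonsense_Mutation\tp.R2100*\n"
    )
    print([r["HGVSp_Short"] for r in nonsense_in_range(toy, "GENE_A")])  # ['p.R2318*']
```

The rows are streamed rather than loaded all at once, so large per-cohort MAFs don't need to fit in memory; `scan_cohorts` then gives you one list of hits per cancer type.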
Hope this helps.
For your specific question, your best bet is to query cBioPortal using its Onco Query Language (OQL). For example, the following query gets you all nonsense mutations seen in PTEN across all cancer types:
PTEN: MUT = NONSENSE
We also use the more general term TRUNC, which pools together ORF-truncating events.
If you want to do anything more complicated, then I agree with Nick that you need to get the MAF files. Here's an intro to MAF files: Working with MAF files (Mutation Annotation Format) from the TCGA (The Cancer Genome Atlas)