The general TCGA data primer can be found here and should be considered an all-inclusive reference.
The TCGA sample ID follows the pattern shown above. The first four letters are either “TCGA” or the four-letter code for the cancer type (e.g., “LUSC” for lung squamous cell carcinoma). The next field, following the dash, is a unique two-character code for the tissue source site (TSS) from which the sample came. The field after that is a unique four-character identifier for a specific sample from that TSS. Together, these three fields uniquely identify the geographical source (clinic) and the suspected cancer type, as determined by pathology, of a sample from a single patient.
The remaining fields of the identifier describe subsets of that sample: for instance, multiple samples taken from a single cancer and placed in separate vials, or subsequent aliquots derived from these biopsies and eventually processed on specific plates at specific centers.
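The field layout described above can be sketched as a small parser. This is only an illustration: the names `SampleId` and `parse_sample_id` are hypothetical, the character classes in the pattern are an assumption about what each field may contain, and the trailing vial, aliquot, plate and center fields are kept as an uninterpreted tuple rather than decoded.

```python
import re
from typing import NamedTuple, Tuple


class SampleId(NamedTuple):
    """Fields of a TCGA-style sample ID (hypothetical helper type)."""
    project: str                    # "TCGA" or a four-letter cancer code such as "LUSC"
    tss: str                        # two-character tissue source site code
    sample: str                     # four-character identifier within that TSS
    subset_fields: Tuple[str, ...]  # remaining fields, left uninterpreted

    @property
    def patient_key(self) -> str:
        """The first three fields, which identify source site and sample."""
        return f"{self.project}-{self.tss}-{self.sample}"


# Assumption: fields are upper-case letters or digits separated by dashes.
_PATTERN = re.compile(r"^([A-Z]{4})-([A-Z0-9]{2})-([A-Z0-9]{4})((?:-[A-Z0-9]+)*)$")


def parse_sample_id(sample_id: str) -> SampleId:
    """Split a sample ID into its leading fields and any trailing subset fields."""
    match = _PATTERN.match(sample_id.strip().upper())
    if match is None:
        raise ValueError(f"not a recognizable sample ID: {sample_id!r}")
    project, tss, sample, rest = match.groups()
    # rest looks like "-01C-01D-0182-01"; drop the empty piece before the first dash.
    subset = tuple(rest.split("-")[1:]) if rest else ()
    return SampleId(project, tss, sample, subset)


if __name__ == "__main__":
    # Illustrative identifier, used here only to exercise the parser.
    parsed = parse_sample_id("TCGA-02-0001-01C-01D-0182-01")
    print(parsed.patient_key, parsed.subset_fields)
```

Grouping rows by `patient_key` is one way to collect all vials and aliquots that derive from the same sample before comparing data types.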
- Clinical Data: clinical data derived from patient charts from a physician at the TSS or derived from pathology.
- Copy Number CGH: regions of statistically significant copy number change across samples from the CGH platform.
- Copy Number SNP: regions of statistically significant copy number change across samples from the SNP platform.
- LOH SNP: statistically significant LOH from all samples using the SNP platform.
- SNP: unique combination of SNPs associated with the sample from the SNP platform.
- Methylation: statistically significant methylated genes across samples.
- Expression Exon: statistically significant exons present across samples.
- Expression Gene: statistically significant genes expressed across samples.
- Expression miRNA: statistically significant miRNA expression across samples.
- Mutation: significant mutations across samples, usually from a sequencing platform (whole genome or exome).
The specific implications of each level for each data type are described on the TCGA Wiki. Please note that permission must be requested and granted for access to lower-level (i.e., level 1 and 2) data. Level 3 and 4 data are freely available from the publicly accessible links elsewhere on this site and/or from the dbGaP and SRA archives. None of these data are downloaded directly from the Broad; they come from a third-party centralized storage source.
|Data Level||Level Type||Description|
|1||Raw||Low-level data for single sample|
|2||Processed||Normalized single sample data; interpreted for presence or absence of specific molecular abnormalities|
|3|| ||Aggregate of processed data from single sample; grouped by probed loci to form larger contiguous regions (in some cases)|
|4||Regions of Interest (ROI)||Quantified association across classes of samples; associations based on two or more|
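
The access rule stated above (levels 1 and 2 require granted permission, levels 3 and 4 are freely available) can be written down as a short check. This is a sketch only: `DataLevel` and `requires_permission` are hypothetical names, and the enum members for levels 3 and 4 are deliberately generic.

```python
from enum import IntEnum


class DataLevel(IntEnum):
    """The four data levels (hypothetical enum for illustration)."""
    RAW = 1
    PROCESSED = 2
    LEVEL_3 = 3
    LEVEL_4 = 4


def requires_permission(level: int) -> bool:
    """Return True for the lower levels (1 and 2), which need granted access."""
    # DataLevel(level) raises ValueError for anything outside 1..4.
    return DataLevel(level) <= DataLevel.PROCESSED


if __name__ == "__main__":
    for lvl in DataLevel:
        status = "controlled" if requires_permission(lvl) else "open"
        print(f"level {int(lvl)}: {status}")
```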