Processing

Birdseed Genotypes

Birdseed results are genotype calls produced by the Birdseed algorithm from probe-set intensity values normalized with the Invariant Set Median-Polish algorithm. First, the normalized values of the SNP probe sets from the normal samples were passed to Birdseed along with the 6.0 priors file and the special SNPs file; this run generated the clusters, confidences, and calls files. Birdseed was then run a second time with the ‘--clusters’ option, using the SNP probe sets from all samples together with the clusters file from the normals-only run.
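The two-pass idea (fit genotype clusters on normals only, then call every sample against those fixed clusters) can be illustrated with a toy stand-in. This is not Birdseed, which fits a Gaussian mixture model; it swaps in a simple nearest-centroid caller for one SNP. The function names, the (A, B) intensity layout, the starting "prior" centers, and the distance-ratio confidence are all hypothetical.

```python
import numpy as np

GENOTYPES = ("AA", "AB", "BB")


def fit_clusters(normals, prior_centers, n_iter=20):
    """Pass 1 (toy): refine AA/AB/BB centers for one SNP using normals only.

    normals: (n, 2) array of (A, B) intensities across normal samples.
    prior_centers: (3, 2) hypothetical starting centers, standing in for priors.
    """
    centers = np.asarray(prior_centers, dtype=float).copy()
    x = np.asarray(normals, dtype=float)
    for _ in range(n_iter):
        # Assign each normal sample to its nearest center (Euclidean distance).
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its members; empty clusters stay put.
        for k in range(3):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean(axis=0)
    return centers


def call_with_clusters(samples, centers):
    """Pass 2 (toy): call all samples against the fixed pass-1 clusters."""
    x = np.asarray(samples, dtype=float)
    d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    calls = [GENOTYPES[k] for k in d.argmin(axis=1)]
    # Toy confidence: nearest / second-nearest distance (lower = more confident).
    nearest_two = np.sort(d, axis=1)[:, :2]
    conf = nearest_two[:, 0] / np.maximum(nearest_two[:, 1], 1e-12)
    return calls, conf
```

Keeping the clusters fixed in the second pass means tumor samples, whose intensities may be distorted by copy-number changes, cannot pull the cluster centers away from the positions learned on normals.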

References

Korn JM, Kuruvilla FG, McCarroll SA, Wysoker A, Nemesh J, Cawley S, Hubbell E, Veitch J, Collins PJ, Darvishi K, Lee C, Nizzari MM, Gabriel SB, Purcell S, Daly MJ, Altshuler D. “Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs.” Nature Genetics (2008) 40(10):1253-60. [PMID: 18776909]

Invariant Set Median-Polish Values

Invariant Set Median-Polish results are the normalized intensity values of the probe sets. First, the raw probe intensities were brightness-corrected using Invariant Set Normalization, as described in Li and Wong’s dChip paper. The probe sets were then summarized with a robust median, the median-polish method described in Bolstad et al.’s RMA paper. Both steps were executed by the GenePattern module SNPFileCreator.
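The two steps can be sketched in NumPy as follows. This is a simplified illustration, not the SNPFileCreator code: the invariant-set step selects probes in a single pass by rank difference and maps intensities with a piecewise-linear curve (dChip selects the set iteratively and fits a smoothed curve), and the `max_rank_diff` value is a hypothetical choice. The median polish follows Tukey's row/column sweep and returns one summary per sample from a probes x samples matrix of log-scale values.

```python
import numpy as np


def invariant_set_normalize(sample, reference, max_rank_diff=0.01):
    """Simplified invariant-set normalization of one array to a reference."""
    s = np.asarray(sample, dtype=float)
    r = np.asarray(reference, dtype=float)
    n = len(s)
    # Rank proportions of each probe within each array.
    rank_s = s.argsort().argsort() / (n - 1)
    rank_r = r.argsort().argsort() / (n - 1)
    # "Invariant" probes keep nearly the same rank in both arrays.
    inv = np.abs(rank_s - rank_r) < max_rank_diff
    if inv.sum() < 2:
        raise ValueError("too few invariant probes to fit a mapping")
    order = np.argsort(s[inv])
    # Piecewise-linear map onto the reference scale; np.interp clamps at the ends.
    return np.interp(s, s[inv][order], r[inv][order])


def median_polish(log_intensities, max_iter=10, tol=1e-6):
    """Tukey median polish of a probes x samples matrix; returns sample summaries."""
    z = np.array(log_intensities, dtype=float)
    overall = 0.0
    row = np.zeros(z.shape[0])  # probe effects
    col = np.zeros(z.shape[1])  # sample effects
    for _ in range(max_iter):
        # Sweep probe (row) medians out of the residuals.
        rmed = np.median(z, axis=1)
        z -= rmed[:, None]
        row += rmed
        m = np.median(col)
        col -= m
        overall += m
        # Sweep sample (column) medians out of the residuals.
        cmed = np.median(z, axis=0)
        z -= cmed[None, :]
        col += cmed
        m = np.median(row)
        row -= m
        overall += m
        if np.abs(rmed).sum() + np.abs(cmed).sum() < tol:
            break
    # Summary per sample: overall effect plus that sample's column effect.
    return overall + col
```

Using medians rather than means in both sweeps makes the probe-set summary resistant to a few aberrant probes.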

References

Lin M, et al. “dChipSNP: significance curve and clustering of SNP-array-based loss-of-heterozygosity data.” Bioinformatics (2004) 20(8):1233-40.

Allele-Specific Copy-Numbers

Allele-specific copy numbers were estimated at each SNP marker by subtracting a background term and dividing by a scaling factor, separately for each allele. The background term for an allele is estimated from the center of the Birdseed cluster for the homozygous call of the other allele (for example, for allele A we use the A coordinate of the center of the BB cluster). The scaling factor is half of the distance between the AA and BB cluster centers along the relevant coordinate.
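Written out for allele A, the rule is cn_A = (A - BB_A) / ((AA_A - BB_A) / 2), and symmetrically for allele B. A minimal sketch, assuming a hypothetical `centers` dictionary that holds the (A, B) cluster centers of one SNP and that the AA center has the larger A coordinate (and the BB center the larger B coordinate):

```python
def allele_specific_cn(a, b, centers):
    """Allele-specific copy numbers for one SNP from its Birdseed cluster centers.

    centers: hypothetical dict mapping "AA"/"BB" to that cluster's (A, B) center.
    """
    aa_a, aa_b = centers["AA"]
    bb_a, bb_b = centers["BB"]
    # Allele A: background = A coordinate of the BB center;
    # scale = half the AA-to-BB distance along the A axis.
    cn_a = (a - bb_a) / ((aa_a - bb_a) / 2)
    # Allele B: background = B coordinate of the AA center;
    # scale = half the AA-to-BB distance along the B axis.
    cn_b = (b - aa_b) / ((bb_b - aa_b) / 2)
    return cn_a, cn_b
```

With this scaling a sample at the AA center gets (2, 0), one at the BB center gets (0, 2), and one midway between them (a typical AB position) gets (1, 1).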

Gain/Loss-of-Heterozygosity

We compare the genotypes of each tumor to those of its matched normal. For SNPs that are heterozygous in the matched normal, we flag the SNP in the tumor as R (retention) or L (LOH). For SNPs that are homozygous in the normal, we flag the SNP in the tumor as U (uninformative) or C
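The R/L rule for SNPs that are heterozygous in the normal can be written as a small function. The genotype strings ('AA', 'AB', 'BB', with None for a no-call) and the function name are hypothetical. Because the criterion separating U from C is not stated in this text, the sketch returns None for SNPs that are homozygous in the normal instead of guessing.

```python
VALID_CALLS = {"AA", "AB", "BB", None}


def loh_flag(normal_call, tumor_call):
    """Return 'R' or 'L' for SNPs heterozygous in the normal, else None."""
    if normal_call not in VALID_CALLS or tumor_call not in VALID_CALLS:
        raise ValueError("calls must be 'AA', 'AB', 'BB' or None")
    # Assumption: no flag is assigned when either sample lacks a call.
    if normal_call is None or tumor_call is None:
        return None
    if normal_call == "AB":
        # Heterozygous normal: the tumor either keeps both alleles or loses one.
        return "R" if tumor_call == "AB" else "L"
    # Homozygous normal: U or C per the text; the rule is not given, so undecided.
    return None
```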

Copy-Number

Raw copy numbers were estimated at each SNP and copy-number (CN) marker by subtracting a background term and dividing by a scaling factor. At SNP markers, the total copy number was calculated by summing the allele-specific values. For CN probes, we built a model, based on an X-dosage experiment, that estimates the background and scaling factor as a function of the probe’s median intensity across normal samples. Finally, we divided the total copy number by its average across all normal samples and multiplied by 2.
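The SNP-marker arithmetic can be sketched in a few lines. This assumes the allele-specific values are already computed and reads "the average of all normals" as the per-marker mean of the normal samples’ total copy numbers; the CN-probe model from the X-dosage experiment is not reproduced because its functional form is not given here. Function names are hypothetical.

```python
import numpy as np


def total_copy_number(cn_a, cn_b):
    """SNP markers: total copy number is the sum of the allele-specific values."""
    return np.asarray(cn_a, dtype=float) + np.asarray(cn_b, dtype=float)


def normalize_to_normals(tumor_total, normal_totals):
    """Scale so that the average normal sits at 2 copies at every marker.

    tumor_total: (markers,) totals for one sample.
    normal_totals: (normals, markers) totals for all normal samples.
    """
    # Assumption: the average is taken per marker across normal samples.
    mean_normal = np.asarray(normal_totals, dtype=float).mean(axis=0)
    return 2.0 * np.asarray(tumor_total, dtype=float) / mean_normal
```

Dividing by the normal average cancels marker-specific bias that the background and scaling step leaves behind, and the factor of 2 puts the result on a diploid scale.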

References

Zhao X, et al. “An integrated view of copy number and allelic alterations in the cancer genome using single nucleotide polymorphism arrays.” Cancer Res (2004) 64:3060-71. [PMID: 15126342]

Copy-Number Segmentation

Circular binary segmentation (CBS) was used to segment the data after outliers were removed.
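The core idea of CBS, finding the arc (i, j] whose mean differs most from the rest of the segment and splitting recursively, is sketched below. This is a simplified illustration, not the published method: the published CBS decides whether to split with a permutation-based significance test, whereas this sketch uses a fixed, hypothetical `threshold` on the arc statistic with a noise level estimated from first differences. It runs in O(n^2) per segment and does not implement the outlier-removal step, whose method is not specified above.

```python
import numpy as np


def _best_arc(x, sigma, min_len):
    """Largest standardized mean difference between an arc x[i:j] and the rest."""
    n = len(x)
    s = np.concatenate([[0.0], np.cumsum(x)])
    best, best_ij = 0.0, None
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i
            # Skip the whole segment and arcs that leave a piece shorter than min_len.
            if k == n or k < min_len or 0 < i < min_len or 0 < n - j < min_len:
                continue
            inside = (s[j] - s[i]) / k
            outside = (s[n] - s[j] + s[i]) / (n - k)
            z = abs(inside - outside) / (sigma * np.sqrt(1.0 / k + 1.0 / (n - k)))
            if z > best:
                best, best_ij = z, (i, j)
    return best, best_ij


def cbs_sketch(x, threshold=4.0, min_len=2, sigma=None):
    """Recursive circular-binary-style segmentation; returns (start, end, mean)."""
    x = np.asarray(x, dtype=float)
    if sigma is None:
        # Robust noise estimate: MAD of first differences, divided by sqrt(2).
        d = np.diff(x)
        mad = 1.4826 * np.median(np.abs(d - np.median(d)))
        sigma = max(mad / np.sqrt(2), 1e-6)
    segments = []

    def recurse(lo, hi):
        seg = x[lo:hi]
        z, ij = _best_arc(seg, sigma, min_len)
        if ij is None or z < threshold:
            segments.append((lo, hi, float(seg.mean())))
            return
        i, j = ij
        # The arc [i, j) cuts the segment into up to three contiguous pieces.
        cuts = sorted({0, i, j, len(seg)})
        for a, b in zip(cuts[:-1], cuts[1:]):
            recurse(lo + a, lo + b)

    recurse(0, len(x))
    return segments
```

Testing arcs rather than single breakpoints lets one step isolate a short gain or loss in the middle of a segment, such as `cbs_sketch([0.0] * 8 + [2.0] * 5 + [0.0] * 8)` splitting into three segments.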
