Sample inclusion and exclusion criteria:
Cases: Cases were selected according to ADA (2003) definitions of type 2 diabetes (fasting plasma glucose > 7.0 mM or 2 hour postload glucose during an OGTT >11.1 mM). To avoid confounding with type 1 diabetes, GAD Ab positive patients were excluded. To exclude MODY diabetes, subjects from families with mutations in known MODY diabetes genes and diabetic individuals with onset age < 35 years were excluded.
Controls: Unrelated population controls and sibling controls with normal glucose tolerance (NGT) based on ADA (2003) definitions (fasting plasma glucose <6.1mM and 2 hour postload glucose during an OGTT <7.8 mM) at last clinical visit were selected. Unrelated controls had no first degree family history of type 2 diabetes.
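The glucose cutoffs in the case and control definitions above can be expressed as a small classifier. This is a minimal sketch, not the study's actual selection code: the function name and the "intermediate" label for values between the two definitions are hypothetical, and the strict inequalities simply mirror the thresholds as written in this document.

```python
from typing import Optional

# Thresholds in mM, copied from the case and control definitions above.
T2D_FASTING = 7.0
T2D_OGTT_2H = 11.1
NGT_FASTING = 6.1
NGT_OGTT_2H = 7.8


def glucose_status(fasting_mm: float, ogtt_2h_mm: Optional[float] = None) -> str:
    """Return 'T2D', 'NGT', or 'intermediate' for one clinic visit.

    'intermediate' is a hypothetical label for visits that satisfy
    neither definition (or that lack the OGTT value needed for NGT).
    """
    # Either measurement above its cutoff is sufficient for the case definition.
    if fasting_mm > T2D_FASTING:
        return "T2D"
    if ogtt_2h_mm is not None and ogtt_2h_mm > T2D_OGTT_2H:
        return "T2D"
    # NGT requires both measurements to be below their cutoffs.
    if (
        ogtt_2h_mm is not None
        and fasting_mm < NGT_FASTING
        and ogtt_2h_mm < NGT_OGTT_2H
    ):
        return "NGT"
    return "intermediate"
```

The other exclusions described above (GAD antibody status, MODY mutations, onset age, family history) would be applied as separate filters on top of this classification.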
Unrelated Case/control: Groups of one case matched to one control, one case matched to two controls or two cases matched to one control were included in the study. Within each group, cases were matched to controls by gender, age (control visit age no less than 5 years from age of onset of diabetes of matched case), collection locale (one of 9 centers throughout Finland and Sweden) and BMI (within 5 points). Birth-year of cases and controls within each group differed by no more than 20 years.
Discordant Siblings: A maximum of two case sibs and two control sibs from each sib-ship were included in this portion of the study. Diabetics with the youngest onset age were matched to sibling controls who were normal glucose tolerant at age < 5 years from age of onset of the oldest diabetic sibling. In families with more than two eligible control siblings based on our inclusion criteria, we selected the two most normal glucose tolerant siblings who matched the gender of diabetic siblings. In cross-gender matched sib-ships, an unaffected female sibling was selected only if she was normal glucose tolerant at age of onset of her diabetic brother (to account for generally later age of onset of diabetes in females).
Note: Subjects are in the process of being called back to the clinic for follow up measurements. In controls, another OGTT has been performed to confirm that controls have remained normal glucose tolerant. In cases, data on progression to diabetic complications and mortality data are being collected.
Samples have been carefully phenotyped and were analyzed for correlation of genotype to phenotype.
Quantitative traits analyzed include:
i) Glucose traits: glucose and insulin measures during an OGTT for controls.
ii) Obesity traits: BMI, weight, height, waist circumference and waist-hip ratio.
iii) Lipid traits: HDL cholesterol, LDL cholesterol, triglycerides, ApoA1, ApoA2 and ApoB.
iv) Blood pressure: diastolic blood pressure, systolic blood pressure and hypertension.
Description of Quantitative Traits (updated March 2007)
Sample Preparation Process
A total amount of 1 µg of genomic DNA (diluted in 1X TE buffer and at 50 ng/µl) was shipped from Sweden for the whole genome scan. Upon arrival, all sample tubes were inventoried and plated into 96-well master plates for genotyping.
Master plate creation:
The master plate was created using automated liquid handling robotics to interleave cases and controls evenly on the plates, thereby enhancing technical uniformity during the laboratory process. These plates were barcoded, scanned into a password-protected database and stored at -80°C.
Long-term sample storage:
Whole genome amplification using the Qiagen REPLI-g™ kit was performed on all the samples using 100 ng of input genomic DNA. The resulting DNA stock, a total of 50 µg, was split into two aliquots for storage and future use following the whole genome scan.
Affymetrix QC Filters:
Two levels of DNA quality control metrics were assessed on the genomic samples to ensure sample quality, quantity and identity before proceeding with the whole genome scan.
4a. Filter 1: Picogreen: To determine the quantity of double stranded DNA in the sample stocks.
4b. Filter 2: Genotyping: Using the Sequenom MassARRAY® genotyping technology, two sets of SNPs were genotyped on all the samples. The first was a set of polymorphic markers that had previously been genotyped on these DNAs; agreement of these genotypes with the historical data for each DNA confirmed sample identity. The second set of markers provided a genotypic fingerprint for each sample. Each of these 24 SNP sets is on both of the Affymetrix Human Mapping 500K GeneChips® and serves as a cross-platform sample verification during the laboratory process.
Genotyping for the whole genome scan was performed using the Affymetrix Human Mapping 500K GeneChip®.
Analytical Methods (updated March 2007)
1. Summary of QC metrics applied to the data:
a. DNA fingerprinting concordance.
For every individual sample, the genotype data used in the analysis came from scans of one Nsp and one Sty fraction that were concordant with the Sequenom fingerprint profile and with each other, the latter assessed using the 50 SNPs present on both Nsp and Sty fractions (Affymetrix QC SNPs). The pair of Nsp-Sty scans for a given individual was selected from the pool of all scans for that individual (including samples which were redone in the lab) as the pair with the highest overall genotyping call rate, based on the Bayesian Robust Linear Model with Mahalanobis distance classifier (BRLMM) genotype calling algorithm on the Affymetrix platform (15, 16), and the highest concordance rates with the fingerprint data.
b. Individual exclusion criteria.
All individuals passing the fingerprint quality checks were further screened based on genotyping performance. Any individual with a genotyping call rate of less than 95% for either the Nsp or Sty fraction based on BRLMM (15, 16) was excluded. In addition, individuals whose gender call from X chromosome genotype data was discrepant with the gender obtained from medical records were excluded from the analysis. In order to verify the known familial relationships in the data as well as to detect additional first degree cryptic relationships not apparent from the known pedigree data, a preliminary identity-by-descent (IBD) analysis was performed using the PLINK analysis software package (17) (http://pngu.mgh.harvard.edu/purcell/plink/). We excluded individuals from pairs of samples identified as cryptic first degree relatives (parent-offspring, twins, or siblings concordant for phenotype) to conform to our study design. After this analysis, a final data set of 2,931 individuals remained, in which we performed the actual T2D association analyses.
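The individual-level screen described above can be sketched as a function that reports why a sample would be dropped. This is an illustration only: the `SampleQC` record, its field names, and the single boolean standing in for the outcome of the IBD analysis are all hypothetical; only the 95% call rate threshold and the three kinds of exclusion come from the text.

```python
from dataclasses import dataclass
from typing import List

MIN_CALL_RATE = 0.95  # threshold stated in the text


@dataclass
class SampleQC:
    """Hypothetical per-sample QC record (field names are assumptions)."""
    sample_id: str
    nsp_call_rate: float
    sty_call_rate: float
    recorded_sex: str           # from medical records, e.g. "M" or "F"
    genotype_sex: str           # inferred from X chromosome genotypes
    cryptic_first_degree: bool  # flagged for removal after the IBD analysis


def exclusion_reasons(sample: SampleQC) -> List[str]:
    """Return every reason the sample fails QC; an empty list means it passes."""
    reasons = []
    # A sample fails if EITHER fraction falls below the threshold.
    if sample.nsp_call_rate < MIN_CALL_RATE or sample.sty_call_rate < MIN_CALL_RATE:
        reasons.append("call rate below 95% on Nsp or Sty")
    if sample.recorded_sex != sample.genotype_sex:
        reasons.append("gender discrepancy")
    if sample.cryptic_first_degree:
        reasons.append("cryptic first degree relative")
    return reasons
```

Returning the list of reasons rather than a single pass/fail flag makes it easy to tabulate how many samples each criterion removes.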
c. Marker exclusion criteria
Markers which were advanced to the type-2 diabetes scan stage passed the following quality control criteria:
1. Did not map to multiple locations in the genome
3,605 markers excluded
2. Less than 5% missing data for the marker in the overall sample
34,532 markers excluded
3. Less than 10% missing data per marker in both the population-based and familial sample
229 markers excluded
4. Had a minor allele frequency greater than 1% in the overall sample
66,787 markers excluded
5. Had a minor allele frequency greater than 1% in both population-based and familial samples
2,909 markers excluded
6. Passed a Hardy-Weinberg test with a P value greater than 1 × 10⁻⁶ in controls
5,775 markers excluded
The total number of markers analyzed was 386,731. In addition, for each SNP, Hardy-Weinberg tests in cases and in the pooled sample, and tests for differential missing data between cases and controls, were performed (18). This information was used to ensure that each positive control SNP and each SNP evaluated for replication had high technical quality.
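As a quick consistency check on the marker exclusion counts listed above, the arithmetic can be reproduced as follows. The sketch assumes the filters were applied in sequence so that each excluded marker is counted once. The implied starting count is not stated in this document; it appears to match the nominal content of the 500K array set, but that should be treated as an assumption.

```python
# Exclusion counts as listed above, in order.
excluded = {
    "maps to multiple genomic locations": 3_605,
    ">= 5% missing data overall": 34_532,
    ">= 10% missing data in either subsample": 229,
    "MAF <= 1% overall": 66_787,
    "MAF <= 1% in either subsample": 2_909,
    "Hardy-Weinberg P <= 1e-6 in controls": 5_775,
}
analyzed = 386_731  # markers remaining, as stated in the text

total_excluded = sum(excluded.values())
implied_start = analyzed + total_excluded

print(total_excluded)  # 113837
print(implied_start)   # 500568
```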
d. Multimarker analysis
In addition to the single marker tests, we also performed tests based on multi-marker haplotypes, which increase genome coverage and therefore the power of our scan to detect associations (21). Based on the 386,731 SNPs, we developed 284,968 additional two-marker tests, each of which serves as a proxy (r² > 0.8) for one or more SNPs not well captured by any single marker through pairwise LD (22). Based on analysis of HapMap Phase II data in the CEU sample, we estimate that these 671,699 statistical tests query with nearly complete correlation (r² > 0.8) a total of 1,745,389 SNPs (78% of common SNPs in HapMap CEU), and a larger number (2,040,388 SNPs, 91% of common SNPs in HapMap CEU) with a very strong but incomplete correlation (r² > 0.5).
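The counts in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below only reuses the figures quoted above; the implied number of common HapMap CEU SNPs is back-calculated from rounded percentages, so the two estimates are expected to agree only approximately.

```python
single_marker_tests = 386_731
two_marker_tests = 284_968
total_tests = single_marker_tests + two_marker_tests
print(total_tests)  # 671699

# SNPs captured at each r-squared threshold, with the quoted coverage fraction.
captured_r2_08, coverage_r2_08 = 1_745_389, 0.78
captured_r2_05, coverage_r2_05 = 2_040_388, 0.91

# Back-calculate the number of common SNPs each pair of figures implies.
implied_common_08 = captured_r2_08 / coverage_r2_08
implied_common_05 = captured_r2_05 / coverage_r2_05
print(round(implied_common_08), round(implied_common_05))
```

Both back-calculations land near 2.24 million common SNPs, which suggests the two coverage figures refer to the same reference set.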
2. Summary of association analyses
a. T2D analysis. To test for association to T2D, we constructed a meta-analysis which combined association tests from the population-based sample with those from the discordant sibships. In the population sample, we performed a Cochran-Mantel-Haenszel (CMH) stratified test using the fine-matching criteria: (a) kinship (discordant sibships vs. unrelated case/controls), (b) BMI, (c) age, (d) sex, and (e) collection locale (23). The CMH test assesses association between SNP and phenotype, conditional on predefined clusters of samples, or strata. Samples that failed to generate an appropriate match were pooled together into a single cluster. The results presented are robust to the presence or absence of this “orphan cluster” (data not shown). To estimate how well the distribution was calibrated, we estimated a genomic inflation factor based on the median chi-squared statistic in the matched population-based case/control sample. We found the statistic to be close to its null expectation in the unrelated sample (λ ≈ 1.05). To perform association testing in the familial sample, we used the DFAM procedure in PLINK (17). Briefly, within each sibship, the test is based on the observed number of risk alleles in affected individuals. The expectation and variance of the number of risk alleles in affected offspring under the null hypothesis of no association are given by the multivariate hypergeometric probability distribution (sampling genotypes without replacement) based on all individuals in the sibship. This test is similar to a standard Cochran-Mantel-Haenszel test (23), conditioning on sibship as strata, and is still an allelic one degree of freedom test, except that the genotype-based multivariate hypergeometric distribution is used to account for the fact that not all allelic combinations are possible within a sibship (e.g. an individual cannot have two paternal alleles).
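To make the stratified test concrete, the following is a minimal sketch of the standard allelic CMH statistic for a set of 2×2 tables, one per matched cluster. It is not the PLINK implementation and not the DFAM procedure: it uses the textbook formula without a continuity correction, and the table layout (rows for cases and controls, columns for the two alleles) and function names are assumptions made for illustration.

```python
import math
from typing import Iterable, Tuple

# One stratum: (case allele A, case allele a, control allele A, control allele a)
Table = Tuple[int, int, int, int]


def cmh_statistic(strata: Iterable[Table]) -> float:
    """One degree of freedom CMH chi-squared statistic across 2x2 strata."""
    diff_sum = 0.0  # sum over strata of observed minus expected count in cell a
    var_sum = 0.0   # sum over strata of the hypergeometric variance of cell a
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff_sum += a - row1 * col1 / n
        var_sum += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if var_sum == 0:
        raise ValueError("no informative strata")
    return diff_sum ** 2 / var_sum


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail probability of a chi-squared variable with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))


# Two hypothetical matched clusters with made-up allele counts.
example = [(10, 5, 5, 10), (8, 4, 4, 8)]
stat = cmh_statistic(example)
p_value = chi2_1df_pvalue(stat)
```

Because the expectation and variance are computed within each stratum, differences in allele frequency between clusters (for example, between collection locales) do not contribute to the statistic.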
Based on these analyses, nominal P-values for each subset of data were converted to Z-scores based on the magnitude of significance and the direction of effect (based on the odds ratio estimated for each subset of data). X-chromosome markers were analyzed only for the population-based sample, as the discordant sibships were not matched for gender. Under the null hypothesis of no association, the meta-analysis test statistic is distributed as a standard normal. A threshold of P = 5 × 10⁻⁸ for genome-wide significance was estimated based on nominal significance (P = 0.05) corrected for 1,000,000 independent common SNPs in the genome (24).
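The conversion from P-values to signed Z-scores and their combination can be sketched as below. The text does not specify how the two subsets were weighted, so the weighted sum used here (Stouffer's method, with weights supplied by the caller) is an assumption, as are the function names. The 0.05 nominal level is likewise inferred from the stated threshold and the 1,000,000 independent SNPs.

```python
import math
from statistics import NormalDist
from typing import Sequence

_STD_NORMAL = NormalDist()


def signed_z(p_value: float, odds_ratio: float) -> float:
    """Two-sided P-value to Z-score, signed by the direction of the odds ratio."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    # Use the lower tail directly to keep precision for very small P-values.
    magnitude = -_STD_NORMAL.inv_cdf(p_value / 2.0)
    return math.copysign(magnitude, math.log(odds_ratio))


def combine_z(z_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted Z combination; standard normal under the null."""
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    return numerator / math.sqrt(sum(w * w for w in weights))


def two_sided_p(z: float) -> float:
    return 2.0 * _STD_NORMAL.cdf(-abs(z))


# Bonferroni-style threshold: nominal 0.05 over 1,000,000 independent SNPs.
GENOME_WIDE_THRESHOLD = 0.05 / 1_000_000

# Hypothetical example: both subsets point the same way with P = 0.05 each.
z_pop = signed_z(0.05, 1.3)
z_sib = signed_z(0.05, 1.2)
z_meta = combine_z([z_pop, z_sib], weights=[1.0, 1.0])
```

With equal weights, two concordant results at P = 0.05 combine to a Z of about 2.77, while results in opposite directions would largely cancel, which is why the sign is taken from the odds ratio.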
b. Analysis of 18 related traits. Seventeen quantitative diabetes-related traits and one dichotomous hypertension trait were selected for analysis based on heritability estimates in the Botnia study (25). Each phenotype (fasting glucose, insulinogenic index, HOMA-IR, BMI, waist circumference, waist-hip ratio, height, weight, TG-HDL ratio, HDL-cholesterol, LDL-cholesterol, TG levels, apolipoproteins A1, A2 and B, systolic and diastolic blood pressure, and HTN) was adjusted for significant covariates (age, gender, geographic origin, BMI, diabetes status and treatment if applicable). For glycemic traits (fasting glucose, insulinogenic index, HOMA-IR, and TG-HDL ratio as a measure of insulin resistance), only individuals without diabetes were analyzed. For traits that were strongly correlated with diabetes status (BMI, waist circumference and waist-hip ratio), individuals with diabetes were analyzed separately from individuals without diabetes, and Z-scores were normalized by population based correction. For each trait, a multivariable linear or logistic regression analysis with or without covariates was performed on an adjusted residual or Z-score using as many of the 3,082 samples as had phenotype data available (including any individuals related as siblings; n for each trait listed here). To correct for inflation caused by inclusion of related individuals, the genomic control inflation factor based on the median test statistic was estimated, and P-values based on the test statistic adjusted by this factor are reported. Inflation factors ranged from 1.00 (TG-HDL ratio) to 1.13 (systolic blood pressure).
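The genomic control adjustment described above can be sketched in a few lines. This assumes one degree of freedom chi-squared test statistics and the usual definition of the inflation factor as the observed median divided by the null median; the function names are illustrative, and the text does not say whether factors below 1 were rounded up to 1, so no such clamp is applied here.

```python
import math
from statistics import NormalDist, median
from typing import List, Sequence

# Median of a chi-squared distribution with 1 df (about 0.455),
# obtained as the square of the 75th percentile of a standard normal.
MEDIAN_CHI2_1DF = NormalDist().inv_cdf(0.75) ** 2


def inflation_factor(chi2_stats: Sequence[float]) -> float:
    """Genomic control lambda: observed median over the null median."""
    return median(chi2_stats) / MEDIAN_CHI2_1DF


def gc_adjusted_pvalues(chi2_stats: Sequence[float]) -> List[float]:
    """Divide each statistic by lambda, then convert to a 1 df P-value."""
    lam = inflation_factor(chi2_stats)
    return [math.erfc(math.sqrt(s / lam / 2.0)) for s in chi2_stats]


# Hypothetical statistics, uniformly inflated by a factor of 1.13.
null_like = [0.1, MEDIAN_CHI2_1DF, 3.0]
inflated = [1.13 * s for s in null_like]
lam = inflation_factor(inflated)
adjusted = gc_adjusted_pvalues(inflated)
```

In this toy example the adjustment exactly undoes the uniform inflation; with real data it rescales the whole distribution so that its median matches the null expectation.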
c. Distributions to evaluate P-value excess. For each trait, we tabulated the number of SNPs that exceeded the specified P-value for T2D and after genomic control for quantitative traits. We noted that (1) for the bulk of the distribution for all traits (P-values between 0.5 and 0.1), the distribution of P-values is consistent with expectations under the null, and (2) for many traits, notably those related to circulating cholesterol levels, we observed a slight excess of P-values in the tail (P < 0.01) relative to the null. To quantify the significance of this excess, we constructed 1000 data sets where the T2D or quantitative trait phenotype was randomized (either within a sibship in the case of relatives, or within matched clusters for T2D and across unrelated samples for other traits), tested for association at all SNPs, and calculated the number of SNPs in the permuted data set that exceeded the given P-value threshold. Assuming that counts for a given P-value threshold were distributed normally, we standardized the resulting distribution of counts and calculated Z-scores for the observed count compared to the permuted mean and standard deviation of counts for each trait, and highlighted in bold those bins where the count is higher than expected by chance (P < 0.05). For simplicity, we present the mean counts for apolipoprotein A-I association after permutation, though we note that these expected values are stable across different quantitative trait association analyses (data not shown).
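The two ingredients of this permutation procedure, shuffling phenotypes within clusters and standardizing the observed count against the permuted counts, can be sketched as follows. The function names are hypothetical, the association testing step itself is omitted, and the use of the sample standard deviation and a one-sided normal tail are assumptions not stated in the text.

```python
import random
from statistics import NormalDist, mean, stdev
from typing import Dict, Hashable, List, Sequence, Tuple


def permute_within_clusters(
    phenotypes: Sequence[float],
    clusters: Sequence[Hashable],
    rng: random.Random,
) -> List[float]:
    """Shuffle phenotype values only among samples that share a cluster."""
    members: Dict[Hashable, List[int]] = {}
    for idx, cluster in enumerate(clusters):
        members.setdefault(cluster, []).append(idx)
    permuted = list(phenotypes)
    for indices in members.values():
        values = [phenotypes[i] for i in indices]
        rng.shuffle(values)
        for i, value in zip(indices, values):
            permuted[i] = value
    return permuted


def excess_z(observed_count: int, permuted_counts: Sequence[int]) -> Tuple[float, float]:
    """Z-score and one-sided P-value of the observed count vs. permutations."""
    mu = mean(permuted_counts)
    sd = stdev(permuted_counts)  # sample standard deviation (an assumption)
    z = (observed_count - mu) / sd
    return z, 1.0 - NormalDist().cdf(z)


# Hypothetical example: case status shuffled within sibships or matched clusters.
status = [1, 0, 1, 0, 0, 1]
cluster_ids = ["a", "a", "b", "b", "b", "c"]
shuffled = permute_within_clusters(status, cluster_ids, random.Random(0))
z, p = excess_z(12, [8, 10, 12, 10])
```

Shuffling within clusters preserves the number of cases and controls in each stratum, so the permuted data sets respect the matched design while breaking any link between genotype and phenotype.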