Study (initial methods released Feb 2007)

Sample inclusion and exclusion criteria:

Cases: Cases were selected according to ADA (2003) definitions of type 2 diabetes (fasting plasma glucose > 7.0 mM or 2 hour postload glucose during an OGTT >11.1 mM). To avoid confounding with type 1 diabetes, GAD Ab positive patients were excluded. To exclude MODY diabetes, subjects from families with mutations in known MODY diabetes genes and diabetic individuals with onset age < 35 years were excluded.

Controls: Unrelated population controls and sibling controls with normal glucose tolerance (NGT) based on ADA (2003) definitions (fasting plasma glucose < 6.1 mM and 2 hour postload glucose during an OGTT < 7.8 mM) at last clinical visit were selected. Unrelated controls had no first degree family history of type 2 diabetes.
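The glucose thresholds above can be illustrated with a minimal sketch. The function name and the "intermediate" label are assumptions made for illustration only, and the other exclusions (GAD Ab status, MODY mutations, onset age, family history) are not modeled.

```python
# Thresholds in mM, copied from the case and control criteria above.
CASE_FASTING = 7.0
CASE_2H = 11.1
NGT_FASTING = 6.1
NGT_2H = 7.8


def classify_glucose(fasting_mm: float, two_hour_mm: float) -> str:
    """Classify one OGTT visit as 'case', 'control' or 'intermediate'.

    Hypothetical helper: only the glucose thresholds are applied here.
    """
    if fasting_mm > CASE_FASTING or two_hour_mm > CASE_2H:
        return "case"
    if fasting_mm < NGT_FASTING and two_hour_mm < NGT_2H:
        return "control"
    # Neither diabetic nor normal glucose tolerant by these thresholds.
    return "intermediate"
```

Subjects falling in the "intermediate" band meet neither definition and would not be eligible as a case or a control under the criteria as written.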

Sample matching

Unrelated Case/control: Groups of one case matched to one control, one case matched to two controls or two cases matched to one control were included in the study. Within each group, cases were matched to controls by gender, age (control visit age no less than 5 years from age of onset of diabetes of matched case), collection locale (one of 9 centers throughout Finland and Sweden) and BMI (within 5 points). Birth-year of cases and controls within each group differed by no more than 20 years.
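As a rough sketch, the pairwise part of these matching rules could be checked as follows. The `Subject` fields and the function name are hypothetical, and the age-at-visit criterion is left out of the sketch; only gender, collection locale, BMI and birth year are compared.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    # Hypothetical record layout, for illustration only.
    sex: str
    locale: str
    bmi: float
    birth_year: int


def is_compatible(case: Subject, control: Subject) -> bool:
    """Check gender, locale, BMI (within 5 points) and birth year (within 20 years)."""
    return (
        case.sex == control.sex
        and case.locale == control.locale
        and abs(case.bmi - control.bmi) <= 5
        and abs(case.birth_year - control.birth_year) <= 20
    )
```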

Discordant Siblings: A maximum of two case sibs and two control sibs from each sib-ship were included in this portion of the study. Diabetics with the youngest onset age were matched to sibling controls who were normal glucose tolerant at age < 5 years from age of onset of the oldest diabetic sibling. In families with more than two eligible control siblings based on our inclusion criteria, we selected the two most normal glucose tolerant siblings who matched the gender of diabetic siblings. In cross-gender matched sib-ships, an unaffected female sibling was selected only if she was normal glucose tolerant at age of onset of her diabetic brother (to account for generally later age of onset of diabetes in females).

Final sample


Clinical characteristics


Note: Subjects are in the process of being called back to the clinic for follow up measurements. In controls, another OGTT has been performed to confirm that controls have remained normal glucose tolerant. In cases, data on progression to diabetic complications and mortality data are being collected.

Quantitative traits

Samples have been carefully phenotyped and were analyzed for correlation of genotype to phenotype.

Quantitative traits analyzed include:
i) Glucose traits: glucose and insulin measures during an OGTT for controls
ii) Obesity traits: BMI, weight, height, waist circumference and waist-hip ratio
iii) Lipid traits: HDL cholesterol, LDL cholesterol, triglycerides, ApoA1, ApoA2 and ApoB
iv) Blood pressure: diastolic blood pressure, systolic blood pressure and hypertension

Description of Quantitative Traits


Sample Preparation Process


  1. Sample arrival:
    A total amount of 1ug of genomic DNA (diluted in 1X TE buffer and at 50ng/ul) was shipped from Sweden for the whole genome scan. Upon arrival, all sample tubes were inventoried and plated into 96 well master plates for genotyping.
  2. Master plate creation:
    The master plate was created using automated liquid handling robotics to interleave cases and controls evenly on the plates, thereby enhancing technical uniformity during the laboratory process. These plates were barcoded, scanned into a password-protected database and stored at -80°C.
  3. Long-term sample storage:
    Whole genome amplification using the Qiagen REPLI-g™ kit was performed on all the samples using 100ng of input genomic DNA. The resulting DNA stock, a total of 50ug, was split into two aliquots for storage and future use following the whole genome scan.
  4. Affymetrix QC Filters:
    Two levels of DNA quality control metrics were assessed on the genomic samples to ensure sample quality, quantity and identity before proceeding with the whole genome scan.

    4a. Filter 1: Picogreen: To determine the quantity of double stranded DNA in the sample stocks.

    4b. Filter 2: Genotyping: Using the Sequenom MassARRAY® genotyping technology, two sets of SNPs were genotyped on all the samples. The first was a set of polymorphic markers that had previously been genotyped on these DNAs; agreement of these genotypes with the historical data confirmed the identity of each DNA sample. The second set of markers provided a genotypic fingerprint for each sample. Each of these 24 SNP sets is on both of the Affymetrix Human Mapping 500K GeneChips®, and serves as a cross-platform sample verification during the laboratory process.

Genotyping methods

Genotyping for the whole genome scan was performed using the Affymetrix Human Mapping 500K GeneChip®.

Analytical Methods

1. Summary of QC metrics applied to the data:


a. Fingerprinting
Genotype data used in the analysis were based, for every individual sample, on scans of the Nsp and Sty fractions that were concordant with a Sequenom fingerprint profile and concordant across chips at the 50 SNPs present on both the Nsp and Sty fractions (Affymetrix QC SNPs). For each individual, the pair of Nsp-Sty scans was selected from the pool of all scans for that individual (including samples which were redone in the lab) as the pair with the highest overall genotyping call rate based on the BRLMM calling method (cite) and the highest concordance rates with the fingerprint data.

b. Individual exclusion criteria
All individuals passing the fingerprint quality checks were further screened based on genotyping performance. Any individual with a genotyping call rate of less than 95% for either the Nsp or Sty fraction based on BRLMM was excluded. In addition, individuals for whom the gender called from X chromosome genotype data was discrepant with the gender obtained from medical records were excluded from the analysis.
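The two individual-level screens can be expressed as a small predicate. This is a sketch only: the function and argument names are hypothetical, and call rates are assumed to be given as fractions between 0 and 1.

```python
def passes_individual_qc(
    nsp_call_rate: float,
    sty_call_rate: float,
    genotype_sex: str,
    recorded_sex: str,
    min_call_rate: float = 0.95,
) -> bool:
    """Return True if an individual survives both screens described above."""
    # Exclude if either chip fraction falls below the call rate threshold.
    if nsp_call_rate < min_call_rate or sty_call_rate < min_call_rate:
        return False
    # Exclude if the sex inferred from X chromosome genotypes disagrees
    # with the sex in the medical records.
    return genotype_sex == recorded_sex
```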

c. Preliminary genome analysis
To verify the known familial relationships in the data and to detect additional 1st degree cryptic relationships not apparent from the known pedigree data, a preliminary IBD-sharing analysis was performed with the PLINK analysis software package on all individuals passing the fingerprint and performance criteria, using markers with less than 5% missing data and a minor allele frequency (MAF) greater than 1%. From the pairs of samples identified as cryptic first degree relatives (parent-offspring, twins, or siblings), we excluded samples in the way that maximized the overall sample size of the data. In addition, discordant sibling pairs found to be less closely related than siblings were broken apart and re-matched to existing data. After this genome analysis, a final data set of screened individuals was constructed to perform the actual type 2 diabetes scan.

d. Marker exclusion criteria
Markers which were advanced to the type-2 diabetes scan stage passed the following quality control criteria:

1. Did not map to multiple locations in the genome (3,605 markers excluded)
2. Less than 5% missing data for the marker and a minor allele frequency greater than 1% (~101,000 markers excluded)
3. Passed a Hardy-Weinberg test in controls; markers with p < 1e-06 were excluded (~5,700 markers excluded)

The total number of markers that passed all filters was 389,869.
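Filters 2 and 3 can be sketched from per-marker genotype counts. This is an illustration under stated assumptions, not the pipeline that was run: the function names are hypothetical, and a 1 degree of freedom chi-squared goodness-of-fit test stands in for the Hardy-Weinberg test, since the exact form of the test is not specified here.

```python
import math


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Minor allele frequency from counts of the three called genotypes."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1 - p)


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-squared (1 df) Hardy-Weinberg p-value; an assumed stand-in test."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def marker_passes(all_counts, n_missing, control_counts,
                  max_missing=0.05, min_maf=0.01, hwe_threshold=1e-6) -> bool:
    """Apply the missingness, MAF and control-only HWE filters to one marker."""
    n_called = sum(all_counts)
    if n_missing / (n_called + n_missing) >= max_missing:
        return False
    if minor_allele_frequency(*all_counts) <= min_maf:
        return False
    return hwe_chi2_pvalue(*control_counts) >= hwe_threshold
```

Counts are ordered (AA, AB, BB); `all_counts` covers all genotyped individuals, while `control_counts` covers controls only, mirroring the control-only Hardy-Weinberg filter.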

e. Multimarker analysis
An additional 169,482 specified multi-marker tests were defined based on HapMap genotype data as per de Bakker et al and Pe’er et al (CITE NG PAPERS). These tests were performed on all data, bringing the total number of markers tested for association to 559,351.

2. Summary of association analyses

Association analyses were performed using PLINK. A Cochran-Mantel-Haenszel (CMH) stratified test for association was performed on each group using the fine-matching criteria for the samples described above. Specifically, the CMH test assesses the association between SNP and phenotype, conditional on predefined clusters of samples. Our predefined clusters represent fine-matching criteria based on (a) kinship (discordant sibships vs. unrelated case/controls), (b) BMI, (c) age, (d) sex, and (e) collection locale. Samples that failed to generate an appropriate match were pooled together into a single cluster (the presented results are robust to the presence or absence of this orphan cluster; data not shown). Because the study design included discordant sibling clusters, the usual chi-squared distribution and associated raw p-values are not appropriately calibrated.
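For orientation, the CMH statistic for a set of 2x2 allele-count tables, one per cluster, can be computed as in the sketch below. The function name and the table layout are assumptions for illustration, and no continuity correction is applied; this is not a description of PLINK's internals. Only the statistic is returned, since the asymptotic p-value is not relied upon in this design.

```python
def cmh_statistic(tables):
    """Cochran-Mantel-Haenszel statistic over stratified 2x2 tables.

    Each table is (a, b, c, d):
      a = case allele A count,    b = case allele B count,
      c = control allele A count, d = control allele B count.
    """
    numerator = 0.0
    variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        # Observed minus expected count of allele A in cases.
        numerator += a - (a + b) * (a + c) / n
        # Hypergeometric variance of that count within the stratum.
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    if variance == 0:
        return 0.0
    return numerator * numerator / variance
```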

Significance was assessed via permutation in which case and control labels were swapped within each cluster. In addition, for each SNP, Hardy-Weinberg tests for cases, controls, and the pooled sample, as well as tests for differential missing data between cases and controls, were performed.
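A within-cluster label permutation of this kind might look like the following sketch. It assumes that "swapping" means shuffling the case/control labels among the members of each cluster; the function names, the `statistic` callback and the add-one p-value estimator are illustrative choices, not details taken from the analysis.

```python
import random


def permute_within_clusters(labels, clusters, rng):
    """Shuffle labels among samples that share a cluster identifier."""
    members = {}
    for index, cluster in enumerate(clusters):
        members.setdefault(cluster, []).append(index)
    permuted = list(labels)
    for indices in members.values():
        shuffled = [labels[i] for i in indices]
        rng.shuffle(shuffled)
        for i, label in zip(indices, shuffled):
            permuted[i] = label
    return permuted


def empirical_p(statistic, labels, clusters, n_perm=1000, seed=0):
    """Empirical p-value of statistic(labels) under within-cluster permutation."""
    rng = random.Random(seed)
    observed = statistic(labels)
    hits = sum(
        statistic(permute_within_clusters(labels, clusters, rng)) >= observed
        for _ in range(n_perm)
    )
    # Add-one estimator, so the reported p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Because labels only move within a cluster, each permuted data set keeps the same number of cases and controls per cluster, which preserves the matched design under the null.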

3. Quantitative Trait Analysis

For each quantitative trait, a multivariate linear or logistic regression analysis with or without covariates was performed using as many of the 3082 samples as we had available (which included individuals related as 1st degree relatives). For some traits, individuals with diabetes were analyzed separately from individuals without diabetes. See the results for related metabolic traits for details of the phenotype-specific analysis. To correct for inflation caused by inclusion of related individuals, genomic control p values are reported.
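Genomic control is commonly applied by dividing each 1 degree of freedom chi-squared statistic by an inflation factor estimated from the median statistic. The sketch below follows that standard recipe; the function name is hypothetical, and flooring the inflation factor at 1 is a common convention assumed here rather than something stated for this analysis.

```python
import math
import statistics

# Median of a chi-squared distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_values):
    """Return (inflation factor, corrected p-values) for 1 df chi-squared tests."""
    inflation = statistics.median(chi2_values) / CHI2_1DF_MEDIAN
    inflation = max(1.0, inflation)  # assumed convention: never deflate
    p_values = [
        # Upper tail of a 1 df chi-squared distribution.
        math.erfc(math.sqrt(chi2 / inflation / 2))
        for chi2 in chi2_values
    ]
    return inflation, p_values
```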

For each trait, we tabulated the number of SNPs that exceeded the specified p value threshold after genomic control. We noted that 1) for the bulk of the distribution for all traits (p values between 0.5 and 0.1), the distribution of p values is consistent with expectations, and 2) for many traits, notably those related to circulating cholesterol levels, we observed a slight excess of p values in the tail (p < 0.01) relative to the null. To quantify the significance of this excess, we constructed 1000 data sets in which the quantitative trait phenotype was randomized (either within a sibship in the case of relatives, or across unrelated samples), tested for association at all SNPs, and calculated the number of SNPs in each permuted data set that exceeded the given p value threshold after genomic control. Assuming that counts for a given p value threshold were distributed normally, we standardized the resulting distribution of counts and calculated Z-scores for the observed count compared to the permuted mean and standard deviation of counts for each trait, and highlighted in bold those bins where the count is higher than expected by chance (p < 0.05). For simplicity, we present the mean counts for APOA1 after permutation, though we note that these expected values are stable across different QTL phenotype values (data not shown).
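The standardization step can be sketched as follows, assuming the sample standard deviation of the permuted counts and a one-sided upper-tail normal p-value; both choices, and the function name, are assumptions made for illustration.

```python
import math
import statistics


def excess_z(observed_count, permuted_counts):
    """Z-score and one-sided p-value of an observed SNP count vs. permutations."""
    mean = statistics.mean(permuted_counts)
    sd = statistics.stdev(permuted_counts)  # sample standard deviation (assumed)
    z = (observed_count - mean) / sd
    # Upper-tail probability under a standard normal distribution.
    p_upper = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_upper
```

For example, with permuted counts of 90, 100 and 110 and an observed count of 120, the mean is 100 and the standard deviation is 10, giving Z = 2 and a one-sided p of about 0.023.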