Loading a Dataset

Data can be loaded in six formats. Ped and Haps files can also load an optional marker info file, and PLINK files normally require an accompanying map or binary map file. Further options are presented on the load screen:

  • Haploview saves time by computing pairwise LD statistics only for markers within a certain distance of each other. The default is 500 kb. Enter a value of zero to force all pairwise computations.
  • Haploview excludes individuals with less than 50% complete genotypes. This threshold can be adjusted in the load dialog. Additional details about excluded individuals are available from the marker check tab.
  • When loading a file dumped from the HapMap project website, it is possible to automatically display SNP and gene tracks from the HapMap above the data by checking the "Download and show HapMap info track" box. More information is available with the LD Display help. [hapmap file only]
  • If you wish to perform association tests, you must tell the program at load time and select either family trios or case/controls. For family datasets, either a standard TDT or a parenTDT is available. More details are available under association. [pedfile only]
  • If your data is from the X chromosome in the linkage formats, tick the box so that Haploview will correctly process your data. In other formats, select the X chromosome in the dropdown menu. X chromosome data is not supported by the phased haplotype format. All functionality now works with the X chromosome.
  • Haploview will maximize the information available from a pedigree for both LD analyses and association tests. For the former it creates a maximal set of unrelated individuals, using trio data only for obligate parent/offspring phasing. For TDT association testing, all available transmissions from parent-offspring will be used. More detailed information about specific situations is available in the FAQ.
  • Haploview can be configured to support proxy host settings using the "Proxy Settings" button on the load screen.
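
As an illustration of the distance cutoff described in the first option above, the following sketch counts how many marker pairs fall within a given maximum distance. It is not Haploview's own code: the function name count_ld_pairs is hypothetical, positions are assumed to be in base pairs and sorted in ascending order, and treating the cutoff as inclusive is an assumption. A cutoff of zero counts every pair, mirroring the "all pairwise computations" setting.

```shell
#!/bin/sh
# Hypothetical helper (not part of Haploview): count marker pairs whose
# separation is at most max_dist base pairs.
# Input on stdin: one marker position per line, sorted in ascending order.
# A max_dist of 0 means "count all pairs".
count_ld_pairs() {
    max_dist=$1
    awk -v max="$max_dist" '
        { pos[NR] = $1 }                      # remember every position
        END {
            pairs = 0
            for (i = 1; i <= NR; i++)
                for (j = i + 1; j <= NR; j++)
                    # inclusive cutoff is an assumption of this sketch
                    if (max == 0 || pos[j] - pos[i] <= max)
                        pairs++
            print pairs
        }'
}

# Example: four markers, cutoff of 500,000 bp
printf '%s\n' 1000 200000 600000 1200000 | count_ld_pairs 500000
```

With these four example positions, only two of the six possible pairs lie within 500,000 bp of each other, which shows why the cutoff reduces computation on datasets that span long regions.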

 

Haploview allocates 512MB of memory by default. This is usually sufficient to handle datasets with several thousand markers. If you are running the program on very large datasets (>20,000 markers), you may need to allocate more memory (presuming your computer has sufficient resources available). This can be accomplished using the following command:

java -jar Haploview.jar -memory 2000

Here "2000" specifies 2000 megabytes of memory and can be adjusted as necessary. Previous versions of Haploview required a slightly different command to adjust available memory, which still works:

java -Xmx2000M -cp Haploview.jar edu/mit/wi/haploview/Haploview
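
If you change the memory setting often, a small shell wrapper can assemble the command for you. The sketch below is only an illustration: the function name haploview_cmd is hypothetical, it assumes Haploview.jar is in the current directory, and it only prints the command rather than running it. The fallback of 512 mirrors the default described above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Haploview): print the launch
# command for a given memory size in megabytes.
haploview_cmd() {
    mem_mb=${1:-512}    # fall back to the documented 512MB default
    case $mem_mb in
        ''|*[!0-9]*)
            # reject anything that is not a whole number of megabytes
            echo "memory must be a whole number of megabytes" >&2
            return 1
            ;;
    esac
    echo "java -jar Haploview.jar -memory $mem_mb"
}

# Example: show the command for a 2000MB allocation
haploview_cmd 2000
```

Printing the command instead of executing it makes it easy to check the value before launching the program.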