HGDP-CEPH Human Genome Diversity Cell Line Panel

March 2020: the HGDP-CEPH panel distributed by the CEPH Biobank has been sequenced by the Wellcome Sanger Institute:

Global human genomes reveal rich genetic diversity shaped by complex evolutionary history.

The results were published in Science on March 18, 2020

Introduction to the HGDP-CEPH Panel

A resource of 1063 lymphoblastoid cell lines (LCLs) from 1050 individuals in 52 world populations, together with corresponding milligram quantities of DNA, is banked at the Foundation Jean Dausset-CEPH in Paris. These LCLs were collected from various laboratories by the Human Genome Diversity Project (HGDP) and CEPH in order to provide unlimited supplies of DNA and RNA for studies of sequence diversity and the history of modern human populations. Information for each LCL is limited to the sex of the individual, the population and the geographic origin.
The table provides details of all the LCLs in the resource, uncorrected for 13 pairs of duplicate LCLs, LCLs from two genetically atypical individuals, and LCLs for 96 pairs of close relatives (first- and/or second-degree relative pairs). For sixteen LCLs, the sex indicated on records differs from that determined by molecular typing. All samples used for this resource were collected with proper informed consent.

The DNAs have been distributed to more than 120 investigators for genotyping and/or resequencing; the results are contributed to a central database. To date, the DNAs have been typed genome wide with almost 1 million SNPs, 843 microsatellites, and 51 small indel loci. Some 10,000 CNV calls from two different laboratories are included in the database. Nuclear and mitochondrial DNA regions have been resequenced. The whole genome sequencing of 929 HGDP-CEPH individuals was published in 2020.

For more information contact the HGDP Manager.

Genotype Submission

Panel users who want to submit marker genotypes and related information should first contact the HGDP Manager for a password. It is then possible to submit electronic files directly or by arrangement with the DB manager. Data submissions should provide identifying, genetic and genomic information for markers and sequences, e.g. official (HUGO/NCBI) nomenclature, GenBank identifier, dbSNP identifier (rs or ss number), local name, type of marker (SNP, short indel, STRP etc.), the allelic nucleotides for SNP loci and indels (as A, T, G, C, -) and allelic repeat sequences for STRPs, ancestral alleles, genetic map position defined by chromosome number and genome sequence coordinates (current NCBI build number required), 100 bp of sequence flanking the allelic nucleotides, and genotyping technique(s). Submissions should also include the actual genotypes for each HGDP-CEPH individual (HGDP identifier): allelic nucleotides or numeric codes (1 and 2 for SNPs; 1, 2, etc. for short indels), and the number of nucleotides in the allelic repeat or a numeric code for STRP markers, with the correspondence between numeric alleles and nucleotide alleles indicated in the file. For CNVs, provide the DGVa or dbVar study ID (nstd number), variant ID (nsv number) and supporting variant ID (nssv number) for each call (see www.ebi.ac.uk/dgva or www.ncbi.nlm.nih.gov/dbvar, respectively). Also, for each CNV, give the start and stop coordinates in the genome sequence and the typing method used.
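
As an illustration of the correspondence between numeric and nucleotide alleles, the following Python sketch decodes numerically coded SNP genotypes. The allele key, the "/" separator between the two alleles and the genotype values are all assumptions made for the example; the panel database does not prescribe them here.

```python
# Hypothetical allele key for one SNP: numeric code -> nucleotide allele.
# The key and the genotypes below are illustrative, not real panel data.
ALLELE_KEY = {"1": "A", "2": "G"}


def decode_genotype(genotype: str, allele_key: dict[str, str]) -> str:
    """Translate a numeric genotype such as '1/2' into nucleotides ('A/G').

    The '/' separator is an assumption made for this sketch.
    """
    alleles = genotype.split("/")
    unknown = [a for a in alleles if a not in allele_key]
    if unknown:
        raise ValueError(f"numeric allele(s) {unknown} missing from allele key")
    return "/".join(allele_key[a] for a in alleles)


# Genotypes keyed by HGDP-CEPH identifier (made-up values).
genotypes = {"HGDP00989": "1/2", "HGDP01340": "2/2"}
decoded = {sample: decode_genotype(g, ALLELE_KEY) for sample, g in genotypes.items()}
print(decoded)
```

Raising an error for a numeric allele that is absent from the key makes a missing or incomplete correspondence table visible before the file is submitted.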

Sequence Submission

Sanger sequencing: Send sequences in FASTA or GenBank format with phred scores or other evaluations of base call quality, and sequencing error rates.

Short reads (Illumina, SOLiD, 454...):
We now recommend submission of short read sequences from HGDP-CEPH samples to sequence read archives (SRA) at NCBI www.ncbi.nlm.nih.gov, EBI www.ebi.ac.uk, or DDBJ trace.ddbj.nig.ac.jp. When submitting, you will be asked for a STUDY TITLE, and an ANONYMIZED NAME for each sample sequenced. Please include in the study title the name of the resource, "HGDP-CEPH Human Genome Diversity Panel". A study title might read, "whole genome resequencing 10x of HGDP-CEPH Human Genome Diversity Panel samples". The anonymized name for each sample sequenced should be the HGDP-CEPH identifier, e.g. HGDP00989. The sample database link for HGDP-CEPH identifiers is www.cephb.fr/common/HGDPid_populations.xls. For each sample submission XML file, please add a sample link to the HGDP-CEPH database in xml format.
For the NCBI SRA:
<SAMPLE_LINKS>
    <SAMPLE_LINK>https://www.cephb.fr/common/HGDPid_populations.xls
    </SAMPLE_LINK>
</SAMPLE_LINKS>

For the EBI SRA:
<SAMPLE_ATTRIBUTE>
    <TAG>HGDP-CEPH Database Link</TAG>
    <VALUE>https://www.cephb.fr/common/HGDPid_populations.xls</VALUE>
</SAMPLE_ATTRIBUTE>

Use of the Panel-related study title, sample, anonymized name and sample link will permit us to track all SRA short read submissions for HGDP-CEPH samples, and include a list of them in a dedicated section in the panel database.
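
For submitters preparing many sample files, the EBI-style attribute shown above could be generated rather than typed by hand. The Python sketch below is one possible approach using only the standard library. The function names are hypothetical, and the identifier pattern ("HGDP" followed by five digits) is an assumption inferred from examples such as HGDP00989, not a documented rule.

```python
import re
import xml.etree.ElementTree as ET

DB_LINK = "https://www.cephb.fr/common/HGDPid_populations.xls"

# Assumption: identifiers look like "HGDP" plus five digits, as in HGDP00989.
HGDP_ID = re.compile(r"HGDP\d{5}")


def is_hgdp_id(name: str) -> bool:
    """Check that an anonymized sample name matches the assumed pattern."""
    return HGDP_ID.fullmatch(name) is not None


def database_link_attribute() -> str:
    """Build the SAMPLE_ATTRIBUTE element for the EBI SRA as an XML string."""
    attribute = ET.Element("SAMPLE_ATTRIBUTE")
    ET.SubElement(attribute, "TAG").text = "HGDP-CEPH Database Link"
    ET.SubElement(attribute, "VALUE").text = DB_LINK
    return ET.tostring(attribute, encoding="unicode")


for name in ["HGDP00989", "HGDP989"]:
    print(name, "ok" if is_hgdp_id(name) else "does not match the assumed pattern")
print(database_link_attribute())
```

Building the element with an XML library rather than string concatenation keeps the tags balanced and escapes any special characters in the values.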

Access Policy to the HGDP-CEPH

The main goal of the Panel is to allow further research in human population genetics. A resource of 1063 lymphoblastoid cell lines (LCLs) from 1050 individuals in 51 world populations is presently banked at the Foundation Jean Dausset (CEPH). DNAs have been produced from these LCLs and organized into a panel at CEPH that is available for distribution to qualified, non-commercial, academic research laboratories on a collaborative basis.

Panel H952 contains no pairs of relatives closer than first cousins with a few possible exceptions, no duplicate pairs and no atypical samples.

Researchers who request the panel DNAs must commit to typing at least all DNAs of H952 with each marker used (at least 50 common markers), and to communicating the results to CEPH no later than 6 months after completion of typing the DNAs, or no later than the time of publication (please mention Fondation Jean Dausset - CEPH in the acknowledgements), for inclusion in a central database (www.cephb.fr/hgdp-cephdb) available to diversity panel users as well as to the public. If these two conditions cannot be met, please provide a scientific justification. Collaborators must agree to use the DNA samples for academic research only and not to transfer DNA samples to other laboratories without permission from HGDP and CEPH. We will need your agreement to each of these conditions (original or modified as above), which should be specifically mentioned in writing to the HGDP Manager before we can send the DNAs.

Some laboratories may wish to use the HGDP-CEPH panel for resequencing projects. We recognize that the requirements posed for resequencing all 1050 DNAs may be prohibitive for such undertakings, given the present technical limits. We encourage these laboratories to contact us and propose the requirements for the work that they wish to undertake.

We would appreciate learning about the research for which you propose to use the diversity panel DNAs. One or two paragraphs will be sufficient. Genetic markers to be used should be described or preferably listed if practical; use official nomenclature and give their genome positions. For a resequencing study, indicate the genome region(s), the size of each region and how many individuals and corresponding populations to be sequenced.

In general, panel DNAs, dissolved in TE (10:1), will be sent in 96-well microtiter plates, at a concentration of ~60 ng/µl. The quantity of DNA to be shipped will be ~5.0 micrograms per well. If you require more than 5.0 µg of each panel DNA, please contact us with the details of what you need. As we do not have a specific budget to support managing the LCLs, DNA production, quality control and formatting, we charge for DNAs that you receive on a cost price recovery basis.
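
As a rough guide for planning, the approximate figures above imply a volume of a little over 80 µl per well. The short Python sketch below shows the arithmetic; both inputs are the approximate values stated in the text, so the result is an estimate only, and the function name is hypothetical.

```python
CONCENTRATION_NG_PER_UL = 60.0  # approximate concentration stated in the text
QUANTITY_UG = 5.0               # approximate quantity per well stated in the text


def volume_ul(quantity_ug: float, concentration_ng_per_ul: float) -> float:
    """Volume in microlitres that holds the given mass at the given concentration."""
    return quantity_ug * 1000.0 / concentration_ng_per_ul  # convert µg to ng


print(round(volume_ul(QUANTITY_UG, CONCENTRATION_NG_PER_UL), 1))  # prints 83.3
```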

The CEPH can also provide RNA extracted from lymphoblastoid cell lines. If you are interested in expression studies on the HGDP-CEPH panel, please send an e-mail to the HGDP Manager.

Cell lines are not distributed.

The LCLs of the HGDP-CEPH panel were produced by different laboratories in various countries over the past 20-30 years. In this project, DNA from these cell lines will be used by many laboratories in different countries. The HGDP-CEPH collaboration has determined that all of the blood samples used to produce these LCLs were collected with appropriate informed consent for the time and place of their collection. Recipients of DNA from these LCLs are, of course, responsible for ensuring that its use complies with legal standards that govern their laboratories.

Researchers who wish to participate in the project as outlined above should use the following procedure, specifically indicating their agreement with the terms of collaboration as above.
After approval, a purchase order should be sent by e-mail to the BRC Manager giving the following information:

PO date and number
billing address
intra-European VAT number, if applicable
panel ID (H1063 or H952) or requested DNA IDs (e.g. HGDP01340) in an Excel file
requested quantity in µg (multiples of 5 µg only)
international courier account number

You will be informed by e-mail of the estimated delivery time of the DNAs. Any feedback on the quality of the samples would be appreciated. Please mention the CEPH Biobank in the acknowledgements of publications using the HGDP-CEPH panel.
