How do I normalise calculated scores across different genetic ancestry groups?
Download reference data
The fastest way to get started is to download a reference panel:
$ wget https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_1000G_v1.tar.zst
This example reference panel is based on 1000 Genomes (Nature 2015).
We also provide a reference panel that combines 1000 Genomes with data from the Human Genome Diversity Project (HGDP), derived from the gnomAD v3.1 release (Koenig, Yohannes et al., bioRxiv 2023). It includes additional samples and ancestry groups:
$ wget https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_HGDP+1kGP_v1.tar.zst
Note
These reference databases are not compatible with the test profile. The test profile is not biologically meaningful and is only used to check that the workflow has been installed correctly.
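The two downloads above can be wrapped in a small helper. This is a minimal sketch, not part of pgsc_calc: the function names `panel_url` and `fetch_panel` and the short panel names are hypothetical, while the URLs are the ones listed above. It assumes `wget` is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of pgsc_calc): map a short panel name to
# one of the two download URLs given above, then fetch it with wget.

BASE_URL="https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources"

# Print the download URL for a panel name, or fail for an unknown name.
panel_url() {
    case "$1" in
        1000G)     echo "${BASE_URL}/pgsc_1000G_v1.tar.zst" ;;
        HGDP+1kGP) echo "${BASE_URL}/pgsc_HGDP+1kGP_v1.tar.zst" ;;
        *)         echo "unknown panel: $1" >&2; return 1 ;;
    esac
}

# Download a panel into the current directory.
fetch_panel() {
    local url
    url="$(panel_url "$1")" || return 1
    # -c resumes a partial download instead of starting again
    wget -c "$url" || return 1
    # Fail if the downloaded file is missing or empty
    [ -s "$(basename "$url")" ]
}

# Example (commented out so that sourcing this file downloads nothing):
# fetch_panel HGDP+1kGP
```

The archives are large, so resuming with `-c` may save time if a download is interrupted.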
Bootstrap reference data
It’s also possible to bootstrap (create from scratch) the reference data from PLINK 2 data, which is how we create the reference panel archives. See “How do I set up the reference database?”
Genetic similarity analysis and score normalisation
To enable genetic similarity analysis and score normalisation, include the --run_ancestry parameter when running the workflow:
$ nextflow run pgscatalog/pgsc_calc -profile test,docker \
--run_ancestry path/to/reference/pgsc_HGDP+1kGP_v1.tar.zst
The --run_ancestry parameter requires the path to the reference database.
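Because --run_ancestry needs a valid path, it can help to check the file before launching the workflow. The sketch below is illustrative only: `check_reference` is a hypothetical helper, not part of pgsc_calc, and it assumes the reference database is a `.tar.zst` archive like the two downloads above. The `nextflow` command is the same one shown above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of pgsc_calc): confirm that the
# reference database path looks like a non-empty .tar.zst archive.
check_reference() {
    case "$1" in
        *.tar.zst) ;;
        *) echo "expected a .tar.zst archive: $1" >&2; return 1 ;;
    esac
    [ -s "$1" ] || { echo "missing or empty file: $1" >&2; return 1; }
}

# Placeholder path, as in the command above; replace it with your own.
REF="path/to/reference/pgsc_HGDP+1kGP_v1.tar.zst"

# Only launch the workflow if the check passes.
if check_reference "$REF"; then
    nextflow run pgscatalog/pgsc_calc -profile test,docker \
        --run_ancestry "$REF"
fi
```

Failing early with a clear message avoids waiting for the workflow to start before a mistyped path is reported.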