How to use scoring files in the PGS Catalog

The easiest way to calculate a polygenic score is to use a scoring file that’s been published in the PGS Catalog!

1. Samplesheet setup

First, you need to describe the structure of your genomic data in a standardised way. To do this, set up a spreadsheet that looks like this:

Example samplesheet for a combined plink2 file set (the chrom column is left empty):

sampleset,path_prefix,chrom,format
cineca,/path/to/target_genomes/cineca_synthetic_subset,,pfile

Save the file as samplesheet.csv. See How to set up a samplesheet for more details.
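As a minimal sketch, the example samplesheet can also be written from the command line with printf. The path is only the placeholder from the example; replace it with your own. We assume here that path_prefix is the shared file name of your plink2 file set without the .pgen/.pvar/.psam extensions, and the chrom field stays empty because the example describes a combined file set.

```shell
# Write the two lines of the example samplesheet (header + one sampleset).
# The path is a placeholder; the empty field between the two commas is chrom,
# left blank because this example is a combined plink2 file set.
printf '%s\n' \
  'sampleset,path_prefix,chrom,format' \
  'cineca,/path/to/target_genomes/cineca_synthetic_subset,,pfile' \
  > samplesheet.csv

# Print the file to check it looks right
cat samplesheet.csv
```

Writing the CSV directly avoids spreadsheet programs silently changing delimiters or line endings.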

2. Pick scores from the PGS Catalog

Accessions

Individual scores can be selected with their Polygenic Score IDs, which start with the prefix “PGS” (for example, PGS001229). The parameter --pgs_id accepts polygenic score IDs:

--pgs_id PGS001229

Multiple scores can be set by using a comma-separated list:

--pgs_id PGS001229,PGS000802
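If you keep a longer list of IDs in a text file, one per line, the POSIX paste utility can join them into the comma-separated form shown above. In this sketch pgs_ids.txt is a hypothetical file name, and it simply holds the two IDs from the example.

```shell
# Hypothetical input file: one PGS ID per line
cat > pgs_ids.txt <<'EOF'
PGS001229
PGS000802
EOF

# paste -s joins all lines of the file into one line; -d, uses a comma as separator
pgs_list=$(paste -s -d, pgs_ids.txt)
echo "--pgs_id $pgs_list"
```

The same approach works for trait and publication IDs, since all three parameters take comma-separated lists.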

Traits

If you would like to calculate every polygenic score in the Catalog for a trait, such as coronary artery disease, use the --trait_efo parameter with the trait's Experimental Factor Ontology (EFO) ID:

--trait_efo EFO_0001645

Multiple traits can be set by using a comma-separated list.

Publications

If you would like to calculate every polygenic score associated with a publication in the PGS Catalog, you can use the --pgp_id parameter:

--pgp_id PGP000001

Multiple publications can be set by using a comma-separated list.

Note

PGS, trait, and publication IDs can be combined to calculate multiple polygenic scores.
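Before combining lists, a quick sanity check can catch typos such as a missing digit or a stray space. The sketch below uses a hypothetical helper, check_ids, and assumes ID formats inferred only from the examples on this page: “PGS” or “PGP” followed by six digits, and “EFO_” followed by seven digits. This check is deliberately strict; if you use a trait ID with a different prefix or length, adjust the pattern.

```shell
# check_ids LIST PATTERN  (hypothetical helper)
# Splits a comma-separated LIST and reports every ID that does not match the
# shell glob PATTERN. Returns non-zero if any ID fails.
check_ids() {
  list=$1
  pattern=$2
  status=0
  old_ifs=$IFS
  IFS=,
  set -f                 # stop IDs from being glob-expanded while splitting
  for id in $list; do
    case $id in
      $pattern) ;;       # unquoted on purpose so the glob characters are active
      *) echo "unexpected ID: '$id'" >&2; status=1 ;;
    esac
  done
  set +f
  IFS=$old_ifs
  return $status
}

# Assumed formats, inferred from the example IDs
six='[0-9][0-9][0-9][0-9][0-9][0-9]'
check_ids "PGS001229,PGS000802" "PGS$six" && echo "PGS IDs look valid"
check_ids "EFO_0001645" "EFO_[0-9]$six" && echo "trait IDs look valid"
check_ids "PGP000001" "PGP$six" && echo "publication IDs look valid"
```

Because the list is split on commas only, an entry like " PGS000802" (with a leading space) is reported as unexpected too.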

3. Calculate!

$ nextflow run pgscatalog/pgsc_calc \
    -profile <docker/singularity/conda> \
    --input samplesheet.csv \
    --pgs_id PGS001229 \
    --trait_efo EFO_0001645 \
    --pgp_id PGP000001
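A pipeline run can take a while, so it may be worth checking the samplesheet before launching it. The sketch below uses a hypothetical function, check_samplesheet, that only verifies the file exists and starts with the four-column header from step 1. That is a strict assumption based on the example, not the pipeline's own validation. The launch step is left as a comment so the sketch can be run safely.

```shell
# Hypothetical pre-flight check: does the samplesheet exist, and does its first
# line match the header used in the step 1 example?
check_samplesheet() {
  sheet=$1
  expected='sampleset,path_prefix,chrom,format'
  if [ ! -f "$sheet" ]; then
    echo "missing samplesheet: $sheet" >&2
    return 1
  fi
  # Read only the header line; tr strips Windows (CRLF) line endings
  header=$(head -n 1 "$sheet" | tr -d '\r')
  if [ "$header" != "$expected" ]; then
    echo "unexpected header in $sheet: $header" >&2
    return 1
  fi
}

if check_samplesheet samplesheet.csv; then
  echo "samplesheet.csv looks OK"
  # ...then run the nextflow command from step 3 here
fi
```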

Note

For more details about calculating multiple scores, see How to apply multiple scores in parallel.