How to use scoring files in the PGS Catalog
The easiest way to calculate a polygenic score is to use a scoring file that’s been published in the PGS Catalog!
1. Samplesheet setup
First, you need to describe the structure of your genomic data in a standardised way. To do this, set up a spreadsheet that looks like:
sampleset | path_prefix | chrom | format
---|---|---|---
cineca | /path/to/target_genomes/cineca_synthetic_subset | | pfile
Save the file as samplesheet.csv. See How to set up a samplesheet for more details.
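As a minimal sketch, the samplesheet shown above could be written from the shell with a here-document. The column order follows the table. The empty chrom field is an assumption that the example genotype files are not split by chromosome, and the path is the placeholder from the example, so replace it with your own.

```shell
# Write the example samplesheet as CSV.
# Assumption: chrom is left empty because the example fileset
# is not split by chromosome.
cat > samplesheet.csv <<'EOF'
sampleset,path_prefix,chrom,format
cineca,/path/to/target_genomes/cineca_synthetic_subset,,pfile
EOF

# Print the file so the four-column layout can be checked by eye
cat samplesheet.csv
```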
2. Pick scores from the PGS Catalog
Accessions
Individual scores are selected with Polygenic Score IDs, which start with
the prefix “PGS” (for example, PGS001229). The --pgs_id parameter
accepts polygenic score IDs:
--pgs_id PGS001229
Multiple scores can be set by using a comma-separated list:
--pgs_id PGS001229,PGS000802
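If the accessions are kept one per line in a text file, the comma-separated value can be built rather than typed by hand. This is only a sketch: pgs_ids.txt is a hypothetical file name, and the two IDs are the ones used in the example above.

```shell
# Hypothetical input file with one accession per line
printf '%s\n' PGS001229 PGS000802 > pgs_ids.txt

# Join the lines with commas (no spaces) to form the parameter value
pgs_id_arg=$(paste -sd, pgs_ids.txt)

# Show the resulting option as it would appear on the command line
printf '%s %s\n' "--pgs_id" "$pgs_id_arg"
```

The same approach would apply to any parameter that takes a comma-separated list.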
Traits
If you would like to calculate every polygenic score in the Catalog for a
trait, like coronary artery disease, then you can use the --trait_efo
parameter:
--trait_efo EFO_0001645
Multiple traits can be set by using a comma-separated list.
Publications
If you would like to calculate every polygenic score associated with a
publication in the PGS Catalog, you can use the --pgp_id
parameter:
--pgp_id PGP000001
Multiple publications can be set by using a comma-separated list.
Note
PGS, trait, and publication IDs can be combined to calculate multiple polygenic scores.
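When IDs of different kinds are kept together in one list, a small helper can route each one to the matching parameter. The sketch below is hypothetical: classify_id is not part of the pipeline, and it only recognises the three prefixes that appear on this page (PGS, PGP, and EFO_). Trait IDs with other prefixes would need extra patterns.

```shell
# Hypothetical helper: map an ID to the parameter that accepts it,
# based only on the prefixes used in the examples on this page.
classify_id() {
    case "$1" in
        PGS[0-9]*)  printf '%s\n' "--pgs_id" ;;
        PGP[0-9]*)  printf '%s\n' "--pgp_id" ;;
        EFO_[0-9]*) printf '%s\n' "--trait_efo" ;;
        *)          printf '%s\n' "unknown" ;;
    esac
}

# Print each example ID next to the parameter it belongs to
for id in PGS001229 EFO_0001645 PGP000001; do
    printf '%s %s\n' "$(classify_id "$id")" "$id"
done
```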
3. Calculate!
$ nextflow run pgscatalog/pgsc_calc \
-profile <docker/singularity/conda> \
--input samplesheet.csv \
--pgs_id PGS001229 \
--trait_efo EFO_0001645 \
--pgp_id PGP000001
Note
For more details about calculating multiple scores, see How to apply multiple scores in parallel.