How to use a custom scoring file

You might want to use a scoring file that you’ve developed using different genomic data, or a scoring file somebody else made that isn’t published in the PGS Catalog.

Custom scoring files need to follow a specific format. The entire process of using a custom scoring file is described below.

1. Samplesheet setup

Set up a samplesheet as described in: How to set up a samplesheet.

2. Scorefile setup

Set up your scorefile in a spreadsheet by appending the variant information to a minimal header, in the following format:

Header:

#pgs_name=metaGRS_CAD
#pgs_id=metaGRS_CAD
#trait_reported=Coronary artery disease
#genome_build=GRCh37

Variant information:

chr_name  chr_position  effect_allele  other_allele  effect_weight
1         2245570       G              C             -2.76009e-02
8         26435271      T              C             1.95432e-02
10        30287398      C              T             1.82417e-02

Tip

If you’re having trouble getting your scorefile working, see the example we use in our automatic tests.

Save the file as scorefile.txt. The file should be in tab-separated values (TSV) format. Column names are defined in the PGS Catalog scoring file format v2.0. Key metadata (e.g. genome_build) should be specified in the header to ensure that variant matching and/or liftover is consistent with the target genotyping data. Example scorefile templates are available in the calculator repository. Scorefiles can be compressed with gzip if you would like to save storage space (e.g. scorefile.txt.gz).
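If you prefer the command line to a spreadsheet, the same file can be written directly from a shell. The following is a minimal sketch that reproduces the example header and the three example variants from this guide. It assumes a POSIX shell with printf and gzip available, and that writing scorefile.txt to the current directory is acceptable.

```shell
# Write the four metadata header lines, the column names, and the three
# example variants from this guide. printf expands \t to a tab character
# and \n to a newline, so every data row is tab separated.
{
  printf '#pgs_name=metaGRS_CAD\n'
  printf '#pgs_id=metaGRS_CAD\n'
  printf '#trait_reported=Coronary artery disease\n'
  printf '#genome_build=GRCh37\n'
  printf 'chr_name\tchr_position\teffect_allele\tother_allele\teffect_weight\n'
  printf '1\t2245570\tG\tC\t-2.76009e-02\n'
  printf '8\t26435271\tT\tC\t1.95432e-02\n'
  printf '10\t30287398\tC\tT\t1.82417e-02\n'
} > scorefile.txt

# Optional: keep a gzip-compressed copy next to the plain-text file.
gzip -c scorefile.txt > scorefile.txt.gz
```

Writing the tabs explicitly avoids a common spreadsheet export problem, where columns end up separated by spaces or commas instead of tabs.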

This how-to guide describes a simple scoring file. More complicated scoring files need extra work.

Note

The other_allele column is optional but recommended.
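Before running the workflow, it can help to sanity-check the file. The sketch below defines check_scorefile, a hypothetical helper that is not part of pgsc_calc. It assumes that chr_name, chr_position, effect_allele and effect_weight are the columns a simple scorefile needs (other_allele is treated as optional, as noted above), and it only checks the layout: a genome_build header line, the expected column names, and a consistent number of tab-separated fields per row. It does not validate alleles, positions, or weights.

```shell
# check_scorefile FILE
# Hypothetical helper (not part of pgsc_calc): prints any layout problems
# and returns a non-zero status if it finds one.
check_scorefile() {
  # Read plain-text or gzip-compressed input.
  case "$1" in
    *.gz) gzip -dc "$1" ;;
    *)    cat "$1" ;;
  esac | awk -F'\t' '
    /^#genome_build=/ { build = 1 }     # remember that the build was declared
    /^#/ { next }                       # skip all header lines
    !ncol {                             # first non-header line: column names
      ncol = NF
      for (i = 1; i <= NF; i++) col[$i] = 1
      # Assumed required columns for a simple scorefile.
      n = split("chr_name chr_position effect_allele effect_weight", req, " ")
      for (i = 1; i <= n; i++)
        if (!(req[i] in col)) { print "missing column: " req[i]; bad = 1 }
      next
    }
    NF != ncol {                        # data row with the wrong field count
      print "line " NR ": expected " ncol " fields, got " NF
      bad = 1
    }
    END {
      if (!build) { print "missing #genome_build header"; bad = 1 }
      exit (bad ? 1 : 0)
    }'
}
```

After defining the function, call it as check_scorefile scorefile.txt (or with scorefile.txt.gz). It prints nothing and returns zero when the layout looks consistent.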

3. Calculate!

Set the path of the custom scoring file with the --scorefile parameter:

$ nextflow run pgscatalog/pgscalc \
    -profile <docker/singularity/conda> \
    --input samplesheet.csv \
    --scorefile scorefile.txt

Congratulations, you’ve now calculated some scores using your custom scoring file! 🥳

After the workflow executes successfully, the calculated scores and a summary report should be available in the results/ directory by default. If you’re interested in more information, see pgsc_calc Outputs & report.

If the workflow didn’t execute successfully, have a look at the Troubleshooting section.