How do I prepare my input genomes?#
Target genome data requirements#
If you’d like to input WGS genomes, some extra preprocessing steps are required.
Only human chromosomes 1 – 22, X, Y, and XY are supported by the pipeline, although sex chromosomes are rarely used in scoring files.
If input data contain other chromosomes (e.g. patch regions), the pipeline may report an error and stop.
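Before running the pipeline it can help to see which chromosome codes a variant file actually contains. The following is a minimal sketch, not part of the pipeline: `list_unsupported_chroms` is a hypothetical helper that reads an uncompressed, whitespace-delimited variant file (a `.bim`, `.pvar`, or VCF) and prints each chromosome code outside 1 – 22, X, Y, and XY. It assumes the chromosome is in the first column, and it treats a `chr` prefix as equivalent to the bare code, which is an assumption about your data rather than documented pipeline behaviour. Numeric sex-chromosome codes such as 23 are reported as well, so review the output rather than treating it as a definitive list of errors.

```shell
# Hypothetical helper: print chromosome codes that fall outside
# 1-22, X, Y, and XY. Expects an uncompressed .bim, .pvar, or VCF file
# with the chromosome code in the first column.
list_unsupported_chroms() {
    awk '
        /^#/    { next }                   # skip VCF / pvar header lines
        NF == 0 { next }                   # skip blank lines
        {
            chrom = $1
            sub(/^chr/, "", chrom)         # assumption: "chr1" means "1"
            if (chrom ~ /^([1-9]|1[0-9]|2[0-2]|X|Y|XY)$/) next
            if (!($1 in seen)) {           # report each code only once
                seen[$1] = 1
                print $1
            }
        }
    ' "$1"
}

# Usage: list_unsupported_chroms my_genotypes.bim
```

An empty result means every variant sits on a chromosome the pipeline supports.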
Supported file formats#
The following file formats are currently supported:
Plink 1 file set (.bed / .bim / .fam)
Plink 2 file set (.pgen / .pvar / .psam)
Compressed input is supported and automatically detected. For example, VCF files can be compressed with bgzip, and plink2 variant information files can be compressed with zstd.
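If you are unsure how a file was compressed, you can inspect its first bytes yourself. The sketch below is only an illustration and makes no claim about how the pipeline performs its own detection: `detect_compression` is a hypothetical helper that compares the leading bytes of a file with the gzip magic number (`1f 8b`, which bgzip output shares) and the zstd magic number (`28 b5 2f fd`).

```shell
# Hypothetical helper: report the compression of a file from its magic bytes.
detect_compression() {
    # Read the first four bytes and render them as one lowercase hex string.
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        1f8b*)    echo "gzip" ;;   # bgzip files are valid gzip files too
        28b52ffd) echo "zstd" ;;
        *)        echo "none" ;;   # uncompressed, or a format not checked here
    esac
}

# Usage: detect_compression my_genotypes.vcf.gz
```

Because bgzip and plain gzip share the same leading bytes, this check cannot tell them apart.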
VCF from an imputation server#
plink2 --vcf <full_path_to_vcf.vcf.gz> \
    --allow-extra-chr \
    --chr 1-22, X, Y, XY \
    --make-pgen --out <short name>_axy
Non-standard chromosomes/patches should not cause errors in versions later than v2.0.0-alpha.3; however, they will be filtered out from the list of variants available for PGS scoring.
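When there are several VCF files to convert, for example one per chromosome, the same command can be generated for each file. The sketch below is a dry run under stated assumptions: `plink2_convert_cmd` is a hypothetical helper that only prints the plink2 command shown above, the output prefix is derived by stripping `.vcf.gz` from the file name, and the `/data/imputed/...` paths are placeholders. It does not quote paths, so it assumes file names without spaces.

```shell
# Hypothetical helper: print (do not run) the plink2 conversion command
# for a single bgzipped VCF, using the flags shown above.
plink2_convert_cmd() {
    vcf=$1
    name=$(basename "$vcf")
    name=${name%.vcf.gz}           # assumption: short name = file name without .vcf.gz
    printf 'plink2 --vcf %s --allow-extra-chr --chr 1-22, X, Y, XY --make-pgen --out %s_axy\n' \
        "$vcf" "$name"
}

# Placeholder paths; replace with the full paths to your own files.
for vcf in /data/imputed/chr21.vcf.gz /data/imputed/chr22.vcf.gz; do
    plink2_convert_cmd "$vcf"
done
```

Review the printed commands first, then run them by hand or pipe the output to `sh` once they look right.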
VCF from WGS#
See PGScatalog/pgsc_calc#123 for a discussion of tools that convert WGS VCF files into a form suitable for calculating PGS.