How to get a copy of the standardised genomic data

pgsc_calc works with genomic data in many formats. An important step in the workflow is standardising these data so they can be matched against scoring files. The standardised data can be large, so by default they aren’t published to the results directory (where the scores and other workflow output are written). If you would like a copy of the standardised genomic data, set the following parameter at runtime:

--copy_genomes
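As a rough sketch of how the parameter fits into a full command, the wrapper below adds `--copy_genomes` to the test-profile invocation used elsewhere on this page. The function name `run_with_genome_copy` is hypothetical and only for illustration; replace `-profile test,docker` with the options for your own data.

```shell
# Hypothetical wrapper (not part of pgsc_calc): run the test profile and
# ask the workflow to copy the standardised genomes to the results directory.
# Any extra arguments are passed through to nextflow unchanged.
run_with_genome_copy() {
    nextflow run pgscatalog/pgsc_calc -profile test,docker --copy_genomes "$@"
}
```

Calling `run_with_genome_copy` on its own starts the workflow; extra Nextflow options such as `-resume` can be appended as arguments.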

Warning

This can:
  • Give you a very big results directory

  • Make pgsc_calc slower, because it will spend extra time copying data

Note

  • If you used PLINK2 format input, you will get back PLINK2 format data

  • If you used PLINK1 format input, you will get back PLINK1 format data

  • If you used a VCF file, you will get back PLINK2 format data
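To check which format you got back, one option is to look at the file extensions: PLINK2 filesets contain a `.pgen` file (alongside `.pvar` and `.psam`), while PLINK1 filesets contain a `.bed` file (alongside `.bim` and `.fam`). The helper below is a minimal sketch of that check. The name `detect_plink_format` is hypothetical, the directory to inspect is left as an argument because the exact location of the copied data is not stated here, and the check relies on extensions only, so an unrelated `.bed` annotation file would also be reported as PLINK1.

```shell
# Hypothetical helper: guess which PLINK format a directory contains,
# judged only by file extension (.pgen => PLINK2, .bed => PLINK1).
detect_plink_format() {
    dir="$1"
    if [ -n "$(find "$dir" -name '*.pgen' | head -n 1)" ]; then
        echo "PLINK2"
    elif [ -n "$(find "$dir" -name '*.bed' | head -n 1)" ]; then
        echo "PLINK1"
    else
        echo "unknown"
    fi
}
```

For example, `detect_plink_format results/` would inspect a results directory named `results/`, assuming that is where your workflow output was written.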

You might also find it useful to explore the Nextflow working directory to check PLINK logs or other intermediate output. By default, intermediate files are kept in the work/ directory. To find files, check the terminal output for directory labels or use the find utility, for example:

$ nextflow run pgscatalog/pgsc_calc -profile test,docker
$ find work/ -name '*.bim'
work/stage/f7/d9451685e9db7e6c19ec48490ef275/cineca_synthetic_subset.bim
work/27/ef1706eb5c99316b6e44200f2c1ba1/cineca_synthetic_subset.bim
work/27/ef1706eb5c99316b6e44200f2c1ba1/cineca_synthetic_subset_22.bim
work/5b/bc29aaf02d3695edfabb11488faf69/cineca_synthetic_subset_22.bim

Working directory labels (e.g. 27/ef17..) will probably be different on your computer.
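Because the same file name can appear in several task directories, it can help to list the directories rather than the files. The sketch below wraps `find` to print each task directory that contains a matching file, once. The function name `find_task_dirs` is hypothetical, and the pattern must be quoted so the shell does not expand it before `find` sees it.

```shell
# Hypothetical helper: print the unique directories under a work directory
# that contain at least one file matching a name pattern.
find_task_dirs() {
    work_dir="$1"
    pattern="$2"
    # dirname strips the file name; sort -u removes repeated directories
    find "$work_dir" -name "$pattern" -exec dirname {} \; | sort -u
}
```

For instance, `find_task_dirs work/ '*.log'` lists the task directories that hold `.log` files, which is one way to track down PLINK logs. Each Nextflow task directory also contains hidden files such as `.command.sh` and `.command.log` that record what was run there.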