R API • mixder

MixDeR can be run either through the associated Shiny app or with the R functions described below.

There are three main functions, each with multiple optional arguments:
- run_mixder_report: runs mixture deconvolution and creates a GEDmatch PRO report
- run_mixder_metrics: runs mixture deconvolution and calculates metrics for a range of allele probability thresholds
- run_mixder_ancestry: runs ancestry prediction

NOTE: Several allele frequency data files are bundled with MixDeR. When the same frequency file is used for both contributors (twofreqs=FALSE), 1000 Genomes global is the default. The following datasets are available for the freq_both, freq_major and freq_minor arguments and should be referenced by the name given in parentheses:
- 1000 Genomes global (1000g_global)
- gnomAD global (gnomad)
- 1000 Genomes AFR (afr)
- 1000 Genomes AMR (amr)
- 1000 Genomes EAS (eas)
- 1000 Genomes EUR (eur)
- 1000 Genomes SAS (sas)
A custom allele frequency file can also be used; only the path to the file is required.
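Because a frequency argument can be either a bundled dataset name or a file path, a typo can be hard to spot before a long run. The following is a minimal sketch of a pre-flight check. The helper check_freq is hypothetical and not part of MixDeR; the dataset names are copied from the list above, and whether MixDeR matches them case-sensitively is not stated here.

```r
# Dataset names as listed above; anything else is treated as a file path.
builtin_freqs <- c("1000g_global", "gnomad", "afr", "amr", "eas", "eur", "sas")

# Hypothetical helper (not part of MixDeR): classify a frequency argument
# as a bundled dataset name or a custom file, and fail early otherwise.
check_freq <- function(freq) {
  stopifnot(is.character(freq), length(freq) == 1)
  if (freq %in% builtin_freqs) {
    return("builtin")
  }
  if (file.exists(freq)) {
    return("custom")
  }
  stop("'", freq, "' is neither a listed dataset name nor an existing file")
}

check_freq("eur")
```

The returned label is only informational; the original string is still what would be passed to freq_both, freq_major or freq_minor.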

NOTE: When using the R API, there are two ways to specify samples:
1. A sample manifest file (as required by the Shiny app), passed with the sample_manifest argument. The manifest lists the sample ID in the first column and, if applicable, the replicate ID in the second column, so that multiple samples are run consecutively.
2. A single sample ID and, if applicable, a replicate ID, passed with the sample and replicate arguments.
Please see below for examples.
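As an illustration of the manifest layout described above, the following sketch writes a two-column CSV with base R. Only the column order (sample ID first, replicate ID second) comes from the text; the column names, the sample and replicate IDs, and the presence of a header row are assumptions made for the example.

```r
# Column names and IDs are illustrative assumptions; only the column order
# (sample ID, then replicate ID) is taken from the description above.
manifest <- data.frame(
  SampleID = c("Sample01", "Sample02"),
  ReplicateID = c("Sample01_rep", "Sample02_rep"),
  stringsAsFactors = FALSE
)

# Write to a temporary location so the example does not touch real data.
manifest_path <- file.path(tempdir(), "samplemanifest.csv")
write.csv(manifest, manifest_path, row.names = FALSE)

# Read the file back to confirm its shape.
read.csv(manifest_path, stringsAsFactors = FALSE)
```

The resulting path would then be passed as the sample_manifest argument.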

Running mixture deconvolution and creating GEDmatch PRO report(s)

Full reference for this function, including default settings, can be found here.

The following call runs an unconditioned mixture deconvolution and creates a report using the default settings (see Running Mixture Deconvolution for more information about the defaults) and the 1000 Genomes global allele frequency data:

run_mixder_report(sample_manifest="/path/to/samplemanifest.csv", sample_reports="/path/to/sample_reports_dir/", output="output_dir_name")

As mentioned above, a single sample ID can be specified instead of a sample manifest:

run_mixder_report(sample = "Sample01", sample_reports="/path/to/sample_reports_dir/", output="output_dir_name")

To run a conditioned mixture deconvolution with the default settings and the 1000 Genomes global allele frequency data, conditioning on both Ref1 and Ref2:

run_mixder_report(sample_manifest="/path/to/samplemanifest.csv", sample_reports="/path/to/sample_reports_dir/", output="output_dir_name", uncond=FALSE, cond=TRUE, refs=list("Ref1", "Ref2"), refpath="/path/to/references_dir/")

To utilize a custom allele frequency file for both the major and minor contributor:

run_mixder_report(sample_manifest="/path/to/samplemanifest.csv", sample_reports="/path/to/sample_reports_dir/", output="output_dir_name", freq_both="/path/to/frequency_file.csv")

To utilize separate allele frequency data for the major and minor contributor:

run_mixder_report(sample_manifest="/path/to/samplemanifest.csv", sample_reports="/path/to/sample_reports_dir/", output="output_dir_name", twofreqs=TRUE, freq_major="eur", freq_minor="sas")

To change the default settings, such as the Allele 1 and Allele 2 probability thresholds and the dynamic and static ATs:

run_mixder_report(sample_manifest="/path/to/samplemanifest.csv", sample_reports="/path/to/sample_reports_dir/", output="output_dir_name", A1_threshold=0.95, A2_threshold=0.50, staticAT=15, dynamicAT=0.03)

To skip the mixture deconvolution and instead create a GEDmatch PRO report from a previously run deconvolution stored in the run_decon_output directory:

run_mixder_report(sample_manifest="/path/to/samplemanifest.csv", sample_reports="/path/to/sample_reports_dir/", output="run_decon_output", mixdeconv=FALSE)
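The calls above share most of their arguments. The following is a minimal sketch of keeping the shared arguments in one list and layering variations on top with modifyList and do.call. It assumes the package is installed under the name mixder and exports run_mixder_report; the output directory names are placeholders chosen for the example, and the paths are the same placeholders used above, so the guarded call only runs once they are replaced with real locations.

```r
# Shared arguments, using only argument names shown in the examples above.
base_args <- list(
  sample_manifest = "/path/to/samplemanifest.csv",
  sample_reports = "/path/to/sample_reports_dir/"
)

# Variations layered on top of the shared arguments (output names are placeholders).
runs <- list(
  default = list(output = "output_default"),
  custom_thresholds = list(
    output = "output_custom_thresholds",
    A1_threshold = 0.95,
    A2_threshold = 0.50
  )
)
run_args <- lapply(runs, function(extra) modifyList(base_args, extra))

# Call MixDeR only when the package is installed and the input directory exists.
if (requireNamespace("mixder", quietly = TRUE) &&
    dir.exists(base_args$sample_reports)) {
  for (args in run_args) {
    do.call(mixder::run_mixder_report, args)
  }
}
```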

Running mixture deconvolution and calculating validation metrics

Full reference for this function, including default settings, can be found here.

The options are similar to those of the run_mixder_report function. The following call runs an unconditioned mixture deconvolution and calculates metrics using the default settings (see Running Mixture Deconvolution for more information about the defaults) and the 1000 Genomes global allele frequency data. This function requires the references directory (refpath) as well as the assumed major and minor contributors (major and minor):

run_mixder_metrics(sample_manifest="/path/to/samplemanifest.csv", sample_reports="/path/to/sample_reports_dir/", output="output_dir_name", major="Ref1", minor="Ref2", refpath="/path/to/references_dir/")

As mentioned above, a single sample ID can be specified instead of a sample manifest:

run_mixder_metrics(sample = "Sample01", sample_reports="/path/to/sample_reports_dir/", output="output_dir_name", major="Ref1", minor="Ref2", refpath="/path/to/references_dir/")

As with the run_mixder_report function, a conditioned deconvolution can be run:

run_mixder_metrics(sample_manifest="/path/to/samplemanifest.csv", sample_reports="/path/to/sample_reports_dir/", output="output_dir_name", major="Ref1", minor="Ref2", refpath="/path/to/references_dir/", uncond=FALSE, cond=TRUE, refs=list("Ref1", "Ref2"))

To change the range of tested Allele 1 and Allele 2 probability thresholds:

run_mixder_metrics(sample_manifest="/path/to/samplemanifest.csv", sample_reports="/path/to/sample_reports_dir/", output="output_dir_name", major="Ref1", minor="Ref2", refpath="/path/to/references_dir/", A1min=0.5, A1max=0.99, A2min=0.4, A2max=0.70)
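To get a feel for what a threshold range covers, the following sketch enumerates candidate values between the minimum and maximum used in the call above. The step size of 0.01 is an assumption for illustration only; the step MixDeR actually uses, and how it combines the Allele 1 and Allele 2 thresholds, is not stated here.

```r
# Range limits taken from the example call above.
A1min <- 0.5
A1max <- 0.99
A2min <- 0.4
A2max <- 0.70

# Basic sanity checks: probabilities lie in [0, 1] and each minimum is below its maximum.
stopifnot(A1min >= 0, A1max <= 1, A1min < A1max)
stopifnot(A2min >= 0, A2max <= 1, A2min < A2max)

# Assumed step of 0.01 (illustrative; not taken from MixDeR).
step <- 0.01
a1_candidates <- seq(A1min, A1max, by = step)
a2_candidates <- seq(A2min, A2max, by = step)

range(a1_candidates)
range(a2_candidates)
```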

Running ancestry prediction

Full reference for this function, including default settings, can be found here.

Again, the options are similar to those of run_mixder_report and run_mixder_metrics, with two additional required inputs: the set of SNPs to use for PCA (snps, either "ancestry" or "all") and the population groups to use for PCA (pcagroups, either "Superpopulations" or "Subpopulations"). The following call runs an unconditioned mixture deconvolution and performs PCA using the default settings (see Running Mixture Deconvolution for more information about the defaults) and the 1000 Genomes global allele frequency data:

run_mixder_ancestry(sample_manifest="/path/to/samplemanifest.csv", sample_reports="/path/to/sample_reports_dir/", output="output_ancestry", snps="all", pcagroups="Superpopulations")

As mentioned above, a single sample ID can be specified instead of a sample manifest:

run_mixder_ancestry(sample = "Sample01", sample_reports="/path/to/sample_reports_dir/", output="output_ancestry", snps="all", pcagroups="Superpopulations")

To run a conditioned deconvolution for ancestry prediction, using only ancestry SNPs and subpopulations for PCA:

run_mixder_ancestry(sample_manifest="/path/to/samplemanifest.csv", sample_reports="/path/to/sample_reports_dir/", output="output_ancestry", uncond=FALSE, cond=TRUE, refpath="/path/to/references_dir/", refs=list("Ref1"), snps="ancestry", pcagroups="Subpopulations")

As above, the default mixture deconvolution settings can be changed:

run_mixder_ancestry(sample_manifest="/path/to/samplemanifest.csv", sample_reports="/path/to/sample_reports_dir/", output="output_ancestry", dynamicAT=0.03, staticAT=15, A1_threshold=0.95, A2_threshold=0.50, bins=15, snps="all", pcagroups="Superpopulations")
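Since snps and pcagroups each accept one of two values, it can help to validate them before starting a run. The following is a minimal sketch using base R's match.arg. The helper check_pca_options is hypothetical and not part of MixDeR, and the accepted spellings are taken from the example calls above.

```r
# Hypothetical helper (not part of MixDeR): validate the two PCA options.
# Accepted values follow the spellings used in the example calls above.
check_pca_options <- function(snps, pcagroups) {
  snps <- match.arg(snps, c("all", "ancestry"))
  pcagroups <- match.arg(pcagroups, c("Superpopulations", "Subpopulations"))
  list(snps = snps, pcagroups = pcagroups)
}

# Valid combination: returns the two values unchanged.
opts <- check_pca_options("ancestry", "Subpopulations")
str(opts)
```

An unrecognized value makes match.arg stop with an error that lists the accepted choices. Note that match.arg also accepts unambiguous prefixes, so this check is looser than an exact comparison.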