Required Files
Mixture Kintelligence SNP profiles
These can be provided as UAS Sample Reports or as TSV files in the following format:
| Marker | Allele | Reads |
|---|---|---|
| rs12615742 | T | 0 |
| rs12615742 | C | 134 |
| rs16885694 | G | 43 |
| rs16885694 | A | 63 |
The files should be tab-delimited and use the .tsv extension, for example SampleID.tsv.
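As a quick sanity check before running MixDeR, a TSV profile can be parsed and its columns verified. The following is a minimal sketch in Python, not part of MixDeR; the function name `read_snp_profile` is hypothetical, and the only assumption about the file is the three-column layout shown in the table above.

```python
import csv
import io


def read_snp_profile(handle):
    """Parse a tab-delimited Marker/Allele/Reads table.

    Returns {marker: {allele: reads}}. Hypothetical helper, not a MixDeR function.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    required = {"Marker", "Allele", "Reads"}
    missing = required - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    profile = {}
    for row in reader:
        # Read counts are stored as integers; a non-numeric value raises ValueError.
        profile.setdefault(row["Marker"], {})[row["Allele"]] = int(row["Reads"])
    return profile


# The example rows from the table above, written as tab-delimited text.
example = (
    "Marker\tAllele\tReads\n"
    "rs12615742\tT\t0\n"
    "rs12615742\tC\t134\n"
    "rs16885694\tG\t43\n"
    "rs16885694\tA\t63\n"
)
profile = read_snp_profile(io.StringIO(example))
print(profile)
```

For a file on disk, the same function can be called with `open("SampleID.tsv", newline="")` in place of the `io.StringIO` object.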
If using the Shiny app, you must specify the folder containing these SNP files. Multiple samples (with multiple files each) can be in the same folder. Additional files may be present in the folder and will be ignored by MixDeR.
MixDeR divides the full Kintelligence dataset into smaller, more manageable sets (organized by total SNP read depth) before running them through EuroForMix (EFM), which is ideal for performance. The user may specify how many sets the program will use (see below); the default is 10 sets. The user must then specify how many sets are provided so that MixDeR knows how many files to process per sample.
By default, MixDeR uses previously created SNP sets if they are present in the specified input folder. If this option is deselected, MixDeR creates new SNP set files, overwriting any previously created files.
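To illustrate what grouping SNPs by total read depth can look like, here is a minimal Python sketch. It is an assumption for illustration only: the text does not describe MixDeR's actual partitioning rule, so this version simply ranks markers by summed read count and cuts the ranking into contiguous, near-equal chunks. The function name `split_by_depth` is hypothetical, and the read counts for the last three markers are invented toy values.

```python
def split_by_depth(profile, n_sets=10):
    """Rank markers by total read depth and cut the ranking into n_sets chunks.

    Illustrative only; not MixDeR's documented algorithm.
    """
    depth = {marker: sum(alleles.values()) for marker, alleles in profile.items()}
    # Highest depth first; ties are broken by marker name for a stable order.
    ordered = sorted(depth, key=lambda m: (-depth[m], m))
    # Never create more sets than there are markers (and at least one set).
    n_sets = max(1, min(n_sets, len(ordered)))
    base, extra = divmod(len(ordered), n_sets)
    sets, start = [], 0
    for i in range(n_sets):
        size = base + (1 if i < extra else 0)
        sets.append(ordered[start:start + size])
        start += size
    return sets


# Toy profile: the first two markers use the read counts from the table above,
# the remaining three are made-up values.
toy_profile = {
    "rs12615742": {"T": 0, "C": 134},
    "rs16885694": {"G": 43, "A": 63},
    "rs6690515": {"A": 10, "G": 5},
    "rs424079": {"A": 200, "C": 100},
    "rs2055204": {"A": 1, "G": 2},
}
snp_sets = split_by_depth(toy_profile, n_sets=2)
for index, markers in enumerate(snp_sets, start=1):
    print(index, markers)
```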
The sample manifest
This file lists the Sample IDs of the files to run. The SampleID is extracted from the Sample Name field in the Sample Report. The column names do not matter, but the column order does, and both columns must be present even if no replicates are included. If a single sample is run, its ID goes in the first column and the second column is left blank. If a second sample is to be run as a replicate, its ID goes in the second column.
| SampleID | ReplicateID |
|---|---|
| Sample01a | |
| Sample01a | Sample01b |
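Because the manifest is read by column position rather than by column name, a parser only needs the first two fields of each row. The sketch below is a hypothetical Python helper, not MixDeR code; it assumes a comma-delimited file with a header row, which the text above does not state, so the delimiter is left as a parameter.

```python
import csv
import io


def read_manifest(handle, delimiter=","):
    """Return a list of (sample_id, replicate_id or None) pairs.

    Hypothetical helper; assumes one header row whose names are ignored.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader, None)  # skip the header row; only column order matters
    runs = []
    for row in reader:
        if not row or not row[0].strip():
            continue  # skip blank lines
        sample_id = row[0].strip()
        has_replicate = len(row) > 1 and row[1].strip()
        runs.append((sample_id, row[1].strip() if has_replicate else None))
    return runs


# The two example rows from the table above.
manifest_text = "SampleID,ReplicateID\nSample01a,\nSample01a,Sample01b\n"
runs = read_manifest(io.StringIO(manifest_text))
print(runs)
```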
Other files which may be required
Allele frequency file
MixDeR provides global population allele frequencies for Kintelligence SNPs from either the 1000 Genomes Phase 3 dataset or the gnomAD v4 dataset. However, it is ideal to use allele frequencies derived from a population that closely matches the contributor(s) of interest, so MixDeR allows the user to upload a different allele frequency file. EuroForMix requires allele frequency files in the format below, with one row per allele and one column per SNP:
| Allele | rs6690515 | rs424079 | rs2055204 |
|---|---|---|---|
| A | 0.122837 | 0.64677 | 0.501441 |
| C | NA | 0.35323 | NA |
| G | 0.877163 | NA | 0.498559 |
| T | NA | NA | NA |
Because this format is difficult to produce by hand, MixDeR will create it if the user provides the frequency data in a CSV file with the following format (NOTE: the column names MUST match those below; the order of the columns and the presence of additional columns do not matter).
| SNP | Ref | Alt | Ref_AF | Alt_AF |
|---|---|---|---|---|
| rs6690515 | A | C | 0.35323 | 0.64677 |
| rs424079 | T | C | 0.122837 | 0.877163 |
| rs2055204 | G | A | 0.501441 | 0.498559 |
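The reshaping from the per-SNP CSV layout to the one-column-per-SNP layout can be sketched as follows. This is an illustrative Python version, not MixDeR's implementation: the function name `to_wide_frequencies` is hypothetical, alleles that are absent for a SNP are written as `NA` to mirror the wide table shown earlier, and frequencies are kept as text so that no rounding is introduced.

```python
import csv
import io

ALLELES = ["A", "C", "G", "T"]


def to_wide_frequencies(handle):
    """Pivot SNP/Ref/Alt/Ref_AF/Alt_AF rows into an allele-by-SNP table.

    Columns are looked up by name, so their order and any extra columns
    are irrelevant. Hypothetical helper, not a MixDeR function.
    """
    reader = csv.DictReader(handle)
    required = {"SNP", "Ref", "Alt", "Ref_AF", "Alt_AF"}
    missing = required - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    snps, freqs = [], {}
    for row in reader:
        snps.append(row["SNP"])
        freqs[row["SNP"]] = {row["Ref"]: row["Ref_AF"], row["Alt"]: row["Alt_AF"]}
    table = [["Allele"] + snps]
    for allele in ALLELES:
        table.append([allele] + [freqs[snp].get(allele, "NA") for snp in snps])
    return table


# The three example rows from the CSV-format table above.
long_csv = (
    "SNP,Ref,Alt,Ref_AF,Alt_AF\n"
    "rs6690515,A,C,0.35323,0.64677\n"
    "rs424079,T,C,0.122837,0.877163\n"
    "rs2055204,G,A,0.501441,0.498559\n"
)
wide = to_wide_frequencies(io.StringIO(long_csv))
for line in wide:
    print(",".join(line))
```

The printed values are derived only from the three CSV-format example rows.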
MixDeR can use the same allele frequency file for both the major and minor contributors, or a different allele frequency file for each. If two different frequency files are selected, MixDeR runs EFM twice: once with the major contributor's allele frequency file, extracting the inferred genotypes for the major contributor from those results, and once with the minor contributor's allele frequency file, extracting the inferred genotypes for the minor contributor from those results.
Reference Genotypes
If calculating validation metrics or performing a conditioned deconvolution, the reference genotypes are required.
There are two options for providing reference genotypes. The first is to provide UAS Sample Reports, stored in a separate folder. The second is to provide a single CSV file, named EFM_references.csv, containing all references in the following format:
| Sample.Name | Marker | Allele1 | Allele2 |
|---|---|---|---|
| Sample01 | rs12615742 | T | C |
| Sample01 | rs16885694 | G | G |
The user must provide the folder containing either the Sample Reports
or the CSV reference file.
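A reference CSV in this layout can be loaded into a per-sample genotype lookup, for example to check that every expected reference is present before a run. The Python sketch below is illustrative only: `read_references` is a hypothetical name, and sorting the two alleles so that T/C and C/T compare equal is a choice made for this example, not documented MixDeR behavior.

```python
import csv
import io


def read_references(handle):
    """Return {sample_name: {marker: (allele, allele)}} from a reference CSV.

    Hypothetical helper, not a MixDeR function.
    """
    reader = csv.DictReader(handle)
    required = {"Sample.Name", "Marker", "Allele1", "Allele2"}
    missing = required - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    references = {}
    for row in reader:
        # Store genotypes unordered: sort the allele pair so order does not matter.
        genotype = tuple(sorted((row["Allele1"], row["Allele2"])))
        references.setdefault(row["Sample.Name"], {})[row["Marker"]] = genotype
    return references


# The two example rows from the reference table above.
reference_csv = (
    "Sample.Name,Marker,Allele1,Allele2\n"
    "Sample01,rs12615742,T,C\n"
    "Sample01,rs16885694,G,G\n"
)
references = read_references(io.StringIO(reference_csv))
print(references)
```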
NOTE: MixDeR first searches the provided folder for the CSV file. If no CSV file is found, MixDeR creates one containing the genotypes from the Sample Reports in that folder. If additional references are needed that are not contained in an existing CSV file, remove the CSV file and run MixDeR again so that it is recreated from the Sample Reports.