Required Files

Mixture Kintelligence SNP profiles
These can be provided either as the UAS Sample Report or as a TSV file in the following format:

Marker	Allele	Reads
rs12615742	T	0
rs12615742	C	134
rs16885694	G	43
rs16885694	A	63

The files must be tab-delimited and use the .tsv extension, for example: SampleID.tsv.
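As a quick check before running MixDeR, a TSV profile in the layout above can be read and validated with a few lines of Python. This is only a minimal sketch: the function name `read_snp_profile` is hypothetical and not part of MixDeR, and the strict header check (exactly Marker, Allele, Reads, in that order) is an assumption based on the example shown.

```python
import csv
import io
from collections import defaultdict


def read_snp_profile(handle):
    """Return {marker: {allele: reads}} from a tab-delimited SNP profile.

    Hypothetical helper, not part of MixDeR. The exact header check is an
    assumption taken from the example layout (Marker, Allele, Reads).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    expected = ["Marker", "Allele", "Reads"]
    if reader.fieldnames != expected:
        raise ValueError(f"expected columns {expected}, got {reader.fieldnames}")
    profile = defaultdict(dict)
    for row in reader:
        # Read counts must be whole numbers; int() raises on anything else.
        profile[row["Marker"]][row["Allele"]] = int(row["Reads"])
    return dict(profile)


# The example rows from the format description, as they would appear in SampleID.tsv.
example = (
    "Marker\tAllele\tReads\n"
    "rs12615742\tT\t0\n"
    "rs12615742\tC\t134\n"
    "rs16885694\tG\t43\n"
    "rs16885694\tA\t63\n"
)
profile = read_snp_profile(io.StringIO(example))
```

For a file on disk, the same function can be called with `open("SampleID.tsv", newline="")` in place of the in-memory example.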

If using the Shiny app, you must specify the folder containing these SNP files. Multiple samples (with multiple files each) can be in the same folder. Additional files may be present in the folder and will be ignored by MixDeR.

MixDeR divides the entire Kintelligence dataset into smaller, more manageable sets (organized by total SNP read depth) and runs each set through EuroForMix (EFM), which gives the best performance. The user may specify how many sets the program will use (see below); the default is 10 sets. The user must then specify how many sets are provided so that MixDeR knows how many files to process per sample.
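To illustrate the idea of splitting SNPs into sets by read depth, the sketch below sorts markers by total reads and cuts the ordered list into a requested number of near-equal sets. It is only one plausible reading of "organized by total SNP read depth": the function `split_by_depth`, the descending sort, and the equal-size chunking are all assumptions, and MixDeR's actual partitioning may differ. The marker names other than the two taken from the example profile are placeholders.

```python
def split_by_depth(total_reads, n_sets=10):
    """Split markers into n_sets groups ordered by total read depth.

    total_reads: {marker: total reads summed over alleles}.
    Hypothetical illustration only; not MixDeR's actual algorithm.
    """
    if n_sets < 1:
        raise ValueError("n_sets must be at least 1")
    # Highest depth first; ties are broken by marker name for determinism.
    ordered = sorted(total_reads, key=lambda m: (-total_reads[m], m))
    base, extra = divmod(len(ordered), n_sets)
    sets, start = [], 0
    for i in range(n_sets):
        # The first `extra` sets take one additional marker each.
        end = start + base + (1 if i < extra else 0)
        sets.append(ordered[start:end])
        start = end
    return sets


# rs12615742 (0 + 134 reads) and rs16885694 (43 + 63 reads) come from the
# example profile; snpA, snpB and snpC are made-up placeholders.
totals = {"rs12615742": 134, "rs16885694": 106, "snpA": 20, "snpB": 75, "snpC": 9}
sets = split_by_depth(totals, n_sets=2)
```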

By default, MixDeR uses previously created SNP sets if they are present in the specified input folder. If this option is unselected, MixDeR will create new SNP set files, overwriting any previously created files.

The sample manifest
This file lists the Sample IDs of the files to run. The SampleID is extracted from the Sample Name field in the Sample Report. The column names do not matter, only their order, and both columns must be present even if no replicates are included. If a single sample is run, its ID goes in the first column and the second column is left blank. If a second sample is to be run as a replicate, the replicate ID should be listed in the second column.

SampleID	ReplicateID
Sample01a
Sample01a	Sample01b
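The manifest rules can be sketched in Python as follows. This is an illustrative reader, not MixDeR code: the function name `read_manifest` is hypothetical, and the tab delimiter is an assumption based on the layout shown (pass a different `delimiter` if the manifest is comma-separated). The header row is skipped because the column names do not matter, and a blank second column is read as "no replicate".

```python
import csv
import io


def read_manifest(handle, delimiter="\t"):
    """Return a list of (sample_id, replicate_id_or_None) pairs.

    Hypothetical helper; the delimiter is an assumption, not a documented
    MixDeR requirement.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader, None)  # column names are ignored, only the order matters
    runs = []
    for row in reader:
        if not row or not row[0].strip():
            continue  # skip empty lines
        sample = row[0].strip()
        # A missing or blank second column means the sample has no replicate.
        replicate = row[1].strip() if len(row) > 1 and row[1].strip() else None
        runs.append((sample, replicate))
    return runs


manifest_text = "SampleID\tReplicateID\nSample01a\t\nSample01a\tSample01b\n"
runs = read_manifest(io.StringIO(manifest_text))
```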

Other files which may be required

Allele frequency file

MixDeR provides global population allele frequencies for Kintelligence SNPs from either the 1000 Genomes Phase 3 dataset or the gnomAD v4 dataset. However, it is ideal to use allele frequencies derived from a population that closely matches the contributor(s) of interest. Therefore, MixDeR allows the user to upload a different allele frequency file. EuroForMix requires allele frequency files in the format below, with one column per SNP:

Allele	rs6690515	rs424079	rs2055204
A	0.122837	0.64677	0.501441
C	NA	0.35323	NA
G	0.877163	NA	0.498559
T	NA	NA	NA

Because this format is difficult to produce by hand, MixDeR will create it for the user if the frequency data are provided in a CSV file with the following format (NOTE: the column names MUST match those below; the order of the columns does not matter, and any additional columns are ignored).

SNP	Ref	Alt	Ref_AF	Alt_AF
rs6690515	A	C	0.35323	0.64677
rs424079	T	C	0.122837	0.877163
rs2055204	G	A	0.501441	0.498559
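The reshaping from the per-SNP CSV layout to the one-column-per-SNP layout can be sketched as below. This is not MixDeR's implementation: the function `to_efm_table` is hypothetical, the input is assumed to be comma-delimited, and the tab-separated output simply mirrors the layout displayed on this page (the `sep` parameter can be changed). Frequencies are kept as text so no rounding is introduced. The output is derived only from the CSV rows shown above.

```python
import csv
import io

ALLELES = ["A", "C", "G", "T"]
REQUIRED = {"SNP", "Ref", "Alt", "Ref_AF", "Alt_AF"}


def to_efm_table(handle, sep="\t"):
    """Reshape per-SNP Ref/Alt frequencies into one column per SNP.

    Hypothetical sketch; column order and extra columns in the input
    are ignored, as described for the CSV format.
    """
    reader = csv.DictReader(handle)
    missing = REQUIRED - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    snps, freqs = [], {}
    for row in reader:
        snps.append(row["SNP"])
        freqs[row["SNP"]] = {row["Ref"]: row["Ref_AF"], row["Alt"]: row["Alt_AF"]}
    lines = [sep.join(["Allele"] + snps)]
    for allele in ALLELES:
        # Alleles not observed at a SNP are written as NA.
        lines.append(sep.join([allele] + [freqs[s].get(allele, "NA") for s in snps]))
    return "\n".join(lines) + "\n"


csv_text = (
    "SNP,Ref,Alt,Ref_AF,Alt_AF\n"
    "rs6690515,A,C,0.35323,0.64677\n"
    "rs424079,T,C,0.122837,0.877163\n"
    "rs2055204,G,A,0.501441,0.498559\n"
)
table = to_efm_table(io.StringIO(csv_text))
```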

MixDeR provides the option to use the same allele frequency file for both the major and minor contributors or to select a different allele frequency file for each. If two different frequency files are selected, MixDeR will run EFM twice: once using the allele frequency file for the major contributor (extracting the inferred genotypes for the major contributor from those results) and once using the allele frequency file for the minor contributor (extracting the inferred genotypes for the minor contributor from those results).

Reference Genotypes

If calculating validation metrics or performing a conditioned deconvolution, the reference genotypes are required.

There are two options for providing reference genotypes. The first is to provide the UAS Sample Reports, stored in a separate folder. The second is to provide a single CSV file named EFM_references.csv that contains all references in the following format:

Sample.Name	Marker	Allele1	Allele2
Sample01	rs12615742	T	C
Sample01	rs16885694	G	G
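A reference file in this layout can be loaded into a per-sample lookup with a short Python sketch. The function `read_references` is hypothetical and not part of MixDeR; the comma delimiter is assumed from the .csv extension, and the check that all four columns are present is an assumption based on the example header.

```python
import csv
import io


def read_references(handle):
    """Return {sample: {marker: (allele1, allele2)}} from a references CSV.

    Hypothetical helper; the required-column check is an assumption drawn
    from the example header, not a documented MixDeR rule.
    """
    reader = csv.DictReader(handle)
    required = {"Sample.Name", "Marker", "Allele1", "Allele2"}
    missing = required - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    refs = {}
    for row in reader:
        genotype = (row["Allele1"], row["Allele2"])
        refs.setdefault(row["Sample.Name"], {})[row["Marker"]] = genotype
    return refs


# The example rows above, written as they would appear in EFM_references.csv.
reference_text = (
    "Sample.Name,Marker,Allele1,Allele2\n"
    "Sample01,rs12615742,T,C\n"
    "Sample01,rs16885694,G,G\n"
)
refs = read_references(io.StringIO(reference_text))
```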

The user must provide the folder containing either the Sample Reports or the CSV reference file.
NOTE: MixDeR first searches the provided folder for the CSV file. When Sample Reports are provided, MixDeR creates the CSV file from the genotypes in the Sample Reports within that folder. If additional references are needed that are not contained in an existing CSV file, remove the CSV file and run MixDeR again.