6 Inputs
The following input files should be prepared for a pipeline run and placed in the input directory as specified by the --input-dir
option of the workflow.
- a genotype report file (must have the file extension .txt), which contains the genotyping base calls and is usually provided by the genotyping service. Alternatively, it can be generated by GenomeStudio from the raw genotype data files (*.idat files). It is typically called gtReport.txt. Example of the expected format:
[Header]
GSGT Version 2.0.4
Processing Date 6/13/2019 3:03 PM
Content InfiniumOmni2-5-8v1-3_A1.bpm
Num SNPs 2372784
Total SNPs 2372784
Num Samples 5
Total Samples 5
[Data]
LIBD1734_190528 LIBD1736_190528 LIBD1776_190528 LIBD1790_190528 LIBD1836_190528
AA AA AA AA AA
- a Plink map file (must have the file extension .map) as described in https://www.cog-genomics.org/plink/1.9/formats#map, which has the genomic coordinates for each SNP in the genotyping array. This file should match the manifest file for the SNP array used; the manifest name can usually be seen in the header of the gtReport.txt file. In the example above, the manifest file name appears in the Content line as InfiniumOmni2-5-8v1-3_A1, which is the identifier of the SNP array used. The script gt_prep.pl, which is provided with this pipeline, can be used to generate this file.
- a sample sheet (CSV format, must have the file extension .csv) with the following columns:
  - a Sample.ID column (required): IDs as found in the genotype report.
  - a BrNum column (required): IDs specific to your study (in our case, brain numbers). The pipeline will drop any entry in this column that is empty (or NA).
  - a Sex column (optional): can be missing, or filled with 0s if the information is not available.
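
Before launching a run, it can help to sanity-check the input directory against the requirements listed above. The following is a minimal Python sketch and not part of the pipeline: the function names (manifest_id, usable_samples, find_inputs) are hypothetical, the header is assumed to separate key and value with whitespace (tab or spaces), and "dropping" an empty BrNum is interpreted here as skipping the whole sample sheet row.

```python
import csv
from pathlib import Path

# Required sample sheet columns, as named in the list above.
REQUIRED_COLUMNS = ("Sample.ID", "BrNum")


def manifest_id(report_path):
    """Return the array identifier from the 'Content' header line, or None."""
    with open(report_path, newline="") as fh:
        for line in fh:
            line = line.strip()
            if line == "[Data]":  # header is over; stop before the genotype calls
                break
            if line.startswith("Content"):
                # Assumption: key and value are separated by whitespace.
                parts = line.split(None, 1)
                if len(parts) == 2:
                    name = parts[1].strip()
                    # Strip the .bpm suffix to get the array identifier.
                    return name[:-4] if name.endswith(".bpm") else name
    return None


def usable_samples(sheet_path):
    """Return sample sheet rows whose BrNum is neither empty nor 'NA'."""
    with open(sheet_path, newline="") as fh:
        reader = csv.DictReader(fh)
        missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"sample sheet lacks required column(s): {missing}")
        return [row for row in reader
                if (row["BrNum"] or "").strip() not in ("", "NA")]


def find_inputs(input_dir):
    """Map each expected file extension to the matching file names in input_dir."""
    d = Path(input_dir)
    return {ext: sorted(p.name for p in d.glob(f"*{ext}"))
            for ext in (".txt", ".map", ".csv")}
```

For example, find_inputs("my_inputs") returns a dictionary whose empty lists indicate missing file types, and manifest_id("my_inputs/gtReport.txt") gives the array identifier that the .map file should correspond to (the directory name here is a placeholder).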