6 Inputs

The following input files should be prepared for a pipeline run and placed in the input directory as specified by the --input-dir option of the workflow.

  1. a genotype report file (must have the file extension .txt), which contains the genotyping base calls and is usually provided by the genotyping service. Alternatively, it can be generated with GenomeStudio from the raw genotype data files (*.idat files). It is typically called gtReport.txt. Example of the expected format:
[Header]
GSGT Version  2.0.4
Processing Date     6/13/2019 3:03 PM
Content     InfiniumOmni2-5-8v1-3_A1.bpm
Num SNPs    2372784
Total SNPs  2372784
Num Samples   5 
Total Samples 5
[Data]
LIBD1734_190528     LIBD1736_190528     LIBD1776_190528     LIBD1790_190528     LIBD1836_190528
AA  AA  AA  AA  AA    
  2. a Plink map file (must have the file extension .map), as described in https://www.cog-genomics.org/plink/1.9/formats#map, which has the genomic coordinates for each SNP in the genotyping array. This file should match the manifest file for the SNP array used; the manifest name can usually be found in the header of the gtReport.txt file. In the example above, the Content line shows the manifest file InfiniumOmni2-5-8v1-3_A1.bpm, so the identifier of the SNP array used is InfiniumOmni2-5-8v1-3_A1. The script gt_prep.pl, which is provided with this pipeline, can be used to generate

  3. a sample sheet (CSV format, must have the file extension .csv) with the following columns:

  • Sample.ID (required): IDs as found in the genotype report.
  • BrNum (required): IDs specific to your study (in our case, brain numbers). The pipeline will drop any entry in this column that is empty (or NA).
  • Sex (optional): can be missing, or filled with 0s if the information is not available.
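
Before starting a run, it can help to check these inputs by hand. The following Python sketch is not part of the pipeline; the function names manifest_id and usable_samples are hypothetical, and the brain number Br1001 is made up for illustration. It reads the Content line from a genotype report header to find the array identifier that the .map file should match, and it lists the sample sheet rows that would survive the BrNum filter, assuming that an empty or NA BrNum causes the whole row to be skipped.

```python
import csv
import io


def manifest_id(report_lines):
    """Return the array identifier from the 'Content' header line, or None."""
    for line in report_lines:
        if line.strip() == "[Data]":
            break  # the header is over; stop before the genotype calls
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "Content":
            name = fields[-1]
            # Drop the manifest extension, e.g. "..._A1.bpm" -> "..._A1"
            return name[:-4] if name.endswith(".bpm") else name
    return None


def usable_samples(csv_text):
    """Return sample sheet rows whose BrNum is neither empty nor NA."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {"Sample.ID", "BrNum"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"sample sheet lacks required column(s): {sorted(missing)}")
    kept = []
    for row in reader:
        brnum = (row["BrNum"] or "").strip()
        # Assumption: rows with an empty or NA BrNum are skipped entirely
        if brnum and brnum != "NA":
            kept.append(row)
    return kept


header = [
    "[Header]",
    "GSGT Version\t2.0.4",
    "Content\tInfiniumOmni2-5-8v1-3_A1.bpm",
    "[Data]",
]
print(manifest_id(header))  # InfiniumOmni2-5-8v1-3_A1

sheet = (
    "Sample.ID,BrNum,Sex\n"
    "LIBD1734_190528,Br1001,0\n"  # hypothetical brain number
    "LIBD1736_190528,NA,0\n"
    "LIBD1776_190528,,0\n"
)
print([row["Sample.ID"] for row in usable_samples(sheet)])  # ['LIBD1734_190528']
```

In practice the report header and the sample sheet would be read from the files in the input directory; inline strings are used here only to keep the example self-contained.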