# CNNs for Structural MRI Data
This repository contains a flexible set of scripts for running convolutional neural networks (CNNs) on structural brain images.  
It was written using Python 3.6.3 and TensorFlow 1.4.0, and requires TensorFlow and its dependencies.  
Run it from the code directory with `python -m run_scripts.runCustomCNN`. The script takes the following options:
```
required arguments:
  --gpuMemory GPUMEMORY
                        A float between 0 and 1. The fraction of available
                        memory to use.
  --numSteps NUMSTEPS   The number of steps to train for.
  --scale SCALE         The scale at which to slice dimensions. For example, a
                        scale of 2 means that each dimension will be divided
                        into 2 distinct regions, for a total of 8 contiguous
                        chunks.
  --type TYPE           One of: traditional, reverse
  --summaryName SUMMARYNAME
                        The file name to put the results of this run into.
  --data DATA           One of: PNC, PNC_GENDER, ABIDE1, ABIDE2, ABIDE2_AGE
optional arguments:
  --poolType POOLTYPE   The type of pooling layer used inside the network.
                        One of: MAX, AVERAGE, STRIDED, NONE
  --sliceIndex SLICEINDEX
                        Set this to an integer to select a single brain region
                        as opposed to concatenating all regions along the
                        depth channel.
  --align ALIGN         Set to 1 to align channels, maximizing the intersection.
                        Obsolete feature; will lead to worse performance.
  --numberTrials NUMBERTRIALS
                        Number of repeated models to run.
  --padding PADDING     Set this to an integer to crop the image to the brain
                        and then apply `padding` amount of padding.
                        Obsolete feature; will lead to worse performance.
  --batchSize BATCHSIZE
                        Batch size to train with. Default is 4.
  --pheno PHENO         Specify 1 to add phenotypics to the model.
  --validationDir VALIDATIONDIR
                        Checkpoint directory to restore the model from.
                        If not specified, the program will check the default
                        directory for stored parameters.
  --regStrength REGSTRENGTH
                        Lambda value for L2 regularization. If not specified,
                        no regularization is applied.
  --learningRate LEARNINGRATE
                        Global optimization learning rate. Default is 0.0001.
  --maxNorm MAXNORM     Specify an integer to constrain kernels with a maximum
                        norm.
  --dropout DROPOUT     The probability of keeping a neuron alive during
                        training. Defaults to 0.6.
  --dataScale DATASCALE
                        The downsampling rate of the data. Either 1, 2 or 3.
                        Defaults to 3.
  --pncDataType PNCDATATYPE
                        One of AVG, MAX, NAIVE, POOL_MIX, COMBINE, CONCAT.
                        Defaults to AVG. If set, dataScale cannot be specified.
  --listType LISTTYPE   Only valid for ABIDE and ADHD. One of strat or site.
  --depthwise DEPTHWISE
                        Set to 1 to use depthwise convolutions for the entire
                        network. (Untested feature.)
  --skipConnection SKIPCONNECTION
                        Set to 1 to add a skip connection layer with residuals,
                        as in ResNet. (Will lead to worse performance.)
  --maxRatio MAXRATIO   Ratio of max pooling in the pool_mix augmentation.
                        Defaults to 0.25. Must fall within [0.05, 0.75] and be
                        a multiple of 0.05. Only used when pool_mix is selected.
  --augRatio AUGRATIO   Ratio of augmented images to pure average images in the
                        pool_mix augmentation. Defaults to 2. Only used when
                        pool_mix is selected.
  --testType TESTTYPE   One of AVG, MAX. Type of validation and test file
                        preprocessing used in the concat augmentation.
                        Defaults to AVG.
  --augment AUGMENT     One of FLIP, TRANSLATE. Type of standard augmentation.
                        Defaults to None.
  --origSize            Size of the original sample before augmentation. One of
                        100, 200, 300. If None, all samples are used. Defaults
                        to None.
  --pretrained          Set to 1 to use the pretrained model to test on the
                        UK Biobank dataset.
```
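The `--scale` option above can be pictured as cutting the volume into `scale`³ contiguous blocks. The sketch below is illustrative only (plain NumPy, not the repository's `patches.py` implementation), assuming a cubic volume whose side length is divisible by the scale:

```python
import numpy as np

def slice_into_regions(volume, scale):
    """Divide each spatial dimension of a 3D volume into `scale` parts,
    yielding scale**3 contiguous chunks. Illustrative sketch only."""
    chunks = []
    dx, dy, dz = (s // scale for s in volume.shape)
    for i in range(scale):
        for j in range(scale):
            for k in range(scale):
                chunks.append(volume[i * dx:(i + 1) * dx,
                                     j * dy:(j + 1) * dy,
                                     k * dz:(k + 1) * dz])
    return chunks

volume = np.zeros((60, 60, 60))
regions = slice_into_regions(volume, 2)
print(len(regions))       # 8 chunks
print(regions[0].shape)   # (30, 30, 30)
```

With `--sliceIndex`, a single such region is selected instead of concatenating all of them along the depth channel.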
  
  The scripts assume that you have the following directories, which you will have to create yourself:  
  `brain_age_prediction/summaries/`  
  `brain_age_prediction/checkpoints/`  
It also requires that the data directories listed in the config files exist and actually contain data.
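For example, the two required directories can be created in one step:

```shell
mkdir -p brain_age_prediction/summaries brain_age_prediction/checkpoints
```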

  
  
  The code itself is organized into several directories.
  
### model/
  This directory contains two files related to building the model, both of  
  which can be used independently of the rest of the repository: `buildCommon.py` and `buildCustomCNN.py`.  
  The former contains wrapper functions around TensorFlow's 3D model-building functions  
  to create 3D convolutional, fully connected, batch normalization, and pooling layers.  
  The latter contains a single function that aggregates all of the model-building functions  
  into a single, flexible CNN that is largely configured by the passed-in parameters.   
   
  
### engine/
  This directory does most of the heavy lifting of the code base. `trainCustomCNN.py`  
  takes in options from the command line, then builds the model and loads the datasets based  
  on those parameters. `trainCommon.py` contains flexible functions for  
  automatically training a model given a gradient update operation. It trains the model for a  
  specified number of iterations, saving the model that performed best on the validation set.  
  It also contains functions to output visualizations of arbitrary operations to TensorBoard summaries.  
  
### data_scripts/
  `DataSetNPY.py` contains a class that loads in and produces batches of .npy  
  files, which is useful if you have matrix data with arbitrary dimensions. 
  `DataSetBIN.py` contains a class that loads in and produces batches of binary files,  
  which is faster than .npy files, but less flexible. 
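To illustrate the batching pattern, here is a hypothetical minimal sketch (the class name and behavior are invented for illustration; see `DataSetNPY.py` for the actual class):

```python
import os
import tempfile
import numpy as np

class NpyBatcher:
    """Illustrative minimal batch producer for .npy files."""

    def __init__(self, filenames, batch_size):
        self.filenames = filenames
        self.batch_size = batch_size
        self.index = 0

    def next_batch(self):
        # Take the next slice of filenames, wrapping around at the end.
        batch_files = self.filenames[self.index:self.index + self.batch_size]
        self.index = (self.index + self.batch_size) % len(self.filenames)
        # Each file may hold a matrix of arbitrary (but consistent) dimensions.
        return np.stack([np.load(f) for f in batch_files])

# Write a few toy .npy volumes and read them back in batches.
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(4):
    p = os.path.join(tmpdir, f"subj{i}.npy")
    np.save(p, np.full((2, 2, 2), i, dtype=np.float32))
    paths.append(p)

batcher = NpyBatcher(paths, batch_size=2)
batch = batcher.next_batch()
print(batch.shape)  # (2, 2, 2, 2)
```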
  
### run_scripts/
  `runCustomCNN.py` is a simple script that runs the engine scripts.
  
### utils/
  This directory contains several helper scripts.  
  `args.py` takes in command-line arguments.  
  `config.py` reads the config.json file in the code directory.  
  `patches.py` does regional segmentation as described in the paper.  
  `saveModel.py` restores tensorflow models from a given directory.  
  `sliceViewer.py` is a class that allows one to view numpy matrices in dimensions higher than 2D.
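As an illustration of the config pattern used by `config.py` (the key names below are invented; the real `config.json` defines the actual data directories):

```python
import json
import os
import tempfile

# Hypothetical config contents; key names are illustrative only.
config = {
    "data_dir": "/path/to/data",
    "checkpoint_dir": "brain_age_prediction/checkpoints",
}

# Write and re-read a config.json, as utils/config.py does for the real file.
path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(path, "w") as f:
    json.dump(config, f)

with open(path) as f:
    loaded = json.load(f)
print(loaded["checkpoint_dir"])
```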
  
### placeholders/
  `shared_placeholders.py` contains several functions to return placeholders  
  for fed-in data.
  
### archived/
  This directory contains files that were used to run previous experiments.  
  These files are no longer maintained or have been rewritten in updated scripts.