Image classification

Image classification models assign an image to one of several categories or classes, based on the image content (e.g. "cat" or "dog").

Input formats

Numpy arrays

Each data set should be a list of two elements: The first element is a numpy array of all images of shape (number of images, color channels (1 or 3), height, width). The second element is an array of labels (as integer indices).

Example:

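import numpy as np
import traintool

# np.zeros takes the shape as a single tuple
train_images = np.zeros((32, 3, 256, 256))  # 32 images with 3 color channels and size 256x256
train_labels = np.zeros(32, dtype=int)  # one integer class index per image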

traintool.train(..., train_data=[train_images, train_labels])
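Many image loaders return arrays in channels-last layout (number of images, height, width, color channels) rather than the channels-first layout described above. The following is a minimal sketch of the conversion with plain numpy; the array names and sizes are illustrative only.

```python
import numpy as np

# Hypothetical input: 32 RGB images in channels-last layout (N, H, W, C).
images_nhwc = np.zeros((32, 256, 256, 3), dtype=np.uint8)

# Move the channel axis to position 1 to get (N, C, H, W).
train_images = np.transpose(images_nhwc, (0, 3, 1, 2))

# Grayscale images without a channel axis, shape (N, H, W), need one added.
gray_images = np.zeros((32, 256, 256), dtype=np.uint8)
gray_images_nchw = gray_images[:, np.newaxis, :, :]

print(train_images.shape)      # (32, 3, 256, 256)
print(gray_images_nchw.shape)  # (32, 1, 256, 256)
```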

Files

Image files should be arranged in one folder per class, similar to this:

train
+-- dogs
|   +-- funny-dog.jpg
|   +-- another-dog.png
+-- cats
|   +-- brown-cat.png
|   +-- black-cat.png
...

Then simply pass the directory path to the train function:

traintool.train(..., train_data="./train")
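Before training, it can help to check that the directory really follows the one-folder-per-class layout. The sketch below is not part of traintool; the helper name `count_images_per_class` and the set of accepted file extensions are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the files you actually use.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(data_dir):
    """Return {class folder name: number of image files} for a one-folder-per-class layout."""
    counts = {}
    for class_dir in sorted(Path(data_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files next to the class folders
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

Calling `count_images_per_class("./train")` on a directory containing only the four files listed above would return two images each for `dogs` and `cats`. An empty or missing class folder shows up as a count of 0 or a missing key.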

Scikit-learn models

These models implement simple classification algorithms that should train in a reasonable amount of time. Note that they are not GPU-accelerated, so training can still take a long time on large datasets.

Preprocessing: Image files are first resized to 28 x 28 on loading. All images (numpy or files) are then flattened and standardized to mean 0 and standard deviation 1, using statistics computed on the train set.
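The flatten-and-scale step can be sketched with plain numpy as follows. This is an illustration, not the library's actual code; in particular, computing the statistics per feature (per pixel position) is an assumption, since the text does not say whether scaling is per feature or global.

```python
import numpy as np

rng = np.random.default_rng(0)
train_images = rng.random((32, 1, 28, 28))  # stand-in train set
test_images = rng.random((8, 1, 28, 28))    # stand-in test set

# Flatten each image to a 1D feature vector: (N, C, H, W) -> (N, C*H*W).
train_flat = train_images.reshape(len(train_images), -1)
test_flat = test_images.reshape(len(test_images), -1)

# Compute statistics on the train set only (per feature: an assumption).
mean = train_flat.mean(axis=0)
std = train_flat.std(axis=0)
std[std == 0] = 1.0  # avoid division by zero for constant pixels

# Apply the train-set statistics to both sets.
train_scaled = (train_flat - mean) / std
test_scaled = (test_flat - mean) / std

print(train_scaled.shape)  # (32, 784)
```

Reusing the train-set mean and standard deviation for the test set keeps information about the test data from influencing the preprocessing.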

Config parameters:

  • num_samples: Set the number of samples to train on. This can be used to train on a subset of the data. Defaults to None (i.e. train on all data).
  • num_samples_to_plot: Set the number of samples to plot to tensorboard for each dataset. Defaults to 5.
  • All other config parameters are forwarded to the constructor of the sklearn object.
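To illustrate how the parameters above relate to each other, here is a hypothetical sketch of separating the two documented keys from the remaining ones that go to the sklearn constructor. The function `split_config` is invented for this example and is not traintool's implementation; `C` and `kernel` are used only as examples of constructor arguments (they are parameters of `sklearn.svm.SVC`).

```python
# Keys handled by the training code itself, as listed above.
RESERVED_KEYS = {"num_samples", "num_samples_to_plot"}


def split_config(config):
    """Split a config dict into (training options, model constructor kwargs)."""
    options = {"num_samples": None, "num_samples_to_plot": 5}  # documented defaults
    options.update({k: v for k, v in config.items() if k in RESERVED_KEYS})
    model_kwargs = {k: v for k, v in config.items() if k not in RESERVED_KEYS}
    return options, model_kwargs


options, model_kwargs = split_config({"num_samples": 1000, "C": 0.5, "kernel": "rbf"})
print(options)       # {'num_samples': 1000, 'num_samples_to_plot': 5}
print(model_kwargs)  # {'C': 0.5, 'kernel': 'rbf'}
```

Under this reading, `model_kwargs` would be passed on unchanged, for example as `sklearn.svm.SVC(**model_kwargs)`.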

Models:

PyTorch models

These models implement deep neural networks that can give better results on complex datasets. They are GPU-accelerated if run on a machine with a GPU.

Preprocessing: All images (numpy or files) are rescaled to 256 x 256, then center-cropped to 224 x 224, and finally normalized by mean and standard deviation.
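The center-crop step can be sketched in numpy as below. This is only an illustration of the operation on arrays that have already been rescaled to 256 x 256; the helper `center_crop` is hypothetical and the rescaling itself is omitted. torchvision provides `transforms.Resize` and `transforms.CenterCrop` for these operations, though this page does not state which implementation the library uses.

```python
import numpy as np


def center_crop(images, size):
    """Crop the central size x size region from images of shape (N, C, H, W)."""
    height, width = images.shape[-2:]
    if size > height or size > width:
        raise ValueError("crop size exceeds image size")
    top = (height - size) // 2
    left = (width - size) // 2
    return images[..., top:top + size, left:left + size]


# Stand-in for a batch of images already rescaled to 256 x 256.
resized = np.zeros((4, 3, 256, 256), dtype=np.float32)
cropped = center_crop(resized, 224)
print(cropped.shape)  # (4, 3, 224, 224)
```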

Config parameters:

  • num_classes: The number of classes/different output labels (and therefore number of output neurons of the network). Defaults to None, in which case it will be automatically inferred from the data.
  • num_samples: Set the number of samples to train on. This can be used to train on a subset of the data. Defaults to None (i.e. train on all data).
  • num_samples_to_plot: Set the number of samples to plot to tensorboard for each dataset. Defaults to 5.
  • pretrained: Whether to use pretrained weights for the models (trained on ImageNet). Note that this requires that there are 1000 classes (the ImageNet classes). Defaults to False.
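When `num_classes` is left at None, the number of classes has to come from the labels. One plausible way to do this, shown here purely as an assumption about the behavior and not as the library's code, is to take the largest integer label index plus one:

```python
import numpy as np

# Illustrative integer labels, as in the numpy input format.
train_labels = np.array([0, 2, 1, 2, 0, 1, 1])

# Labels are integer indices starting at 0, so the largest index + 1
# gives the number of output neurons needed.
num_classes = int(train_labels.max()) + 1
print(num_classes)  # 3
```

With `pretrained` enabled, this value would need to be 1000 to match the ImageNet output layer, as noted above.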

Models:

More information is available in the torchvision docs.