Training neural networks with CBO#
This notebook shows how to train a simple neural network on the MNIST dataset with consensus-based optimization (CBO). We employ PyTorch, which provides a convenient machine learning environment. In particular, we will see how CBX can run on the GPU.
We start by loading the necessary packages.
[ ]:
%load_ext autoreload
%autoreload 2
import cbx as cbx
from cbx.dynamics.cbo import CBO
import torch
import torch.nn as nn
import torchvision
import cbx.utils.resampling as rsmp
Load the data: MNIST#
We load the train and test data. In this case we use the MNIST dataset, which we assume to be available locally. You can set the variable data_path to point to the right directory.
[ ]:
data_path = "../../../../datasets/" # This path directs to one level above the CBX package
transform = torchvision.transforms.ToTensor()
train_data = torchvision.datasets.MNIST(data_path, train=True, transform=transform, download=False)
test_data = torchvision.datasets.MNIST(data_path, train=False, transform=transform, download=False)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True, num_workers=0)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=64, shuffle=False, num_workers=0)
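As a quick optional sanity check, we can look at one batch produced by the loader; this is the kind of input the objective defined further below will consume.
[ ]:
x_batch, y_batch = next(iter(train_loader))
print(x_batch.shape, y_batch.shape)  # expected: torch.Size([64, 1, 28, 28]) torch.Size([64])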
Specify the device#
We now specify the device to run everything on. If CUDA is available, we perform most of the calculations on the GPU!
[ ]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
Load model type#
We now load the model that we want to employ in the following. The variable model_class
specifies the kind of network we want to use, in this case a Perceptron with one hidden layer.
[ ]:
from models import Perceptron
model_class = Perceptron
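The Perceptron class lives in the models.py file next to this notebook and is not reproduced here. As a rough orientation only, a one-hidden-layer perceptron of this kind could look like the following sketch; the activation and the internal flattening are assumptions for illustration, not the actual implementation.
[ ]:
import torch.nn as nn

class PerceptronSketch(nn.Module):
    # illustrative stand-in for models.Perceptron (sizes=[784, 100, 10])
    def __init__(self, sizes=(784, 100, 10)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                   # accept (B, 1, 28, 28) MNIST batches
            nn.Linear(sizes[0], sizes[1]),  # hidden layer
            nn.ReLU(),
            nn.Linear(sizes[1], sizes[2]),  # class scores
        )

    def forward(self, x):
        return self.net(x)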
Model initialization#
We now initialize the parameters that will be used in the optimization. First we decide how many particles we want to use, in this case N=50.
Initializing the weights#
We initialize the parameters by creating a list containing N different initializations: models = [model_class(sizes=[784,100,10]) for _ in range(N)]. This list is then transformed into a torch tensor w of shape (N, d), here (50, d), with the function flatten_parameters, where d is the number of trainable parameters per network. We also save further properties, such as the names and shapes of each parameter, in the variable pprop, which later allows us to perform the inverse operation of flatten_parameters.
Do we save the whole list models?#
The important thing to realize in the following is that we do not work with the list models anymore; we only created it to initialize the parameters. The only thing that is updated is the tensor w that stores the flattened parameters. Every time we want to evaluate the ensemble, we make use of the function functional_call (https://pytorch.org/docs/stable/generated/torch.func.functional_call.html), which takes the tensor w together with one realization of the model and applies it. That is why we set model = models[0], in order to have one such realization.
[ ]:
from cbx.utils.torch_utils import (flatten_parameters, get_param_properties, eval_losses,
                                   norm_torch, compute_consensus_torch, standard_normal_torch,
                                   eval_acc, effective_sample_size)
N = 50
models = [model_class(sizes=[784,100,10]) for _ in range(N)]
model = models[0]
pnames = [p[0] for p in model.named_parameters()]
w = flatten_parameters(models, pnames).to(device)
pprop = get_param_properties(models, pnames=pnames)
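To make the shapes concrete, the following optional check verifies that w has one row per particle and d columns, and illustrates the functional_call mechanism on a single set of parameters. Here the parameter dictionary is taken directly from model rather than reconstructed from a row of w (which is what the CBX helpers do internally via pprop), and we assume the Perceptron accepts un-flattened MNIST batches, as it does elsewhere in this notebook.
[ ]:
from torch.func import functional_call

d = sum(p.numel() for p in model.parameters())  # trainable parameters per network
print(w.shape)                                  # expected: torch.Size([50, d])

# evaluate one realization of the model with an explicit parameter dictionary
x_batch, _ = next(iter(train_loader))
params = {name: p.detach() for name, p in model.named_parameters()}
out = functional_call(model, params, (x_batch,))
print(out.shape)                                # expected: torch.Size([64, 10]) class scores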
The objective function#
We now define the function that we want to optimize. For a single particle \(w_i\) it is defined as
\[f(w_i) := \frac{1}{|\mathcal{T}|} \sum_{(x,y)\in\mathcal{T}} \ell(N_{w_i}(x), y),\]
where \(\mathcal{T}\) is our training set, \(N_{w_i}\) denotes the neural net parametrized by \(w_i\) and \(\ell\) is some loss function (here, the cross entropy). We usually do not evaluate the loss on the whole training set, but rather on so-called mini-batches \(B\subset\mathcal{T}\), which is incorporated in the objective below.
[ ]:
class objective:
    def __init__(self, train_loader, N, device, model, pprop):
        self.train_loader = train_loader
        self.data_iter = iter(train_loader)
        self.N = N
        self.epochs = 0
        self.device = device
        self.loss_fct = nn.CrossEntropyLoss()
        self.model = model
        self.pprop = pprop
        self.set_batch()

    def __call__(self, w):
        # evaluate the loss of every particle in w on the current mini-batch
        return eval_losses(self.x, self.y, self.loss_fct, self.model, w[0,...], self.pprop)

    def set_batch(self):
        # draw the next mini-batch; restart the loader (and count an epoch) when it is exhausted
        (x, y) = next(self.data_iter, (None, None))
        if x is None:
            self.data_iter = iter(self.train_loader)
            (x, y) = next(self.data_iter)
            self.epochs += 1
        self.x = x.to(self.device)
        self.y = y.to(self.device)
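For orientation, the snippet below spells out what eval_losses computes for each particle, written in plain PyTorch for a single network on one mini-batch. This is an illustration only, not the CBX internals, and it again assumes the model accepts un-flattened MNIST batches.
[ ]:
x_batch, y_batch = next(iter(train_loader))
with torch.no_grad():
    single_loss = nn.CrossEntropyLoss()(model(x_batch), y_batch)
print(single_loss.item())  # loss of one (randomly initialized) network on one batch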
Set up CBX Dynamic#
We now set up the dynamic. First we set some parameters.
[ ]:
kwargs = {'alpha': 50.0,                    # weight exponent in the consensus computation
          'dt': 0.1,                        # step size
          'sigma': 0.1,                     # noise strength
          'lamda': 1.0,                     # drift strength towards the consensus
          'max_it': 200,                    # maximum number of iterations
          'verbosity': 0,
          'batch_args': {'batch_size': N},  # use all N particles per batch
          'check_f_dims': False}            # skip the objective dimension check
How to incorporate torch in CBX?#
The interesting question is now how we can incorporate torch into CBX. Usually, CBX assumes numpy as the underlying array package; however, this can be modified. If we only worked on the CPU, we would not have to change too much, because then numpy and torch can share the same underlying memory and numpy's ufuncs can act on torch tensors. This does not work if we want to run on the GPU!
Where is numpy actually relevant?#
Since standard CBO does not require many advanced array operations, the question arises where it actually makes a difference what the ensemble x is. We list the most important aspects below:
- The ensemble x must implement basic array operations, such as +, -, *, /, scalar multiplication, some broadcasting rules and numpy-style indexing.
- Copying: in certain situations it is important to copy the ensemble. In numpy we use np.copy; the torch analogue is torch.clone. The ParticleDynamic allows us to specify which function should be used with the keyword argument copy=....
- Generating random numbers: all noise methods in cbx call an underlying random number generator. Most of the time we want to obtain an array of a certain size, with entries distributed according to the standard normal distribution. Therefore, the ParticleDynamic allows us to specify the sampling callable (passed as sampler=... in the cell below) that is used whenever we want to sample. In our case we employ a wrapper class around torch.normal, defined in cbx.utils.torch_utils.
- Computing the consensus: the consensus computation also requires some modification. Most notably, for numerical stability we employ a logsumexp (https://en.wikipedia.org/wiki/LogSumExp) function, which has to be replaced by the torch variant. We can specify which function to use with the keyword compute_consensus=... and use the function defined in cbx.utils.torch_utils.
[ ]:
f = objective(train_loader, N, device, model, pprop)

# resampling applied as a post-processing step after 40 iterations without a loss update
resampling = rsmp.resampling([rsmp.loss_update_resampling(wait_thresh=40)])

dyn = CBO(f, f_dim='3D', x=w[None,...], noise='anisotropic',
          norm=norm_torch,                            # torch replacement for the norm
          copy=torch.clone,                           # torch replacement for np.copy
          sampler=standard_normal_torch(device),      # torch sampler on the chosen device
          compute_consensus=compute_consensus_torch,  # torch consensus with logsumexp
          post_process=lambda dyn: resampling(dyn),
          **kwargs)

# scheduler that adapts alpha based on the effective sample size, capped at 1e7
sched = effective_sample_size(maximum=1e7, name='alpha')
Train the network#
After setting everything up, we can now start the training loop :)
[ ]:
e = 0
while f.epochs < 10:
    dyn.step()          # one CBO iteration on the current mini-batch
    sched.update(dyn)   # adapt alpha
    f.set_batch()       # draw the next mini-batch
    if e != f.epochs:   # a new epoch has started: report test accuracy
        e = f.epochs
        print(30*'-')
        print('Epoch: ' + str(f.epochs))
        acc = eval_acc(model, dyn.best_particle[0,...], pprop, test_loader)
        print('Accuracy: ' + str(acc.item()))
        print(30*'-')
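Once the loop has finished, the best particle can be evaluated one more time on the test set, using the same call as inside the loop above.
[ ]:
final_acc = eval_acc(model, dyn.best_particle[0,...], pprop, test_loader)
print('Final test accuracy: ' + str(final_acc.item()))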