The Human Protein Atlas Image Competition on Kaggle

Just before the RSNA Pneumonia Competition drew to a close, another interesting competition opened up, namely the Human Protein Atlas image classification challenge. Link: kaggle.com/c/human-protein-atlas-image-classification

The goal of the competition is to detect cell features in images taken by a microscope. Around 30,000 512x512 images make up the training set, and around 11,000 images are provided as the test set. The labels are provided in a CSV file.
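The post doesn't show how the label CSV is parsed, but since this is a multi-label problem (each image can show several cell features), a first step is turning the label strings into a binary matrix. A minimal sketch, assuming the CSV has an `Id` column and a space-separated `Target` column and that there are 28 classes, as in this competition:

```python
import io
import numpy as np
import pandas as pd

N_CLASSES = 28  # the competition defines 28 feature classes

def targets_to_matrix(df, n_classes=N_CLASSES):
    """Convert space-separated label strings into a binary label matrix."""
    y = np.zeros((len(df), n_classes), dtype=np.int8)
    for i, labels in enumerate(df["Target"]):
        for lab in str(labels).split():
            y[i, int(lab)] = 1
    return y

# Tiny in-memory example with the same shape as the train CSV
csv = io.StringIO("Id,Target\nimg_a,0 5\nimg_b,25\n")
y = targets_to_matrix(pd.read_csv(csv))
```

With a matrix like this, a sigmoid output layer with binary cross-entropy loss can predict all 28 labels at once.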

I started off with a data exploration kernel to get a feel for the data. Then, through the discussion forums, I found useful starter code that used transfer learning. Link: https://www.kaggle.com/mathormad/inceptionv3-baseline-lb-0-379

Instead of the Inception network, I decided to use a ResNet50 network with the ImageNet weights. I also compared results on the augmented data and the original data; augmentation gave me a boost of at least 0.05-0.06 on the public leaderboard.
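The post doesn't say which augmentations were used, but flips and 90-degree rotations are a common, label-safe choice for microscopy images, where orientation carries no meaning. A minimal sketch of that assumption in plain NumPy:

```python
import numpy as np

def augment(img, rng):
    """Random flips and quarter-turn rotations; these never distort
    cell features, unlike aggressive crops or color shifts."""
    if rng.random() < 0.5:
        img = np.fliplr(img)   # mirror left-right
    if rng.random() < 0.5:
        img = np.flipud(img)   # mirror top-bottom
    k = rng.integers(0, 4)     # 0-3 quarter turns
    return np.rot90(img, k)

rng = np.random.default_rng(0)
img = np.arange(12).reshape(3, 4, 1)  # toy H x W x C image
out = augment(img, rng)
```

Each transform preserves the pixel set, so the label vector for the image stays unchanged.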

An interesting concept I learned from that code was the idea of "warming up" the transfer model. When a model is created, its weights are usually set by random/Xavier/He initialization. In transfer learning, however, we want to protect the loaded ImageNet weights from the large gradient updates caused by the randomly initialized new layers. Therefore we initially freeze the base transfer model and train only the new layers for some epochs, and then make all layers trainable and continue training.

I initially trained all the data at a 224x224 input size, since that is the default size for the ImageNet weights. After training for multiple epochs at that size, I decided to switch to the 512x512 images.

There was, however, an issue. The network would not have the same number of parameters in its layers if I used the 512px images with the weights trained on the 224px ones. The trick was to insert an additional Conv2D layer between the last convolutional layer and the flatten layer, so that the flattened feature map, and hence the number of dense parameters, remained the same. I also named the model layers and used the by_name=True flag in the load_weights function so that the weights loaded into the right layers.
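A toy version of that trick, shrunk to 32px/64px inputs so the arithmetic is easy to follow: a stride-2 conv halves the larger feature map back to the smaller model's geometry, so the dense layer's parameter count matches and `by_name=True` can reuse the old weights. The layer names, sizes, and filename here are illustrative assumptions, and the save format is the Keras 2-era HDF5 one:

```python
from tensorflow import keras

def build_model(size, extra_conv=False, n_classes=28):
    inp = keras.Input((size, size, 3))
    x = keras.layers.Conv2D(8, 3, padding="same", activation="relu",
                            name="conv1")(inp)
    if extra_conv:
        # Extra stride-2 conv halves the 64px feature map to 32px,
        # so the flatten size (and dense params) match the small model
        x = keras.layers.Conv2D(8, 3, strides=2, padding="same",
                                activation="relu", name="conv_extra")(x)
    x = keras.layers.Flatten(name="flat")(x)
    out = keras.layers.Dense(n_classes, activation="sigmoid", name="dense1")(x)
    return keras.Model(inp, out)

small = build_model(32)                 # stands in for the 224px model
small.save_weights("small.h5")

big = build_model(64, extra_conv=True)  # stands in for the 512px model
# by_name=True matches layers by name, so conv1 and dense1 load their
# old weights while the new conv_extra keeps its fresh initialization
big.load_weights("small.h5", by_name=True)
```

Without the extra conv, the 64px model's flatten would be four times larger and `dense1` could not accept the saved weights.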

I wanted to use the k-fold technique to get better hyperparameters for the model. Due to lack of time, however, I only ran it a couple of times to make sure the model was working correctly and getting decent results.
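The k-fold loop itself is cheap to set up; the cost is in training one model per fold. A minimal sketch using scikit-learn's `KFold` over image ids (the ids and fold count are placeholders):

```python
import numpy as np
from sklearn.model_selection import KFold

ids = np.array([f"img_{i}" for i in range(10)])  # stand-in image ids

kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = []
for fold, (tr, va) in enumerate(kf.split(ids)):
    # In practice: train a model on ids[tr], evaluate on ids[va],
    # and record the validation score. Here we record split sizes.
    scores.append((len(tr), len(va)))
```

Averaging the per-fold validation scores gives a far more stable estimate than a single hold-out split, which is why it is worth the extra training time when the clock allows.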

I'm still working on this competition and I'll update my results and other findings when I'm done.
