Imbalanced Dataset Sampler

In many machine learning applications, we come across datasets where some types of data are seen far more often than others. Take the identification of rare diseases, for example: there are probably many more normal samples than disease ones. In these cases, we need to make sure that the trained model is not biased towards the class that has more data. As an example, consider a dataset with 5 disease images and 20 normal images. If the model predicts all images to be normal, its accuracy is 80%, and its F1-score for the ‘normal’ class is about 0.89, even though it never detects a single disease image. A model trained on such data therefore has a strong tendency to be biased toward the ‘normal’ class.
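The numbers in this example can be checked with a few lines of Python. This is only a sketch of the arithmetic: the helper name `precision_recall_f1` is made up for illustration, and the F1-score is computed once with ‘normal’ as the positive class and once with ‘disease’ as the positive class.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

n_disease, n_normal = 5, 20

# The model labels every image "normal", so only the normal images are correct.
accuracy = n_normal / (n_normal + n_disease)

# "normal" as the positive class: 20 true positives, 5 false positives, 0 misses.
_, _, f1_normal = precision_recall_f1(tp=20, fp=5, fn=0)

# "disease" as the positive class: nothing is ever predicted as disease.
_, _, f1_disease = precision_recall_f1(tp=0, fp=0, fn=5)

print(f"accuracy={accuracy:.2f}, F1(normal)={f1_normal:.3f}, F1(disease)={f1_disease:.3f}")
```

The accuracy and the ‘normal’ F1-score both look healthy, while the F1-score for the ‘disease’ class is zero, which is the bias the paragraph describes.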

A widely adopted technique for dealing with this problem is resampling. It consists of removing samples from the majority class (under-sampling) and/or adding more examples from the minority class (over-sampling). Despite the advantage of balancing classes, these techniques also have their weaknesses (there is no free lunch). The simplest implementation of over-sampling duplicates random records from the minority class, which can cause overfitting. The simplest implementation of under-sampling removes random records from the majority class, which can cause loss of information.
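A minimal sketch of the two simplest variants, using only the Python standard library. The function names and the toy record lists are hypothetical and chosen to match the 5-versus-20 example; real pipelines would operate on dataset indices rather than strings.

```python
import random

def random_oversample(minority, target_size, rng):
    """Duplicate randomly chosen minority records until target_size is reached."""
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra

def random_undersample(majority, target_size, rng):
    """Keep a random subset of the majority class; everything else is discarded."""
    return rng.sample(majority, target_size)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
disease = [f"disease_{i}" for i in range(5)]
normal = [f"normal_{i}" for i in range(20)]

oversampled = random_oversample(disease, len(normal), rng)    # 20 records, only 5 distinct
undersampled = random_undersample(normal, len(disease), rng)  # 5 records, 15 thrown away

print(len(oversampled), len(set(oversampled)))
print(len(undersampled), len(normal) - len(undersampled))
```

The over-sampled list contains the same five records repeated, which is where the overfitting risk comes from, and the under-sampled list has dropped fifteen normal records, which is the information loss.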


In this repo, we implement an easy-to-use PyTorch sampler that is able to

  • rebalance the class distributions when sampling from the imbalanced dataset
  • estimate the sampling weights automatically
  • avoid creating a new balanced dataset
  • mitigate overfitting when it is used in conjunction with data augmentation techniques
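The idea behind the first three points can be sketched in plain Python. This is not the repo's code, only an illustration under the assumption that each sample is weighted by the inverse of its class frequency; the helper `inverse_frequency_weights` is a made-up name. In PyTorch, per-sample weights of this kind can be passed to `torch.utils.data.WeightedRandomSampler`.

```python
import random
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each sample by 1 / (number of samples in its class)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]

labels = ["disease"] * 5 + ["normal"] * 20
weights = inverse_frequency_weights(labels)

# Draw indices with replacement according to the weights. The dataset itself
# is left untouched; only the order and frequency of the indices change.
rng = random.Random(0)
indices = rng.choices(range(len(labels)), weights=weights, k=10_000)

drawn = Counter(labels[i] for i in indices)
print({label: count / len(indices) for label, count in drawn.items()})
```

Because each class carries the same total weight, both classes are drawn at roughly the same rate. Minority samples are still repeated, so applying random data augmentation to each drawn sample is what keeps the repeats from being exact duplicates.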


CVPR Workshop on FIFA World Cup


Boston Night Flight

Introducing Deepo

Given the current state of deep learning research and development, we may need to play around with different frameworks (e.g., PyTorch for research and fun, Caffe2 for edge device inference, etc.). However, setting up all the deep learning frameworks to coexist and function correctly is tedious and time-consuming.

So I made Deepo, which contains a series of Docker images and their Dockerfile generator.


Aloha Hawaii


Real-time Bilateral Filtering

Many users have become accustomed to removing wrinkles, freckles, and various blemishes from human subjects for a more visually appealing image or video. This can be achieved by applying an edge-preserving filter called the bilateral filter. However, a vanilla bilateral filter typically has a high computational cost, necessitating a powerful CPU / GPU to process images in real time. So I had been looking for an efficient alternative algorithm, and finally found

Qingxiong Yang. Recursive bilateral filtering. European Conference on Computer Vision 2012.

that can achieve a good trade-off.

I made a lightweight C++ library for this algorithm, and obtained the following results (RecursiveBF):

[Figure: original image compared with the RecursiveBF result (18 ms)]
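For readers unfamiliar with the vanilla bilateral filter mentioned above, here is a brute-force 1D sketch in Python. It is not the recursive algorithm from the paper and not the C++ library's code; the function name, parameters, and test signal are assumptions chosen for illustration. Each output value is a weighted average of its neighbours, where the weight drops with both spatial distance and difference in intensity.

```python
import math

def bilateral_filter_1d(signal, radius=3, sigma_space=2.0, sigma_range=0.1):
    """Brute-force bilateral filter: O(len(signal) * radius) weight evaluations."""
    out = []
    for i, center in enumerate(signal):
        weighted_sum = 0.0
        weight_total = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            # Spatial term: nearby samples count more.
            spatial = math.exp(-((i - j) ** 2) / (2 * sigma_space ** 2))
            # Range term: samples with a similar value count more, so values
            # on the other side of an edge contribute almost nothing.
            tonal = math.exp(-((signal[j] - center) ** 2) / (2 * sigma_range ** 2))
            weight = spatial * tonal
            weighted_sum += weight * signal[j]
            weight_total += weight
        out.append(weighted_sum / weight_total)
    return out

# A noisy step: small fluctuations around 0, then around 1.
step = [0.0, 0.02, -0.02, 0.01, 0.0, 1.0, 0.98, 1.02, 1.0, 0.99]
smoothed = bilateral_filter_1d(step)
print([round(v, 3) for v in smoothed])
```

The small fluctuations on each side are averaged out while the jump between the two levels stays sharp. In 2D the same nested loop runs over a window around every pixel, which is why the vanilla filter is expensive for real-time use.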

Get Lost in Moscow Metro

It is easy to get lost in the Moscow Metro if you don’t know Russian and have never been to Moscow. But it’s fun, as I’m gaining new experiences and challenging my boundaries in an unfamiliar land with unfamiliar people. BTW, Moscow subway stations are so beautiful and grandiose. It’s like visiting a museum.


Cycling from Amsterdam to Purmerend

