
Data Set

To evaluate different models' performance on correlated image data, we constructed two datasets -- birds and celebrity faces.
The Birds dataset has roughly 14,000 images in the training set and 100 images in the testing set. The Celebrity dataset has around 20,000 images in the training set and 1,700 images in the testing set. To train our models, we further divided each training set with a train:validation:test split.
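As an illustration of the split described above, here is a minimal sketch in Python. The 80/10/10 fractions, the fixed seed, and the helper name `split_indices` are assumptions made for this example; the ratios actually used in the project are not stated here.

```python
import random


def split_indices(n_items, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle range(n_items) and cut it into train/validation/test index lists.

    The default fractions are hypothetical, not the project's actual ratios.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_items))
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_items)
    n_val = round(fractions[1] * n_items)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    # The test subset takes whatever remains, so no index is dropped.
    test = indices[n_train + n_val:]
    return train, val, test


# Roughly the size of the Birds training set.
train_idx, val_idx, test_idx = split_indices(14000)
print(len(train_idx), len(val_idx), len(test_idx))
```

The three index lists are disjoint by construction and together cover every image, so the hand-picked testing set stays separate from anything the models see during training.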

Dataset 1: Birds

The Birds dataset consists of two classes: sea birds and land birds. The correlated feature in this dataset is the background. As displayed in the first column of the image below, sea birds tend to appear against backgrounds that primarily consist of sky or water. Similarly, as displayed in the middle column, land birds tend to appear against backgrounds that primarily consist of trees or grass. These two types of correlated bird images constitute the training set for the Birds dataset and were crawled from online image groups on Flickr. The testing set was hand-picked from the same Flickr image groups and contains the opposite pairings: sea birds on a land background and land birds flying against the sky. Typical testing examples can be found in the right column of the image below.

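[Figure: birds-example.png. Left column: sea birds on sky or water backgrounds (training). Middle column: land birds on tree or grass backgrounds (training). Right column: testing examples.]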

Dataset 2: CelebA

This dataset was built from CelebA, which contains 202k face images, each annotated with 40 binary attributes. In this dataset, we focused on gender classification. The correlation here is between hair color and gender: most of the male faces have dark hair, and most of the images with blonde hair are female. Typical examples can be found in the left and middle columns of the image below. Such correlated images make up the training set. The testing set was selected from the CelebA dataset using the provided binary attribute annotations of each image: images for which both the blonde hair attribute and the male attribute are positive were added to the testing set. Typical testing examples are shown in the right column of the image below.

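[Figure: teaser_edited.jpg. Left and middle columns: typical correlated training examples. Right column: testing examples (blonde-haired male faces).]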
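The attribute-based selection of testing images can be sketched as follows. This is not the project's actual script: the helper names are hypothetical, and the inline `sample` is a toy two-attribute stand-in for the CelebA annotation file, which is assumed here to follow the layout of a count line, a line of attribute names, and then one row per image with values of 1 or -1.

```python
def parse_celeba_attributes(lines):
    """Parse CelebA-style annotation lines into {filename: {attribute: bool}}.

    Assumed layout: line 0 is the image count, line 1 lists attribute names,
    and each later line is a filename followed by one 1/-1 value per attribute.
    """
    names = lines[1].split()
    table = {}
    for row in lines[2:]:
        parts = row.split()
        if not parts:
            continue  # skip blank lines
        filename, values = parts[0], parts[1:]
        table[filename] = {name: value == "1" for name, value in zip(names, values)}
    return table


def select_counterexamples(table, hair="Blond_Hair", gender="Male"):
    """Return filenames for which both the hair and gender attributes are positive."""
    return sorted(f for f, attrs in table.items() if attrs[hair] and attrs[gender])


# Toy annotation data with only two attributes (the real file has 40).
sample = [
    "4",
    "Blond_Hair Male",
    "000001.jpg  1 -1",
    "000002.jpg -1  1",
    "000003.jpg  1  1",
    "000004.jpg -1 -1",
]

test_files = select_counterexamples(parse_celeba_attributes(sample))
print(test_files)  # ['000003.jpg']
```

Only the image with both attributes positive is kept, which matches the rule that the testing set holds the pairing that is rare in the training data.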

©2021 by CS 766 Spring 2021 Final Project.
