Image Database Visualization
This post covers some recent developments in my tools for visualizing image databases and image attributes. I've worked on this project on and off for some time and recently decided to revisit it because of its application to a work-related project I'm currently developing. A few years ago I began exploring how image attributes like color could be displayed as a 3D point cloud. The initial explorations were relatively simple: they relied on palettizing the colors in a given image and quantifying each color based on its level of dominance in the image. Each color was represented in 3D space by using its RGB values as XYZ coordinates, where R=X, G=Z, and B=Y. Each point was given a radius based on the percentage of pixels in the image containing that color. When several images were processed this way, a colorful point cloud consisting of thousands of unique colors was generated.
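As a rough illustration of that early approach, here is a minimal sketch using Pillow's adaptive palette quantization. The function name and the choice of 32 palette colors are my own assumptions, not the original code.

```python
from PIL import Image

def color_points(path, n_colors=32):
    """Quantize an image to a small palette and map each color to a 3D point."""
    img = Image.open(path).convert("RGB")
    img = img.quantize(colors=n_colors).convert("RGB")  # adaptive palette
    pixels = list(img.getdata())
    total = len(pixels)
    counts = {}
    for p in pixels:
        counts[p] = counts.get(p, 0) + 1
    points = []
    for (r, g, b), count in counts.items():
        dominance = count / total      # fraction of pixels with this color
        points.append({
            "position": (r, b, g),     # R=X, G=Z, B=Y per the mapping above
            "radius": dominance,       # scale point size by color dominance
            "color": (r, g, b),
        })
    return points
```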
Recently, I've been developing a suite of tools for processing and analyzing batches of images to build a content-based image retrieval model which also identifies notable trends across the batch. Right now, the image trends I'm focused on pertain to color palettes, hence the connection back to my early explorations. Recent iterations extend the idea of representing color in 3D space to representing entire images in 3D space using metrics derived from the color attributes of the image, specifically color moments. Color moments provide a concise, scale- and rotation-invariant representation of an image: a series of metrics that quantify the distribution of color in the image. Here, the color moments are used to generate the coordinates of a point in 3D space where an image can be placed. Images with similar color distributions will appear in similar areas of 3D space, so the distance between images is a measure of their similarity.
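For reference, the first three color moments are the mean, standard deviation, and skewness of each channel's pixel values. The post doesn't specify how the nine per-channel moments are reduced to three coordinates, so as one plausible reading, the sketch below averages each moment across the RGB channels.

```python
import numpy as np
from PIL import Image

def color_moments(path):
    """First three color moments (mean, std dev, skewness) per RGB channel."""
    pixels = np.asarray(Image.open(path).convert("RGB"), dtype=np.float64)
    pixels = pixels.reshape(-1, 3)   # flatten to (n_pixels, 3)
    mean = pixels.mean(axis=0)
    std = pixels.std(axis=0)
    # Skewness here: sign-preserving cube root of the mean cubed deviation
    third = ((pixels - mean) ** 3).mean(axis=0)
    skew = np.cbrt(third)
    return mean, std, skew

def image_position(path):
    """Collapse the nine moments to one XYZ coordinate by averaging channels."""
    mean, std, skew = color_moments(path)
    return (mean.mean(), std.mean(), skew.mean())
```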
Previous Iterations and Continued Development
In previous explorations I used a combination of Python scripts run in a VS Code environment and fed the output data into a Grasshopper script connected to a Rhino model to visualize the data in 3D. Grasshopper and Rhino were great for prototyping the idea, but I wanted to be able to visualize the data interactively in my browser, which led me to start exploring ThreeJS. Below are some examples of what the various iterations of this project have looked like, from version 1, a simple point cloud, to the most recent interactive ThreeJS image database visualization tools.
Current Iteration
Here is the most recent iteration of the visualization tool, which uses ThreeJS to render images in 3D space. The images are processed on the backend with Python, and the output data is written to a JSON file and read by the ThreeJS application. I'm currently working on adding interactive features that display information about an image on mouseover, along with filters that let the user narrow down images based on categorical tags and other metrics. While this iteration is very much a work in progress, it includes significant improvements over previous iterations, particularly the ability to build these tools into a working application that runs in the browser rather than requiring special software like Rhino.
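To give a sense of the handoff between the two halves, here is a minimal sketch of the kind of JSON export the Python backend might produce for the ThreeJS app to read. The file names, record fields, and the use of the parent folder as a tag (described in the next paragraph) are my own assumptions, and `image_position` reuses the color-moment sketch above.

```python
import json
from pathlib import Path

def export_database(image_dir, out_path="images.json"):
    """Write one record per image: a 3D position plus metadata for the viewer."""
    records = []
    for path in sorted(Path(image_dir).rglob("*.jpg")):
        x, y, z = image_position(path)   # color-moment coordinates from above
        records.append({
            "file": str(path),
            "position": [x, y, z],
            "tag": path.parent.name,     # assumed: folder name as location tag
        })
    with open(out_path, "w") as f:
        json.dump(records, f, indent=2)
```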
This example displays about 1000 photos I’ve taken on various trips or when I lived in different places. Right now, the filter tags are generated by dropping the photos into folders named after the location where they were taken. Click the video below to see a quick walkthrough of the model.
Dimensionality Reduction
Using color moments as a comparison metric for indexing images is a good start, but to build a more robust and accurate content-based image retrieval model, I'll be using singular value decomposition and principal component analysis to reduce the dimensionality of the images and calculate the similarity of their most important features (principal components).
In singular value decomposition (SVD) and principal component analysis (PCA), the pixel values of each image are transformed from a 2-dimensional array into a single vector. In SVD the values are represented as column vectors, and in PCA they are represented as row vectors.
Once the pixel data for all the images are put into row vectors, we can calculate a covariance matrix to identify which image features capture the most significant variance across all the images analyzed. The covariance matrix yields eigenvalues and eigenvectors, which indicate the magnitude and direction of the spread of the data, respectively.
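I'm implementing these steps from scratch (see below), but purely as an illustration of the procedure, here is a compact NumPy sketch of stacking flattened images as row vectors, computing the covariance matrix, and taking its eigendecomposition. The grayscale input and function name are my own simplifying assumptions.

```python
import numpy as np

def pca_eigen(images):
    """images: array of shape (n_images, height, width), e.g. grayscale pixels."""
    n = images.shape[0]
    X = images.reshape(n, -1).astype(np.float64)  # each image -> one row vector
    X -= X.mean(axis=0)                           # center each pixel feature
    cov = (X.T @ X) / (n - 1)                     # covariance matrix of features
    # The covariance matrix is symmetric, so eigh applies; note that for large
    # images this matrix is (h*w x h*w), which is why SVD is often used instead.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]         # sort descending by variance
    return eigenvalues[order], eigenvectors[:, order]
```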
At the time of writing this post I am still working on implementing the remaining steps of PCA, including eigenvalue decomposition of the covariance matrix, selection of the principal components, projection of the data onto those components, and finally, dimensionality reduction. In order to build a fundamental understanding of the mathematics behind PCA, I am writing these steps from scratch without the use of external packages, which I will post about in the future. To view the code for this project in its current state, please follow the link below to my GitHub repo.