By: Rachel Heeren
Images of Earth from space are breathtaking, but did you know that these photos can be more than just a pretty sight? Satellite sensors such as the Visible Infrared Imaging Radiometer Suite (VIIRS) are constantly taking millions of pictures of the world around us. Scientists can use the data collected from these images to study the land and aquatic properties of our planet. However, before the data can be used, clouds and reflections of sunlight must be removed from the pictures. These features can be difficult to distinguish from similarly colored regions of snow and ice. Furthermore, humans cannot quickly sort through millions of photos or easily tell clouds apart from snow and ice. These challenges are some of the reasons why scientists have turned to machine learning to classify the satellite’s data. For my project, I worked to restore functionality to a machine learning Random Forest classifier program and to improve the model’s readability and efficiency. Random Forest classification takes input data and sorts it into predefined categories by combining the votes of many decision trees. Improvements in code readability included commenting the program’s code and creating a formal documentation flowchart (process flowchart) so that people unfamiliar with the project or with programming can better understand what the program does. In addition, I used data wrangling, a method to transform, clean up, and condense the input values, along with data analysis to turn a once-flawed, non-executable machine learning model with unusable training data into one that runs successfully. In 23 seconds, it produces a three-colored classified image in which clouds, land, and water are easy to tell apart.
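To make the idea of Random Forest classification concrete, here is a toy sketch that uses only the Python standard library. It is not the project's code: the two per-pixel features, the class centers, and every function name are hypothetical, and the pixels are synthetic rather than VIIRS data. Each tree is trained on a bootstrap sample and tests one randomly chosen feature per split, and the forest labels a pixel by majority vote. A real pipeline would more likely rely on a library such as scikit-learn.

```python
import random
from collections import Counter

# Hypothetical per-pixel features: (visible reflectance, infrared brightness).
# Values are synthetic and chosen only so the three classes are separable.
CLASS_CENTERS = {"cloud": (0.8, 0.2), "land": (0.3, 0.7), "water": (0.05, 0.4)}

def make_pixels(n_per_class, rng):
    """Generate labeled synthetic pixels scattered around each class center."""
    data = []
    for label, (vis, ir) in CLASS_CENTERS.items():
        for _ in range(n_per_class):
            data.append(((rng.gauss(vis, 0.05), rng.gauss(ir, 0.05)), label))
    return data

def gini(rows):
    """Gini impurity of a list of (features, label) rows."""
    counts = Counter(label for _, label in rows)
    return 1.0 - sum((c / len(rows)) ** 2 for c in counts.values())

def build_tree(rows, rng, depth=0, max_depth=4):
    """Grow one decision tree, testing a single random feature at each split."""
    labels = Counter(label for _, label in rows)
    if depth == max_depth or len(labels) == 1:
        return labels.most_common(1)[0][0]  # leaf: majority label
    feature = rng.randrange(len(rows[0][0]))  # random feature subset of size 1
    best = None
    for threshold in sorted({feats[feature] for feats, _ in rows}):
        left = [r for r in rows if r[0][feature] < threshold]
        right = [r for r in rows if r[0][feature] >= threshold]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[0]:
            best = (score, threshold, left, right)
    if best is None:  # no usable split on this feature
        return labels.most_common(1)[0][0]
    _, threshold, left, right = best
    return (feature, threshold,
            build_tree(left, rng, depth + 1, max_depth),
            build_tree(right, rng, depth + 1, max_depth))

def predict_tree(node, feats):
    """Walk a tree from the root down to a leaf label."""
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if feats[feature] < threshold else right
    return node

def train_forest(rows, n_trees, rng):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    return [build_tree([rng.choice(rows) for _ in rows], rng)
            for _ in range(n_trees)]

def predict_forest(forest, feats):
    """Classify one pixel by majority vote across all trees."""
    votes = Counter(predict_tree(tree, feats) for tree in forest)
    return votes.most_common(1)[0][0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
train = make_pixels(30, rng)
test = make_pixels(10, rng)
forest = train_forest(train, n_trees=15, rng=rng)
accuracy = sum(predict_forest(forest, f) == y for f, y in test) / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Running `predict_forest` on every pixel of an image and mapping each of the three labels to a color is, in spirit, how a three-colored classified image could be produced.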
This research was conducted as part of the Partners in Science Research Program. To read more, click here for a full PDF of Rachel's formal research paper about her project!