Data Plus 2021
Creating AI for Artificial Worlds

Frankie Willard, Alexander Kumar, Caroline Tang

See Our GitHub

Abstract


Recent advances in the ability of object detection models to accurately identify objects in real time have facilitated new developments in computer vision, including in autonomous vehicles, facial recognition, and medical imaging. Given the abundance of aerial and satellite imagery available, object detection can be applied to overhead imagery to quickly and automatically map out regions, providing a geographical distribution of specified objects of interest. However, object detection models often require an abundance of diverse, high-quality, manually labeled imagery, which is expensive to filter and label. We therefore generated synthetic images and their labels using image blending to reduce the costs of data collection and annotation. We employed the state-of-the-art GP-GAN model to create realistic synthetic imagery that integrates objects of interest into real, unlabeled background images. We separated our overhead imagery dataset into its different geographical regions and performed experiments to evaluate the value of adding synthetic imagery for improving the accuracy and adaptability of object detection across regions. Our results suggest that adding GP-GAN-generated synthetic imagery to our baseline training dataset improves average precision over the baseline and slightly outperforms synthetic imagery created with 3D models of energy infrastructure.

Motivation


Energy Access Planning

Access to electricity is becoming increasingly critical, especially for promoting economic development, social equity, and quality of life. Electricity access has been shown to be correlated with improvements in income, education, maternal mortality, and gender equality. Yet, worldwide, 16% of the global population, or approximately 1.2 billion people, still do not have access to electricity in their homes. The map below, from the World Bank in 2017, highlights the uneven distribution of energy access, with the majority of those without electricity concentrated in sub-Saharan Africa and Asia.

Image from: https://www.visualcapitalist.com/mapped-billion-people-without-access-to-electricity/

One of the first steps in improving energy access is acquiring comprehensive data on the existing energy infrastructure in a given region, including its type, quality, and location, so that energy developers and policymakers can strategically and optimally deploy energy resources. This information is key to deciding where to prioritize development and whether to use grid extension, micro/minigrid development, or off-grid options to bring electricity access to new communities.

However, this critical information for expanding energy access is often unavailable or of low quality. One potential solution is to automate the process of mapping energy infrastructure in satellite imagery. Using deep learning, we can feed satellite imagery into an object detection model and make predictions about the characteristics and contents of the energy infrastructure in the imaged region, providing energy experts with the data necessary to expand electricity access.



Object Detection

Image from: ResearchGate

Object detection consists of classification (identifying the correct object) and localization (identifying the location of a given object). Our project places particular emphasis on object detection, as we seek to improve the detection of energy infrastructure across different terrains as part of expanding energy access data. An object detection model analyzes the scene in a photo and generates a bounding box around each object in the image; in doing so, it classifies each object and assigns a confidence score to its prediction. In the image on the left, the model predicted that the green, yellow, orange, and pink boxes contain different objects: a truck, a car, an umbrella, and a person. The model learns to predict these boxes and classes from examples provided to it. We refer to these labeled images as ground truth, as they contain boxes denoting every object's class and location within the image.
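To make "ground truth" concrete, below is a small, purely hypothetical example of what annotations for the street scene described above might look like in code; the class names match the figure, but the pixel coordinates are invented for illustration.

```python
# Hypothetical ground-truth annotations: each object in the image is
# recorded as a class label plus a bounding box in pixel coordinates.
annotations = [
    {"class": "truck",    "box": [ 34,  80, 210, 190]},  # [x1, y1, x2, y2]
    {"class": "car",      "box": [240, 120, 330, 185]},
    {"class": "umbrella", "box": [400,  60, 470, 110]},
    {"class": "person",   "box": [415, 105, 455, 200]},
]
```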



Applying Deep Learning to Overhead Imagery


After training our object detection model, we can apply it to a collection of overhead imagery to locate and classify energy infrastructure across entire regions. In our experiments, we test our ability to detect wind turbines, maintaining consistency with previous experiments. While we could demonstrate detection for any type of electricity infrastructure, wind turbines were chosen for their relatively homogeneous appearance compared with power plants and other energy infrastructure. Additionally, our dataset was limited to the US, where high-resolution overhead imagery is widely available. Ultimately, the methods used to improve object detection of energy infrastructure will be extended to more infrastructure types and tested on more regions; however, limiting the infrastructure to wind turbines and using readily available US imagery helps to quickly provide performance benchmarks for our real and synthetic datasets.



Challenges with Object Detection


Problem 1: Lack of labeled data for rare objects

While the potential of object detection is promising, it presents two main challenges. The first is that properly training an object detection model requires thousands of already-labeled images. According to Alexey Bochkovskiy, developer of the widely used and precise YOLOv4 object detection model, it is ideal to have at least 2,000 different images for each class to account for the different sizes, shapes, sides, angles, lighting, backgrounds, and other factors that vary from image to image. Thus, for the object detection model to generalize well, it requires thousands of training images per type of energy infrastructure. Because many types of energy infrastructure are rare objects, obtaining and manually annotating such a large quantity of satellite images featuring them is expensive in both time and money.

Problem 2: Domain adaptation

The second challenge is that, when training an object detection model to detect energy infrastructure in certain regions, our training set and testing set must come from different locations and thus may differ in geographical background and other environmental factors. Without being trained for the test setting, object detection models do not yet generalize well across dissimilar images. If we train our model on a collection of images from one region, all featuring similar background geographies, the model will perform fairly well on other images with those same physical background characteristics. However, if we then input images from a different region with different geographic characteristics, the model's performance becomes significantly worse.


Figure: Example of the domain adaptation challenge. A model trained on images with the geographical features of the source domain (forest & grasslands) underperforms when tested on images with a different background in the target domain (desert).
Image from: 2020-2021 Bass Connections team.


Proposed solution: synthetic imagery

Our proposed solution to these two problems is to introduce synthetic images into our training dataset. The synthetic images supplement the original real satellite imagery, creating a larger training dataset that diversifies the geographical backgrounds and orientations of energy infrastructure the model sees. We generate these synthetic images by cropping energy infrastructure out of satellite images and using a Generative Adversarial Network to blend the crops into real, infrastructure-free images from the target geographic domains.


Figure: Synthetic imagery generation process overview

Previous work

For five years, the Duke Energy Data Analytics Lab has worked on developing deep learning models that identify energy infrastructure, with an end goal of generating maps of power systems and their characteristics that can aid policymakers in implementing effective electrification strategies. In 2015-16, researchers created a model that can detect solar photovoltaic arrays with high accuracy [2015-16 Bass Connections Team]. In 2018-19, this model was improved to identify different types of transmission and distribution energy infrastructures, including power lines and transmission towers [2018-19 Bass Connections Team]. Last year's project focused on increasing the adaptability of detection models across different geographies by creating realistic synthetic imagery [2019-20 Bass Connections Team].

In 2020-2021, the Bass Connections project team extended this work, aiming to improve the model's ability to accurately detect rare objects in diverse locations. After collecting satellite imagery from the National Agriculture Imagery Program database and clustering it by region, they experimented with generating synthetic imagery by taking satellite images featuring no energy infrastructure, placing 3D models of the object of interest on top, and capturing a rendering that mimicked the appearance of a satellite image [2020-21 Bass Connections Team]. Their paper, Wind Turbine Detection With Synthetic Overhead Imagery, was published at IGARSS 2021. In our project, we build upon this progress and aim to improve on the 2020-21 Bass Connections team's ability to enhance energy infrastructure detection in new, diverse locations.

Methodology


Below is a description of the experiments we conducted to evaluate whether adding synthetic images to an object detection training set enhances performance across geographic domains. After gathering real images and generating synthetic images, we construct two datasets: the first includes only real imagery, while the second includes both real and synthetic images. We train an object detection model on the first dataset, test it, and then repeat the process with the second dataset, comparing the results. If the model performs better when trained on the dataset with synthetic imagery, we can conclude that the synthetic imagery aids the model's performance. Given the Bass Connections team's previous work creating a synthetic dataset using CityEngine, we can also compare our synthetic dataset's performance against theirs to determine which synthetic dataset best improves energy infrastructure detection.


Creating Synthetic Imagery


GANs Overview

Generative Adversarial Networks (GANs) are a method of generative modeling. The concept behind GANs is a zero-sum game between two neural networks: a generator network and a discriminator network. While the generator attempts to create images that are as realistic as possible, the discriminator tries to determine whether those images are real or fake. The generator then learns from what the discriminator identifies as fake, helping it create more realistic images. This approach to generative modeling has seen rapidly increasing usage across scientific domains due to its ability to generate photorealistic images for tasks including data augmentation, art creation, image-to-image translation, image harmonization, and image super-resolution.

Image from: Akira.AI
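To make the adversarial setup concrete, here is a minimal, illustrative PyTorch sketch of one GAN training step. The tiny fully connected networks and hyperparameters are placeholders for illustration only; they are not the architecture of the GP-GAN used later in this project.

```python
import torch
import torch.nn as nn

# Toy generator (noise -> image) and discriminator (image -> real/fake logit).
latent_dim, img_dim = 64, 28 * 28
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_images):  # real_images: (batch, img_dim)
    batch = real_images.size(0)
    fake_images = G(torch.randn(batch, latent_dim))

    # Discriminator: push real images toward label 1, generated images toward 0.
    opt_d.zero_grad()
    loss_d = bce(D(real_images), torch.ones(batch, 1)) + \
             bce(D(fake_images.detach()), torch.zeros(batch, 1))
    loss_d.backward()
    opt_d.step()

    # Generator: try to make the discriminator label its fakes as real (1).
    opt_g.zero_grad()
    loss_g = bce(D(fake_images), torch.ones(batch, 1))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```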


Figure: Top: a polar bear cut and pasted from the left image into the right image. Bottom: the polar bear in the left image blended into the right image with the GP-GAN.
GANs for Image Blending

In generating synthetic imagery that places real wind turbines in new terrains, our problem calls for image harmonization and image blending: matching the visual appearance and style of the wind turbine and geographical background images when blending them into a single image. Given the state-of-the-art performance demonstrated in “GP-GAN: Towards Realistic High-Resolution Image Blending” by Huikai Wu et al., we chose to investigate the potential for GANs to realistically produce our synthetic imagery dataset.



Synthetic Imagery with GP-GAN Pipeline





The synthetic images produced by the GP-GAN are a quick and cost-effective remedy for labeled datasets that fall short of the large training set size required for adequate performance. To produce our images, we need only a few images of each type of energy infrastructure (already in the training set) and a collection of background images. The background images do not have to be labeled and are thus far more abundant and readily usable. We can then make simple crops and bounding boxes for sampled real wind turbines within minutes. Our automatic image augmenter then randomly samples sizes, locations, and rotations to generate hundreds of source images in seconds. Our generated sources (m of them) can be matched with any and all destinations (n of them) to create as many as m × n output images from the GP-GAN. Each image blends in approximately 7 seconds, making our data pipeline an incredibly quick and resource-efficient solution to data scarcity. For reference, if we generate 10 source images per destination image and download 100 destination images, we can create a 1,000-image dataset in less than 2 hours, with diversity in wind turbine location, rotation, size, and geographical background. Below are some example synthetic images created from a variety of background images.


Figure: Our synthetic images contain a variety of background images, source turbines, and turbine orientations.
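To make the pipeline concrete, below is a minimal sketch of the augmentation and labeling steps under stated assumptions: the turbine crop and background paths are hypothetical, `gp_gan_blend` is a stand-in for a call to the actual GP-GAN blending code (here it falls back to a naive paste so the sketch runs on its own), and labels are written in the YOLO format (class, normalized center x/y, width, height).

```python
import random
from pathlib import Path
from PIL import Image

def gp_gan_blend(foreground, background, mask):
    """Placeholder for the GP-GAN blending step; naive paste fallback."""
    out = background.copy()
    out.paste(foreground, (0, 0), mask)
    return out

def make_synthetics(turbine_path, background_path, out_stem, n_sources=10):
    bg = Image.open(background_path).convert("RGB")
    turbine = Image.open(turbine_path).convert("RGBA")
    for i in range(n_sources):
        # Randomly sample the turbine's size, rotation, and location.
        scale = random.uniform(0.05, 0.15)  # turbine width vs. image width
        w = max(1, int(bg.width * scale))
        h = max(1, int(turbine.height * w / turbine.width))
        obj = turbine.resize((w, h)).rotate(random.uniform(0, 360), expand=True)
        x = random.randint(0, max(0, bg.width - obj.width))
        y = random.randint(0, max(0, bg.height - obj.height))

        # Composite the turbine onto a transparent canvas; alpha becomes the mask.
        canvas = Image.new("RGBA", bg.size)
        canvas.paste(obj, (x, y), obj)
        mask = canvas.split()[-1]
        gp_gan_blend(canvas.convert("RGB"), bg, mask).save(f"{out_stem}_{i}.jpg")

        # YOLO-format label (class 0 = wind turbine); the box is the
        # axis-aligned bounds of the rotated crop, so it is slightly loose.
        cx, cy = (x + obj.width / 2) / bg.width, (y + obj.height / 2) / bg.height
        Path(f"{out_stem}_{i}.txt").write_text(
            f"0 {cx:.6f} {cy:.6f} {obj.width / bg.width:.6f} "
            f"{obj.height / bg.height:.6f}\n")
```

Pairing every generated source with every downloaded destination is then a simple double loop over the two image directories.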


Synthetic Imagery Design Considerations

In designing the synthetic imagery, we must carefully control environmental variables to generate a diverse dataset of images that closely resemble real images. These design considerations are critical to our methodology: the closer the synthetic imagery is to the real test imagery, the more it will improve performance when added to our training set.


Figure: Bounding box size distribution of turbines in real imagery.
Image from: 2020-2021 Bass Connections team.
Location, Rotation, and Size of the Synthetic Turbines

The first step of our image generation pipeline is the image augmenter, which requires us to consider the location, size, and rotation of our synthetic turbines. The previous Bass Connections team modeled the size distribution of the CityEngine turbines on the size distribution of the real turbines. To ensure a controlled experiment, we sampled their location and sizing information to match the real distributions and allow a fair comparison with the CityEngine dataset. Because no rotation data was previously stored, we randomized the rotation of every turbine so that the object detection model sees wind turbines from various angles and views.



Which Background Images to Use

Additionally, we had to choose which background images to place under our synthetic wind turbines. Consistent with the Bass Connections CityEngine experiments, we chose background imagery close to the real images in our testing set to maximize the similarity of our synthetic imagery to the target data. This methodology matches real scenarios, as we will likely have access to unlabeled imagery, or be able to collect it, from around the region we wish to test on. Since no manual labeling or filtering is required, and many sources can be blended with each background image, this background data collection should not be overly time-consuming. Using background images close to our testing locations lets us estimate the potential performance increase from synthetic data without introducing confounding variables such as a mismatch between the synthetic background domain and the target domain, which would make it difficult to attribute poor performance to either the geographical background or the synthetic data generation.

Figure: Test image and nearby collected background image.
Image from: 2020-2021 Bass Connections team.

Experimental Setup


Overview

To evaluate the potential of synthetic imagery to improve object detection performance, we set up within-domain and cross-domain experiments, where a domain is defined as a specific geographic region. The source domain is the region the real training data comes from, while the target domain is the region the object detection model is applied to. These two types of experiments each correspond to a real-world situation one might encounter and help us evaluate the potential performance of the object detection model in each situation.

In the context of energy access planning, the ultimate goal of this project is to utilize object detection in various regions of the world where energy access is extremely limited and information on existing energy infrastructure is not readily available. Thus, the object detection model must be able to generalize well across different images despite labeled real satellite imagery most likely being limited.


Figure: Overall Experiment Setup. In within-domain experiments, the target domain (Northwest) remains the same geographic region as the source domain (Northwest). In cross-domain experiments, the target domain (Northeast) has no labeled real data, so the model is trained on a different source domain (Northwest) and then applied to the target domain. Orange color denotes a source domain, whereas blue color denotes a target domain.
Image from: 2020-2021 Bass Connections team.
Figure: Pairwise experiment setup. The arrow tails point toward the source domain (where real training data comes from), whereas the arrow heads point toward the target domain (where the model is tested). Bi-directional arrows indicate that each domain serves as the source for testing the other two domains and, in a separate experiment, as the target to be tested using models trained on the other domains.
Image from: 2020-2021 Bass Connections team.

Our within-domain experiments, where the source and target domains are within the same geographic region, will help us to evaluate the potential for synthetic imagery to supplement limited real training data.

However, as mentioned previously, one of the key challenges of object detection is its poor performance when applied to data that looks significantly different from its training data. The cross-domain experiments thus reflect the situation where no data exists from the target domain at all, so the object detection model must be trained on data from an entirely different region. For this experiment, the synthetic data comes from the target region, but the real images come from a source region different from the target.



Optimizing the Ratio of Real to Synthetic Data

In constructing our experimental datasets, we need to determine what ratio of real to synthetic data yields the largest gain in performance (if any). Adding too much synthetic data could cause overfitting to the synthetic data, exacerbating any irregularities in it or differences from real images, such that the object detection model performs worse. However, adding too little synthetic data will have a negligible effect on performance. The 2020-2021 Bass Connections team tested real-to-synthetic ratios of 1:0, 1:0.5, 1:0.75, 1:1, and 1:2 and found that 1:0.75 yields the greatest performance as measured by average precision. Therefore, for consistency with the Bass Connections team, we design our experiments using the 1:0.75 ratio. This ratio allows a fair comparison of our synthetic data generation with theirs; in the future, however, we would like to experiment with different ratios for our synthetic data generation process to find its optimal ratio.


Design Setup

Having sampled our data and adopted this real-to-synthetic ratio, our final datasets for each region are as follows (a minimal sketch of the dataset assembly appears after the list):

  • Baseline: Train on 100 Real Non-Target Images, Test on 100 Target Domain Images
  • Modified: Train on 100 Real Non-Target Images + 75 Syn Target Images, Test on 100 Target Domain Images
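Below is a minimal sketch of how these two training sets can be assembled, assuming hypothetical lists of image paths (`real_nontarget` and `syn_target`); the 1:0.75 ratio gives 100 real + 75 synthetic images.

```python
import random

def build_datasets(real_nontarget, syn_target, n_real=100, ratio=0.75, seed=0):
    """Return (baseline, modified) training lists at a 1:`ratio`
    real-to-synthetic ratio, e.g. 100 real + 75 synthetic images."""
    rng = random.Random(seed)
    real = rng.sample(real_nontarget, n_real)
    baseline = list(real)
    modified = real + rng.sample(syn_target, int(n_real * ratio))
    return baseline, modified
```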


Figure: Within-domain experiment example using Northwest as target domain.
Within Domain Experiment (Target = Source)

For each of the domains we selected, we ran the baseline and modified experiments with all data coming from the same region. This experiment helps to evaluate the overall ability of synthetic imagery (particularly our GP-GAN technique) to improve object detection performance.



Figure: Cross-domain experiment example using Northwest as target domain and Eastern Midwest as non-target domain.
Cross Domain Experiment (Target ≠ Source)

For these experiments, the domains of the real source and target images differ, while the synthetic images used in the modified training dataset come from the target region. With synthetic images more similar to the target region, we hypothesize that their addition will improve object detection accuracy when the target and source regions are dissimilar in appearance. These experiments help us evaluate the potential for synthetic imagery to improve the model's ability to generalize across regions despite the limitations of the existing training data.


YOLOv3

YOLOv3 is a popular object detection model used in various computer vision tasks. YOLO stands for You Only Look Once: the model is applied to an image only once, dividing the image into regions and predicting bounding boxes for each region. It is widely used because of its much faster detection speed at a similar mAP to other well-performing models. This speed matters for our task because we ultimately hope to automate the mapping of energy infrastructure from satellite imagery, which will require the model to quickly identify infrastructure in large datasets covering entire regions. The previous Bass Connections team also used YOLOv3, so we used YOLOv3 as well to make direct comparisons between the performance of our models and theirs without confounding variables.

Figure: Sample output image from the Ultralytics YOLOv3 GitHub repository.
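As an illustration of how such a model is applied, the snippet below loads a pretrained YOLOv3 through torch.hub from the Ultralytics repository (assuming its current hub interface) and runs it on a placeholder image path; in our experiments the model was instead trained on our real and synthetic wind turbine datasets.

```python
import torch

# Load a pretrained YOLOv3 from the Ultralytics repo (requires internet access).
model = torch.hub.load("ultralytics/yolov3", "yolov3", pretrained=True)

# "satellite_tile.jpg" is a placeholder path for an overhead imagery tile.
results = model("satellite_tile.jpg")
results.print()          # summary of detected classes and inference speed
for *xyxy, conf, cls in results.xyxy[0].tolist():
    # Each detection: corner coordinates, confidence score, class index.
    print(f"class={int(cls)} conf={conf:.2f} box={[round(v) for v in xyxy]}")
```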

Results




Figure 1: In this image, the YOLOv3 model predicted that 4 objects were wind turbines. 2 of those predictions were correct, meaning the precision would be 2/4. There are 3 wind turbines in the image and the model found 2 of these, meaning the recall would be 2/3.
Image from: 2020-2021 Bass Connections team.

Performance Metrics

To understand our results, it is critical to first understand the metrics we have chosen to measure performance. The primary metric we use is Average Precision (AP), which combines the classification metrics of precision and recall. We explain these metrics starting with the images on the left.

  • Precision: Out of the areas that the model classified as wind turbines, the fraction that actually were wind turbines (positive predictive value).
  • Recall: Out of the wind turbines present in the images (ground truth), the fraction that the model classified as wind turbines (hit rate).

Plotting the precision and recall of the model's predicted outputs on a graph gives the precision-recall curve. On the curves to the right, it is evident that as precision increases, recall decreases, and vice versa; there is hence a tradeoff between precision and recall. However, we would like both precision and recall to be high, which means we would like the area under the precision-recall curve to be as large as possible. Average Precision (AP) quantifies this area, summarizing the precision-recall curve and rewarding models with both high precision and high recall.

In the machine learning space, small absolute increases in AP denote a significant improvement in model performance.
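For concreteness, below is a minimal sketch of how AP can be computed from (recall, precision) points traced out by sweeping the model's confidence threshold; production evaluation code (e.g., the COCO or Ultralytics implementations) additionally handles box matching and IoU thresholds, which we omit here.

```python
import numpy as np

def average_precision(recalls, precisions):
    """Approximate AP as the area under the precision-recall curve."""
    r = np.concatenate(([0.0], np.asarray(recalls), [1.0]))
    p = np.concatenate(([1.0], np.asarray(precisions), [0.0]))
    # Make precision monotonically non-increasing in recall before integrating.
    p = np.maximum.accumulate(p[::-1])[::-1]
    # Sum rectangle areas wherever recall increases.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# Example: precision falls as recall rises -> AP of 0.65 for this toy curve.
print(average_precision([0.2, 0.4, 0.6, 0.8], [1.0, 0.9, 0.75, 0.6]))
```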



Figure 2: Precision-recall curves. We would like the curve to be pushed toward the upper right as much as possible.




Figure 3: Sample PR curves of 4 runs of the same experiment.

Reducing Variance

Due to variability and stochasticity in the object detection model’s training process, there are slight variations between the results of each run, as shown in the figure on the left. Each experiment is therefore repeated 4 times to account for this randomness and improve the reliability of the result. The average AP value is calculated and used to compare the results of our baseline model, the model with added CityEngine images, and the model with added GP-GAN images.



Results

The performance of the model with added synthetic images improves significantly in both within-domain and cross-domain settings. Synthetic images are especially helpful in cross-domain settings, meaning they can be useful when data from the target domain is lacking or cost-prohibitive to collect.


Figure 4: All values are in average precision (AP).



Figure 5: Sample ground truth images for the testing batch from the GP-GAN experiment of training on Northeast and testing on Northeast.

Figure 6: Sample predictions for the testing batch from the GP-GAN experiment of training on Northeast and testing on Northeast. The YOLOv3 model achieves a precision of 17/18, or 0.94 (one of the predicted outputs in the bottom middle left is not a turbine), and a recall of 17/19, or 0.89 (it misses a wind turbine in the top middle left and another in the bottom middle right).


Results of Each Geographic Domain Respectively

Here we take a closer look at the results of training with real images from each of the 3 geographic regions. There is a disparity in performance when the model is trained with real images from different geographic domains. In particular, in cross-domain experiments that test on the Eastern Midwest, the model generally performs worse than when testing on other regions.


Figure 7: Results of training with real images from the Northeast.




Figure 8: Results of training with real images from the Eastern Midwest.



As shown above, the model performs consistently worse in the cross-domain experiments. However, the model shows the greatest average improvement in average precision from the addition of the GP-GAN imagery in these same cross-domain experiments, improving overall cross-domain performance by 31% over the baseline. In fact, the effect of the GP-GAN is most noticeable when considering the worst performance of each dataset: the GP-GAN's worst performance of 0.638 average precision, on the Train EM Val NE experiment, is much higher than the other models' worst performances. Thus, it shows promise for bridging the gap in cross-domain experiments across different geographic regions.

Key Takeaways


The results show that adding the curated GP-GAN-generated imagery improves the performance of our object detection model in all cases, especially in cross-domain experiments (testing on an unseen region). The performance increase is more limited in the within-domain setting, where the model is tested on a previously seen region and was already performing well. Furthermore, our model improves not only upon the baseline but also upon the synthetic CityEngine dataset, demonstrating its ability to outperform other methods of synthetic image generation, especially in cross-domain experiments. Given that our method of synthetic image generation is free and quick, it presents a simple and effective way to enhance object detection performance on new domains. It can also supplement settings where we simply lack training data, which is often the case when trying to obtain information on energy infrastructure. With the aid of our synthetic imagery, this method of identifying and locating energy infrastructure in a geographic region could bridge the information gaps that energy access planners face when making decisions about electrification.


Figure 9: The GP-GAN outperformed the CityEngine in all experiments.
Figure 10: The GP-GAN greatly improved the performance of the object detection model from the baseline experiment, especially in the cross-domain experiments where performance is often inadequate and data is often needed.

Future Work


  1. Given the time constraints of our project, we would like to attempt more hyperparameter tuning in the image augmentation process, to ensure the images come from a realistic sizing distribution, and in the blending process, to improve the realism of our generated images.

  2. Apply these techniques to detect other types of energy infrastructure. Because high-voltage transmission towers are similar in structure to wind turbines, we can easily adapt our synthetic image generation process to transmission towers. We could then test this synthetic imagery in the same manner as we did for the wind turbine imagery and see whether this method extends to other types of energy infrastructure.


Images from Pixabay: Substation, Transmission Tower, Solar Panels

  3. We would like to investigate the performance of our synthetic dataset with the YOLOv4 object detection model, whose AP (Average Precision) improves on YOLOv3's by 10%. Thus, to further improve wind turbine detection in within-domain and cross-domain experiments, further experimentation with our real and synthetic datasets should be performed with this newer, state-of-the-art object detection model. Additionally, cross-validating our synthetic imagery techniques with different object detection models, including YOLOv4 and EfficientDet, could strengthen our results.


Figure from: Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv preprint arXiv:2004.10934, 2020.

  4. Investigate few-shot learning, where we use small amounts of real images and large amounts of synthetic data to adapt our object detection model to any region we choose. In building a classifier of energy infrastructure, some infrastructure types may be rare and thus hard to find in real data. With few-shot learning, however, we could generate synthetic images containing the rare infrastructure in different settings and orientations, improving detection while limiting the time and cost of finding labeled data for rare infrastructure. By diversifying the infrastructure type, location, size, rotation, and number of objects in our images, the model could generalize well to potential differences between domains.


Figure from: Guo, Y., Codella, N., Karlinsky, L., Smith, J., Simunic, T., & Feris, R. (2019). A New Benchmark for Evaluation of Cross-Domain Few-Shot Learning. arXiv, abs/1912.07200.

Acknowledgements


We would like to thank Dr. Kyle Bradbury, Dr. Jordan Malof, and Wayne Hu for their help and guidance along the way. We would also like to thank the previous Bass Connections and Data+ teams for their work leading up to this project. Additionally, we would like to thank Dr. Paul Bendich and Dr. Greg Herschlag for organizing and hosting the Duke Data+ talks, and the speakers who shared their wisdom about the field of data science. Thank you to the Duke Data+ program and the Duke Energy Initiative for supporting this project.

Our Amazing Team


Frankie Willard

Data Scientist


Alex Kumar

Data Scientist


Caroline Tang

Data Scientist


Kyle Bradbury

Project Lead


Jordan Malof

Project Lead


Wayne Hu

Project Manager