Machine Learning for Image Classification

Part Three

On this chapter, we will continue exploring machine learning techniques for Geospatial Analysis by introducing three more algorithims. So far, we've worked with Random Forest Classifier, an ensemble of decision trees that produces an aggregated and highly accurate output. Next, we will classify an image into Builtup and Non-Builtup areas using Decision tree Classifier and Gradient Boosting Classifiers.

The topics covered are:

 1. Decision Trees
 2. Noise Reduction
 3. Gradient Boosting 
In [1]:
# for file management
import os
import logging
from docopt import docopt

# For analysis
import numpy as np
from osgeo import gdal
import pandas as pd
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import cohen_kappa_score
from sklearn import tree
from sklearn.svm import SVC
from scipy import stats
from scipy import ndimage
from scipy.ndimage import filters

# For visualization
% matplotlib inline
from matplotlib import pyplot as plt
import graphviz
import pydotplus
from IPython.display import Image

We will use the 2017 Landsat 8 imagery we've worked with in the previous chapter.

In [2]:
raster_data_path = "data/rf_data/DC_final/dc_ls8_2017_c.tif"
train_data_path = "data/rf_data/train_3/"
test_data_path = "data/rf_data/test_3/"

Read and load the image

In [3]:
# Read raster
raster_dataset = gdal.Open(raster_data_path, gdal.GA_ReadOnly)
# Get Projection system 
geo_transform = raster_dataset.GetGeoTransform()
proj = raster_dataset.GetProjectionRef()

# Read bands as arrays
bands_data = []
for b in range(1, raster_dataset.RasterCount+1):
    band = raster_dataset.GetRasterBand(b)
    bands_data.append(band.ReadAsArray())

#Stack arrays together 
bands_data = np.dstack(bands_data)

# Change NA values to zero
bands_data[np.isnan(bands_data)] = 0

# Get the shape of the image 
rows, cols, n_bands = bands_data.shape
In [4]:
fig, ax = plt.subplots(1,1, figsize =(15,12))
# Apply the Custom high Pass 

# Select False color bands
b4 = bands_data[:,:,3]
b3 = bands_data[:,:,2]
b2 = bands_data[:,:,1]

# Stack them
false_color = np.dstack([b4,b3,b2])

#Plot
plt.subplot(111), plt.imshow(false_color*2), plt.title("Metro DC Area (2017)")

plt.show()

Our next step is reading and loading the training sample data. We'll start with creating the functions for reading a vector points and creating a raster mask from them

In [5]:
# Create mask
def create_mask_from_vector(vector_data_path, cols, rows, geo_transform,
                            projection, target_value=1):
    data_source = gdal.OpenEx(vector_data_path, gdal.OF_VECTOR)

    # Read each layer
    layer = data_source.GetLayer(0)

    # Get the data type
    driver = gdal.GetDriverByName('MEM')  # In memory dataset
    target_ds = driver.Create('', cols, rows, 1, gdal.GDT_UInt16)

    # Set the projection the same as the images
    target_ds.SetGeoTransform(geo_transform)
    target_ds.SetProjection(projection)

    # Rasterize mask
    gdal.RasterizeLayer(target_ds, [1], layer, burn_values=[target_value])
    return target_ds


# Rasterize vector
def vectors_to_raster(file_paths, rows, cols, geo_transform, projection):
    """Rasterize the vectors in the given directory in a single image."""
    labeled_pixels = np.zeros((rows, cols))

    # Create arrays of sample data
    for i, path in enumerate(file_paths):
        label = i+1
        ds = create_mask_from_vector(path, cols, rows, geo_transform,
                                     projection, target_value=label)
        band = ds.GetRasterBand(1)
        labeled_pixels += band.ReadAsArray()
        ds = None
    return labeled_pixels

Use the two function created above to read shapefiles and extract corresponding pixel values from the image.

In [6]:
# Get all the files ending with .shp
train_files = [f for f in os.listdir(train_data_path) if f.endswith('.shp')]

# Get the name of shapefiles and save it as classes
train_classes = [f.split('.')[0] for f in train_files]

#Join directory path with file names
train_shpfile = [os.path.join(train_data_path, f)
              for f in train_files if f.endswith('.shp')]

# Rasterize vectors
labeled_pixels = vectors_to_raster(train_shpfile, rows, cols, geo_transform, proj)

# Return non-zero values
is_train = np.nonzero(labeled_pixels)

#Get label and sample data
training_labels = labeled_pixels[is_train]
training_original_samples = bands_data[is_train]

let's take a look at our training data.

In [7]:
label_vals=[]
sample_vals=[]
for i in np.unique(training_labels):
    label_tot = sum(training_labels==i)
    sample_tot = (sum(training_labels==i))*6
    label_vals.append(label_tot)
    sample_vals.append(sample_tot)

stats_all= [sample_vals, label_vals]


stat_summary = pd.DataFrame(data =stats_all,
            index = ['Train samples','Label samples'],
            columns = ['Builtup','Non-Builtup'])


print "Total number of Training samples: %s points" %training_labels.size
print "Total number of Label samples %s" %training_original_samples.size
print " Across the three classes: "
print ""
print stat_summary
Total number of Training samples: 783 points
Total number of Label samples 4698
 Across the three classes:

               Builtup  Non-Builtup
Train samples     2346         2352
Label samples      391          392

Great! Now that we've successfully created the training sample data, we will go ahead and examine how Decision Trees based classifiers work.

Decision Trees

Previously, we used a decsion tree classifier that combines the aggregates/averages the estimates of weak learners to classify two images and detect the change in land cover. On this section, we will introduce another decsion tree based classfier known as Gradient Tree Boosting, to classify an image and detect urban/builtup area. Before doing so, let's take a look at how Decision Trees are generated. </br>

Decison Trees encompass all non-parametric supervised learning methods that create the final estimate the by learning simple decision rules inferred from the data features. In the case Random Forests, trees are aggregated to generate the final output. Gradient Tree Boosting, in contrast, the final output is created by sequentially estimating from weak learners. Under the hood, however, they both generate trees. To help us how these is done, we will visualize the leafs and nodes using the Decision Trees Classifier.

In [8]:
# the Names of six bands of our mage
bands = ["Blue", "Green", "Red","NIR", "SWIR1", "SWIR2"]

# Build Decision Trees
E_clf_trees = tree.DecisionTreeClassifier(max_depth =10, min_samples_leaf=10)

# Fit the classifier on the Enhanced Image 
E_clf_trees = E_clf_trees.fit(training_original_samples, training_labels)

E_clf_trees
Out[8]:
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=10,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=10, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')
In [9]:
# Export the tree as agraph
dot = tree.export_graphviz(E_clf_trees, out_file=None,
                           feature_names=bands,
                           class_names=train_classes,
                           filled=True, rounded=True,
                           special_characters=True)

# Plot the trees
graph = pydotplus.graph_from_dot_data(dot)
# Create png
Image(graph.create_png())
Out[9]:

At each node, the surface reflectance of a pixel at randomly selected band is evaluated to determine whether the pixel can be classified as Builtup or Non-Builtup. This process will continue until the Gini Impurity or the (the chance of incorrectly labeling the pixel) is 0.

Let's reduce the value of the max_depth parameter to 2 and see the trees closely.

In [10]:
E_clf_trees = tree.DecisionTreeClassifier(max_depth =2,min_samples_leaf=10)

# Fit the classifier on the Enhanced Image 
E_clf_trees = E_clf_trees.fit(training_original_samples, training_labels)
# Export the tree as agraph
dot = tree.export_graphviz(E_clf_trees, out_file=None,
                           feature_names=bands,
                           class_names=train_classes,
                           filled=True, rounded=True,
                           special_characters=True)


graph = pydotplus.graph_from_dot_data(dot)

Image(graph.create_png())
Out[10]:

In this case, for each the two classes, new nodes will be created as long as the Gini Impurity is greater than 0. Tunning the parameters of classifiers can help us refine the estimates of individual weak learners as well as their combined output.

In the next section, we will look into a technique to enhance our satellite image by reducing the noise in pixel values. In classifying the 2001 image, we noticed that in some parts of our study area, soil was classified as builtup surface because the surface reflectance of certain pixels was similar to the builtup surface. As we shall see here, one solution is to enhance the quality of the image through noise reduction.

Noise Reduction

In chapter 2, we looked at various Filters (Kernels) that can be applied to sharpen or smooth out images. In similar way, we can apply a Filter and enhance the quality of the image we shall classify later. The Filter we will apply is called the Median Filter and as the name implies, the filter assigns the median value of pixels. To illustrate how these work, let's take a look at the following table of a 3x3 matrix.

                        3x3 matrix window

                        -------------------
                        | 130 | 124 | 125 |
                        -------------------
                        | 145 | 135 | 120 |
                        -------------------
                        | 120 | 132 | 110 |
                        -------------------

The Filter first, places pixel values in ascending order. </br> In this case: 110, 120, 120, 124, 125, 130, 132, 135, 145.

Then, the pixel at the very center with the digital number 135 will be replaced by the median pixel value 125.

Let's run the filter and compare the original image with the enhanced one.

In [11]:
fig, ax = plt.subplots(1,2, figsize =(18,15))
# Apply the Custom high Pass 

# Select False color bands
b4 = bands_data[:,:,3]
b3 = bands_data[:,:,2]
b2 = bands_data[:,:,1]

# Stack them
false_color = np.dstack([b4,b3,b2])

# Reduce noise with a Median filter
median_img=ndimage.median_filter(bands_data, size=(3))

# Select False color bands
E_b4 = median_img[:,:,3]
E_b3 = median_img[:,:,2]
E_b2 = median_img[:,:,1]

# Enhanced image
E_false_color = np.dstack([E_b4,E_b3,E_b2])

#Plot
plt.subplot(121), plt.imshow(false_color*2), plt.title("Original Image")
plt.subplot(122), plt.imshow(E_false_color*2), plt.title("Enhanced Image")

plt.show()

The Enhanced image seems darker but the land covers are more distinguishable than the original image. To Better understand how this process can help our ability to accurate builtup surface area, we will use Gradient Tree Boosting for Classification and seprate the builtup surface from the non-builtup areas.

Gradient Tree Boosting

Gradient Tree Boosting(GTB) is well known ensemble model that works by combining the predictions of several in order to improve the robustness of a model. GTB differs from Random Forests in that the later aggregates (averages) the outputs of weak learners while GTB builds sequentially up on each tree to reduce the bias of the combined estimator.

Although GTB has been found to be highly accurate, its also suspitable to overfitting. This can, however, be minmized by tunning parameters and introducing hyper-parameters that could limit the combined learner from overgeneralizing.
Some of the most commonly tuned parameters are:

1) n_estimators = The number trees in the forest from which the learner combines estimate
2) learning_rate = The contribution of the weak learners in the final combination
3) min_samples_leaf = The minimum required number of samples at a leaf
4) max_features = The size of the random subsets of features to consider when splitting a node
5) max_depth = The maximum number of leaves in the forest

We will tune our model with custom parameters obtained through trail and error.

In [12]:
gbc_clf = GradientBoostingClassifier(n_estimators=10,
                                     learning_rate=1,
                                     max_depth=1,
                                     max_features = 0.5,
                                     random_state=42)

Now, let's run GB-Classifer on the Original Image

In [13]:
# Run the classifier on samples from the original image
O_gbc_clf = gbc_clf.fit(training_original_samples, training_labels)
O_gbc_clf
Out[13]:
GradientBoostingClassifier(criterion='friedman_mse', init=None,
              learning_rate=1, loss='deviance', max_depth=1,
              max_features=0.5, max_leaf_nodes=None,
              min_impurity_decrease=0.0, min_impurity_split=None,
              min_samples_leaf=1, min_samples_split=2,
              min_weight_fraction_leaf=0.0, n_estimators=10,
              presort='auto', random_state=42, subsample=1.0, verbose=0,
              warm_start=False)

Next, we will fit the GB-Classifer on the Enhanced Image. But, first, let's get the training sample data for the enhanced image.

In [14]:
# Training data for the image with reduced noise
training_enhanced_samples = median_img[is_train]

# Run the classifier samples from the enhanced image
E_gbc_clf = gbc_clf.fit(training_enhanced_samples, training_labels)
E_gbc_clf
Out[14]:
GradientBoostingClassifier(criterion='friedman_mse', init=None,
              learning_rate=1, loss='deviance', max_depth=1,
              max_features=0.5, max_leaf_nodes=None,
              min_impurity_decrease=0.0, min_impurity_split=None,
              min_samples_leaf=1, min_samples_split=2,
              min_weight_fraction_leaf=0.0, n_estimators=10,
              presort='auto', random_state=42, subsample=1.0, verbose=0,
              warm_start=False)

Predict estimates for both images

In [15]:
# Grab the dimension
n_samples = rows*cols

# Reshape images dimension
O_flat_pixels = bands_data.reshape((n_samples, n_bands))
E_flat_pixels = median_img.reshape((n_samples, n_bands))

# Predict
O_result = O_gbc_clf.predict(O_flat_pixels)
E_result = E_gbc_clf.predict(E_flat_pixels)


# Get the rows and columns
clf_Original = O_result.reshape((rows, cols))
clf_Enhanced = E_result.reshape((rows, cols))

visualize the outputs for both images

In [16]:
fig, ax = plt.subplots(1,2, figsize =(20,16))

plt.subplot(121), plt.imshow(clf_Original, cmap ='terrain'),plt.title("Orignial Image")
plt.subplot(122), plt.imshow(clf_Enhanced, cmap ='terrain'),plt.title("Enhanced Image")
plt.show()

Two interesting results:

1) By tunining the parameters, we managed to prevent the classifier from overfitting which forced the model to underpredicts on the orignial image

2) Under the same constraints, the model trained on the Enhanced image, nonetheless, performed well. Hence, enhancing the image significantly improved the quality of the prediction despite the severe penalization imposed on the classifier. Later, We will compare its accuracy with two other classifiers.

Let's evaluate the difference between the Original and Enhanced Images. Since we are interested in builtup area, let's focus on how the filter enhanced this land cover type.

In [17]:
O_pixels = np.unique(clf_Original)
O_img = []
for group in O_pixels:
    O_img.append(len(clf_Original[clf_Original == group]))

E_pixels = np.unique(clf_Enhanced)
E_img = []
for group in E_pixels:
    E_img.append(len(clf_Enhanced[clf_Enhanced == group]))


print " Builtup pixels on the Origninal Image = %r" %O_img[0]
print " Builtup pixels on the Enhanced  Image = %r" %E_img[0]
print " Effect of Enhacement (Difference) = %r" %(O_img[0]-E_img[0])
 Builtup pixels on the Origninal Image = 677086
 Builtup pixels on the Enhanced  Image = 1646292
 Effect of Enhacement (Difference) = -969206

In heavily regularized environment, our model detected the builtup up surface from the enhanced image with great precision. Our next step is to examine these predictions through accuracy assessment.

In [18]:
test_files  = [f for f in os.listdir(test_data_path) if f.endswith('.shp')]


# Get the name of shapefiles and save it as classes
test_classes = [f.split('.')[0] for f in test_files]


# Get the test shapefile data
test_shpfile = [os.path.join(test_data_path, "%s.shp" % c)
              for c in test_classes]

# Rasterize shapefiles
test_pixels = vectors_to_raster(test_shpfile, rows, cols,
                                        geo_transform, proj)

#Create non-zero images
test_image = np.nonzero(test_pixels)


#Get test outputs and labels
testing_labels = test_pixels[test_image]
Original_image = clf_Original[test_image]
Enhanced_image = clf_Enhanced[test_image]
In [19]:
from sklearn.metrics import cohen_kappa_score
target_names = ['Class %s' % s for s in test_classes]
print("Classification report:\n%s" %
      metrics.classification_report(testing_labels, Original_image,
                                    target_names=target_names))
print("Overall accuracy: %f" %
      metrics.accuracy_score(testing_labels, Original_image))

print ("Kappa Statistics: %f " %
       cohen_kappa_score(testing_labels, Original_image))
Classification report:
                        precision    recall  f1-score   support

    Class Builtup_test       0.98      0.72      0.83       131
Class Non_Builtup_test       0.80      0.99      0.88       147

           avg / total       0.88      0.86      0.86       278

Overall accuracy: 0.859712
Kappa Statistics: 0.714338

Accuracy assessment for the Enhanced Image

In [20]:
from sklearn.metrics import cohen_kappa_score
target_names = ['Class %s' % s for s in test_classes]
print("Classification report:\n%s" %
      metrics.classification_report(testing_labels, Enhanced_image,
                                    target_names=target_names))
print("Overall accuracy: %f" %
      metrics.accuracy_score(testing_labels, Enhanced_image))

print ("Kappa Statistics: %f " %
       cohen_kappa_score(testing_labels, Enhanced_image))
Classification report:
                        precision    recall  f1-score   support

    Class Builtup_test       0.95      0.94      0.95       131
Class Non_Builtup_test       0.95      0.96      0.95       147

           avg / total       0.95      0.95      0.95       278

Overall accuracy: 0.949640
Kappa Statistics: 0.898862

Let's visualize the change un Builtup area

According to our assessment, the model performed signficantly better on the image we enhanced using the Median Filter. The overall accuracy increased by almost 10% while the aggreement between the User's and Producers Accuracy showed 18% improvment.

Summary

On this chapter, we looked at how decision trees are generated and assembled for the final estimate. We also looked at an image enhancement technique that reduces the noise of images by replacing pixels with median values. Finally, we compared and contrasted th estimates of Gradient Tree Boosting Classifier trained on the original and enhanced version of the same image.

The end!