Image Enhancement and Dimension Reduction ¶

In digital image analysis, a wide variety of image enhancement techniques are used to enhance the visibility, reduce the size, evaluate spectral values or detect structural patterns of satellite images. On this section, we will cover the first two:

     1. Image enhancement
          - Sobel, Laplace, Low Pass, and High Pass Filters
     2. Principal Component Analysis
          - Eigen Values and Eigen Vectors

#  Load libraries

from osgeo import gdal
import numpy as np

# For Visualization
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from rasterio.plot import show


# For Summary and Analysis
from scipy import stats
from scipy.cluster.vq import *
import statsmodels.api as sm
import pandas as pd
from scipy import ndimage, misc
from sklearn.decomposition import PCA
from sklearn.preprocessing import *
from scipy.ndimage import filters
from scipy.ndimage import gaussian_filter

The Image¶

We will continue to work with the same satellite image

image_data = gdal.Open(r"data/2001_dc_sub.tif")

Let's also go ahead and read the entire image as array and subset each of the six bands.

# Read image as Array
image_data_arr = image_data.ReadAsArray()

band_1 = image_data.GetRasterBand(1).ReadAsArray()
band_2 = image_data.GetRasterBand(2).ReadAsArray()
band_3 = image_data.GetRasterBand(3).ReadAsArray()
band_4 = image_data.GetRasterBand(4).ReadAsArray()
band_5 = image_data.GetRasterBand(5).ReadAsArray()
band_6 = image_data.GetRasterBand(6).ReadAsArray()

Image Enhancement¶

Image enhancment is a technique of modifying pixel values by applying filters to better understand patterns. Let's take a look at some of the most commonly used filters.

Gaussian Filter is commonly known as a smoothening or blurring filter. As its name implies, its blurs or smoothes out the image by removing noise.

The Sobel Filter also known as the Sobel–Feldman operator, is a widely used technique used for sharpenining and detecting edges.

The Laplace Filter is another kernel used for edge detection best known for highlighting regions of an image with rapid intensity change.

High Pass Filter is another, edge detecting filter and it makes an image appear sharper.

To examine how these filters work, we will first apply the smoothening filter on band 4. Then, we will apply four filters on the band we remove noise from and observe how each detect features. For a closer up look, let's zoom-in to the DC area.

# Set figure size
fig, ((ax1, ax2, ax3), (ax4, ax5, ax6)) = plt.subplots(2,3, figsize=(20,16))

# Smoothen band 4
band_4_s = ndimage.gaussian_filter(band_4, sigma=2)

# zoom-in plot band 4
show(band_4[1150:1575, 850:1200],
     cmap=plt.cm.pink, ax = ax1, title = "Band 4")

# Apply Gaussian (Smoothening) Filter
show(band_4_s[1150:1575, 850:1200],
     cmap=plt.cm.pink, ax = ax2, title = "Smoothening Filter")

# custom Mean Kernels = Low Pass 3x3
mean_kernel = np.array([[1, 1,  1],
                       [1,  1,  1],
                       [1,  1,  1]])

# Apply the Custom high Pass 
show(ndimage.convolve(band_4_s, mean_kernel)[1150:1575, 850:1200],
     ax = ax3, cmap=plt.cm.pink, title = "Low Pass")


# Apply Soble Filter
show(ndimage.sobel(band_4_s, mode='nearest')[1150:1575, 850:1200],
     cmap=plt.cm.pink, ax = ax4, title = "Sobel Filter")

# Apply Laplace Filter
show(ndimage.gaussian_laplace(band_4_s, sigma=1)[1150:1575, 850:1200],
     ax=ax5, cmap=plt.cm.pink, title ="Laplace Filter")

# Create custom Kernels = High Pass 3x3
high_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])


# Apply the Custom high Pass 
show(ndimage.convolve(band_4_s, high_kernel)[1150:1575, 850:1200],
     ax = ax6, cmap=plt.cm.pink, title = "High Pass")


plt.show()

The five filters applied to the the 4th band depict different patterns. When needed, we can also create custom kernels as we did for the Low and High pass filters. Next, we look into dimension reduction through component analysis.

Principal Component Analysis¶

Principal component analysis (PCA) is a mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components.

The first principal component contains most of the variability of pixel values on the image. As we go to subsequent components, by and large, the share of variance explained by each declines.

To calculate the PCA, we will begin by reshaping the dimension of image and find the Eigenvalues and Eigenvectors.

# Reshaping dimension

# get the format of the orignial data
n_samples, nx, ny = image_data_arr.shape

# reshape the three dimension to two
image_data_arr_2_dim= image_data_arr.reshape(n_samples, nx*ny)

Eigenvalues and Eigenvectors¶

Eigenvalue represent the variance explained by each of the components. Eigenvector, on the other hand, is a vector whose direction remains unchanged when a linear transformation is applied to it. These two values will help us better understand the relationship in our bands and the number components to which we will reduce the image to.

In the lines of codes below, we will start by determining the total number of components we want to generate and scale (normalize) the pixel values. Scaling the pixel values standardizes the bands for a comparable estimate.

# Determine the number of components
pca = PCA(n_components =6)

# Scale the 2 dimensional data
image_data_arr_2_dim_scaled = MinMaxScaler().fit_transform(image_data_arr_2_dim)

# run the pca
pca.fit(image_data_arr_2_dim_scaled)

PCA(copy=True, iterated_power='auto', n_components=6, random_state=None,
  svd_solver='auto', tol=0.0, whiten=False)

Next, we will calculate the covariance, eigenvalues, eigenvectors and the variance-ratio

# Covariance
cov_mat = np.cov(image_data_arr_2_dim_scaled)

# Get the egienvalue and eigenvector 
eigen_values, eigen_vectors = np.linalg.eig(cov_mat)

# Variance Ratio
variance_Ratio = pca.explained_variance_ratio_

# Variance ratio in percentage 
percentage =[]
for i in variance_Ratio:
    percent = (i *100).round(3)
    percentage.append(percent)

Create a dataframe for these values

variance_vals = [eigen_values, variance_Ratio, percentage]
varSummary_Df = pd.DataFrame(data = variance_vals,
            index = ['Eigen Values','Variance Ratio', 'Variance Explained (%)'],
            columns = ["Component 1","Component 2","Component 3",
                    "Component 4","Component 5","Component 6"])

# Transpose to re-arrange
varSummary_Df.round(3).transpose()

As expected, the first PCA explained the highest share of variance on the image. By using the first 3 components, hence, we can reduce the six band image to 3 while retaining 99% of the most valueable information.

fig, (ax1, ax2,) = plt.subplots(1,2, figsize=(15,6))
plt.subplot(121),plt.plot(np.cumsum(eigen_values), c ="blue")
plt.xlabel('number of components')
plt.ylabel('cumulative eigenvalues');
plt.subplot(122),plt.plot(np.cumsum(pca.explained_variance_ratio_), c="red")
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance');

As it was pointed out earlier, Eigen Vectors (Factor Scores) computed for the covariance metrix explains loading factors or how much a component explains a band. Hence, we can identify how each band is represented loaded on the component. Factor loadings range from -1 to 1.

eigenVectors_Df = pd.DataFrame(data = eigen_vectors,
                    columns = ['Component 1', 'Component 2','Component 3',
                            'Component 4','Component 5','Component 6'],
                        index = ['Band 1', 'Band 2','Band 3',
                                 'Band 4','Band 5','Band 6'])

eigenVectors_Df.round(3)

The first three bands has a postive correlation with the first component. On the other hand, the Loading factors of the NIR band on the first component was negative. Band 5 adds little to no information on the first component but its loading factor changes sharply on the second and third. Band 6, has the least amount of information.

Component Extraction¶

Now that we've completed our assessment, we'll move on to fitting the PCA and transforming the dimension of the data to for extraction.

# Fit the pca to the scaled data
pca_scaled_img = pca.fit_transform(image_data_arr_2_dim_scaled)

# Inverse Components to Orignial  2 space
image_inversed = pca.inverse_transform(pca_scaled_img)

image_scaled = MinMaxScaler(feature_range=(0, 1), copy=True).fit_transform(image_inversed)

# PCA Reshape back to the 3 D comp
Components = image_scaled.reshape((n_samples,nx,ny))

# Check the dimensions
Components.shape, image_inversed.shape

((6, 2171, 1971), (6, 4279041))

Finally, we can extract each component.

PCA_1 =Components[0]
PCA_2 =Components[1]
PCA_3 =Components[2]
PCA_4 =Components[3]
PCA_5 =Components[4]
PCA_6 =Components[5]

Let's visualize them.

fig, ((ax1, ax2,ax3), (ax4, ax5, ax6))= plt.subplots(2,3, figsize=(20,16))

show((PCA_1), ax=ax1, cmap='gray', title='Component 1')
show((PCA_2), ax=ax2, cmap='gray', title='Component 2')
show((PCA_3), ax=ax3, cmap='gray', title='Component 3')
show((PCA_4), ax=ax4, cmap='gray', title='Component 4')
show((PCA_5), ax=ax5, cmap='gray', title='Component 5')
show((PCA_6), ax=ax6, cmap='gray', title='Component 6')

plt.show()

The first three components show 99% of the infrormation on the image. Hence, we can simply extract these three and export them as an image.

Exporting Components¶

First, stack the the first three components as in a similar way an image with RGB composite is stacked.

pc_img = (np.dstack([PCA_1,PCA_2, PCA_3]))

Let's plot the stacked components and make sure that we have the a correct image.

fig, (ax1, ax2) = plt.subplots(1,2, figsize=(15,7))
ax1.imshow(pc_img, cmap ="RdYlBu")
ax1.add_patch(patches.Rectangle(
    (850, 1200),
        300,
        300,
        fill=False,
        edgecolor="Black"
    ),
)
ax2.imshow(pc_img,cmap ="gist_earth")
plt.xlim(850,1150)
plt.ylim(1575,1200)
plt.show()

In order to successfully export the components, we would need to first specify few parameters.

# Set Dimensions
rows = image_data.RasterXSize
cols = image_data.RasterYSize
bands = 3

# Set the output path
output_fname = r'data/PCA.tiff'

# Get the projection 
# and transfromation of original image
geo_transform=image_data.GetGeoTransform()
proj=image_data.GetProjection()

# Define a function to write array to raster
# parameters: file name, the arraay, transformation projection, data type
def arrayRaster(fname, data, geo_transform, projection, data_type=gdal.GDT_Byte):
    # Get GeoTIFF driver
    driver = gdal.GetDriverByName('GTiff')
    #pass parameters to the data
    dataset = driver.Create(fname, rows, cols, bands, gdal.GDT_Float32)
    # set transformation and projection
    dataset.SetGeoTransform(geo_transform)
    dataset.SetProjection(projection)

    # write each array to each band
    band = dataset.GetRasterBand(1).WriteArray(PCA_1)
    band = dataset.GetRasterBand(2).WriteArray(PCA_2)
    band = dataset.GetRasterBand(3).WriteArray(PCA_3)

# Create the Raster
arrayRaster(output_fname, pc_img, geo_transform, proj)

We're done!

Summary¶

On this tutorial, we covered various ways of image enhancement kernels as well as a technique for dimension reduction. We saw how filters can be used to remove noises and detect edges. We calculated for Eigenvalues and Eigenfactors, assessed factor loadings and evaluated the correlation between bands and components. </br>

In the next tutorial, we will explore Spectral Indices and Spectal Transformation!

	Eigen Values	Variance Ratio	Variance Explained (%)
Component 1	0.171	0.762	76.157
Component 2	0.064	0.147	14.658
Component 3	0.041	0.083	8.272
Component 4	0.011	0.007	0.740
Component 5	0.001	0.002	0.172
Component 6	0.004	0.000	0.000

	Component 1	Component 2	Component 3	Component 4	Component 5	Component 6
Band 1	0.372	0.544	0.412	-0.455	0.298	0.316
Band 2	0.282	0.355	0.151	0.208	-0.852	-0.037
Band 3	0.491	0.128	0.104	0.637	0.408	-0.399
Band 4	-0.728	0.490	0.315	0.347	0.104	0.004
Band 5	-0.017	-0.540	0.835	-0.005	-0.082	-0.070
Band 6	0.106	-0.172	-0.031	0.472	0.036	0.857