Pixel Driller#

Introduction#

Pixel Driller is for those who:

  • want to extract pixels from locations in an image

  • calculate statistics on the extracted pixels

  • are comfortable using Python

For example, you might be someone who:

  • works in the field of remote sensing

  • takes measurements of something on the ground, and

  • creates statistical or machine learning models to predict that something from satellite or aerial images

What it does#

Pixel Driller extracts pixels from images acquired over your field sites and runs statistics on them. You would use those statistics as predictors in your machine learning models.

It handles the following use cases:

  • Given a STAC endpoint, set of X-Y-Time points, and a buffer, return the n nearest-in-time zonal stats for all bands of the specified raster assets about each point

  • Given a list of images, a set of X-Y points, and a buffer, return the zonal stats for the bands in each image about each point

Features#

  • Extract from images by specifying a file path or URL

  • Extract from images by searching a STAC catalogue

  • Extract from STAC images up to n days either side of the field survey date

  • Specify the size and shape of the extraction area for each field site

  • Use in-built functions for calculating statistics or define your own

Example#

pixdrill.drill.drill() is the main interface. However, several data structures and functions must be created before using it. The following example shows a typical usage pattern:

import datetime
from pixdrill import drill
from pixdrill import drillpoints
from pixdrill import drillstats

# Step 1. Specify the field survey properties.
# Specify the locations of your field surveys.
# For STAC Items, you also specify the image acquisition period
# either side of the field survey date. Pixels are extracted
# from images that contain the point, and within the acquisition period
# for STAC Items.
points = []
longitude = 140
latitude = -36.5
crs_code = 4326 # EPSG code for WGS84 coordinate reference system. See epsg.org.
date = datetime.datetime(2022, 7, 28)  # 28 July 2022
t_delta = datetime.timedelta(days=3)   # 3 days either side
site_shape = drill_points.ROI_SHP_CIRCLE  # Define the extraction shape
site_radius = 50  # In metres
pt_1 = drillpoints.Point(
    longitude, latitude, date, crs_code, t_delta, site_radius, site_shape)
points.append(pt_1)
# Add additional points for each field site
...

# Step 2. Choose the images and STAC Items to be drilled.
# Firstly, the image paths.
img1 = "/path/to/img1.tif"
img2 = "/path/to/img2.tif"
# Secondly, the STAC catalogue and raster assets of the STAC Items.
endpoint = "https://earth-search.aws.element84.com/v0"
collections = ['sentinel-s2-l2a-cogs']
assets = ['B02', 'B11']

# Step 3. Define statistics.
# Select from standard, in-built statistics or define your own.
# They are run on the pixels extracted from the images for all your
# field sites. See the tutorial for how to specify and define the
# stats functions.
std_stats = [drillstats.STATS_MEAN]
user_stats = [('MY_RANGE', my_range_func)]

# Step 4. Extract pixels and compute stats.
# Drill the given images and those returned from a search of
# the STAC catalogue. Calculate the stats at the same time.
drill.drill(
    points, images=[img1, img2],
    stac_endpoint=endpoint, raster_assets=assets,
    collections=collections,
    std_stats=std_stats, user_stats=user_stats)

# Step 5. Fetch results.
# Retrieve the results using each Point's 'stats' attribute (a dictionary).
for pt in points:
    print(f"Stats for point: x={pt.x}, y={pt.y}")
    for item_id, item_stats in pt.stats.get_stats().items():
        print(f"    Item ID={item_id}")
        print(f"        Mean values: {item_stats[drillstats.STATS_MEAN]}")
        print(f"        Ranges     : {item_stats['MY_RANGE']}")

The example creates the following output for the first point. Note that two Items in the STAC catalogue were found 3 days either side of the field-survey’s date of 28 July 2022. Data were extracted for the ‘B02’ and ‘B11’ assets of these items, thus Mean values and Ranges contain two values. Contrast that with the results for ‘img1.tif’ and ‘img2.tif’, which are single-band rasters.:

Stats for point: x=140, y=-36.5:
    Item ID=S2A_54HVE_20220730_0_L2A
        Mean values: [3257.65289256 2369.75]
        Ranges     : [164, 37]
    Item ID=S2B_54HVE_20220725_0_L2A
        Mean values: [3945.52066116 3198.11111111]
        Ranges     : [1064, 779]
    Item ID=/path/to/img1.tif
        Mean values: [60.]
        Ranges     : [0]
    Item ID=/path/to/img2.tif
        Mean values: [1782.]
        Ranges     : [0]

Download#

Releases and Source code from Github.

Installation#

The package requires pystac-client, numpy and GDAL.

All packages are available from the conda-forge archive.

They may also be available from your platform’s package manager.

Alternatively the GDAL python bindings are available on pypi. But libgdal must be installed first. You might use your platform’s package manager to install GDAL and its bindings instead. For example, on Ubuntu:

> sudo apt-get install -y python3-gdal gdal-bin`

pystac-client and numpy are also available from pypi, and are installed when installing Pixel Driller:

> pip install https://github.com/cibolabs/pixeldriller/archive/refs/tags/v0.2.0.tar.gz

Guides#

Indices and tables#