Developer Guide#
This guide provides developers with an overview of Pixel Driller’s concepts and design.
It should be read in conjunction with the Tutorial.
Package design#
pixdrill is the top-level package. The Python modules and classes
in pixdrill are shown in the following class diagram as dark grey boxes.
The package leans on pystac-client,
pystac, numpy and
GDAL.
The main classes in these packages that Pixel Driller interfaces with
are shown in the diagram as white boxes.
![digraph Structure {
splines="TRUE";
/* Entities */
/* pixdrill classes */
node [shape="square" style="filled" fillcolor="grey"]
ImageItem [label="Image Item\n(drill.ImageItem)"]
ImageReader [label="Image Reader\n(image_reader.ImageReader)"]
ImageInfo [label="Metadata\n(image_reader.ImageInfo)"]
Point [label="Survey Point\n(drillpoints.Point)"]
PointStats [label="Survey Stats\n(drillstats.PointStats)"]
Driller [label="Driller\n(drillpoints.ItemDriller)"]
ArrayInfo [label="Drilled Data\n(image_reader.ArrayInfo)"]
/* Class from other packages */
Item [label="Item\n(conceptual only)" fillcolor="#eeeeee"]
node [shape="square", style=""]
StacItem [label="STAC Item\n(pystac.Item)"]
StacAsset [label="STAC Asset\n(pystac.Asset)"]
StacCatalogue [label="STAC Catalogue\n(pystac-client.Client)"]
Image [label="Image\n(gdal.Dataset)"]
Band [label="Band\n(gdal.Band)"]
PixelData [label="Pixels\n(numpy.ma.masked_array)"]
/* Inheritance Relationships */
edge [arrowhead="o"]
/* Associations */
edge [arrowhead="vee"]
StacItem -> StacAsset[headlabel="*"]
StacAsset -> Image
ImageItem -> Image
PointStats -> Item[headlabel="*"]
Driller -> Item
StacCatalogue -> StacItem[headlabel="*"]
Point -> PointStats
PointStats -> ArrayInfo[headlabel="*"]
Driller -> Point[headlabel="*"]
ArrayInfo -> PixelData
/* Compositions */
edge [arrowhead="diamond"]
ImageReader -> ImageInfo
Image -> Band[headlabel="*"]
/* Dependencies */
edge [arrowhead="vee" style="dashed"]
Driller -> ImageReader[headlabel="*"]
ImageReader -> PointStats[label="populates data"]
/* Realisations */
edge [arrowhead="normal" style="dashed"]
StacItem -> Item
ImageItem -> Item
/* Notes */
/*(ItemNote [label="Add comment here", shape="note"]
edge [arrowhead="odiamond"]
ItemNote -> Item*/
/* Ranks ?? */
/*{ rank=same; StacItem; ImageItem}*/
/*{ rank=same; Image};*/
}](_images/graphviz-cb8e24dcafc816355f591b12b05612b31b6028d8.png)
Images and Items#
At its core, Pixel Driller is about extracting pixels from images.
The user specifies these images in one of two ways, by providing:
a path or URL to an image, represented by a pixdrill.drill.ImageItem
parameters for searching for images in a STAC catalogue, the results of which are represented by pystac.Item objects
In the diagram, Item is a conceptual abstraction of
ImageItem and pystac.Item.
It has no corresponding software component. All Items have an
id attribute and a way to access the file paths or URLs to the
underlying image or images.
The main difference between an ImageItem and a
pystac.Item is the number
of images associated with them. An ImageItem
has only one image.
A pystac.Item can have one or more images; each
a pystac.Asset.
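The Item abstraction can be illustrated with a minimal sketch. The names Item, SingleImageItem, MultiAssetItem and image_paths() are hypothetical and exist only for this illustration; they are not part of pixdrill or pystac. The sketch only shows the idea that both kinds of Item expose an id and a way to reach one or more image paths.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Item(Protocol):
    """Conceptual Item: an id plus a way to reach its image paths or URLs."""

    id: str

    def image_paths(self) -> list[str]: ...


@dataclass
class SingleImageItem:
    """Hypothetical stand-in for an ImageItem: exactly one image."""

    id: str
    path: str

    def image_paths(self) -> list[str]:
        return [self.path]


@dataclass
class MultiAssetItem:
    """Hypothetical stand-in for a pystac.Item: one image per asset."""

    id: str
    assets: dict[str, str] = field(default_factory=dict)  # asset id -> href

    def image_paths(self) -> list[str]:
        return list(self.assets.values())


def count_images(item: Item) -> int:
    """Works on either kind of Item because both satisfy the protocol."""
    return len(item.image_paths())


single = SingleImageItem(id="my_image", path="/data/my_image.tif")
multi = MultiAssetItem(
    id="scene_1",
    assets={
        "B02": "https://example.com/B02.tif",
        "B03": "https://example.com/B03.tif",
    },
)
print(count_images(single), count_images(multi))
```

Code written against the shared interface does not need to know which kind of Item it was given, which is the point of treating Item as a concept rather than a class.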
Pixel Driller delegates image reading to GDAL.
GDAL represents an image as a Dataset.
An image is composed of one or more bands. Each band is a two-dimensional
numpy array of pixels.
Where and what to drill#
A Survey Point determines the location
of the pixels within an image to extract.
For Image Items, the user supplies the
paths or URLs to the images they want drilled.
The Survey Point's date is not used.
For STAC Items, Pixel Driller uses the images
returned from searching a STAC Catalogue with the user-supplied
parameters, including:
the STAC Catalogue and collections specified
the locations of the Survey Points
the image acquisition window for each Survey Point (pixdrill.drillpoints.Point)
Drilling#
The pixdrill.drill module contains functions for drilling or creating
the Driller
objects to do so.
A Driller contains:
the Item to be drilled
a collection of Survey Points that intersect the Item
a function for reading the pixel data for every Point from the Item's images
a function for calculating the statistics for every Point
Driller
delegates the responsibility of reading the pixels from an
image to an Image Reader object.
It also delegates responsibility for computing statistics to each
Point's
Stats object.
Thus Drillers indirectly populate
each Point's survey statistics.
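The delegation described above can be sketched with toy classes. Everything here (ToyDriller, ToyReader, ToyPoint, ToyStats and their methods) is hypothetical and greatly simplified; no image is read, and the real pixdrill classes have different constructors and methods. The sketch only shows who hands work to whom.

```python
class ToyStats:
    """Stand-in for a Point's stats object: stores data, computes stats."""

    def __init__(self):
        self.pixels = {}   # item id -> list of pixel values
        self.results = {}  # item id -> {stat name: value}

    def add_data(self, item_id, values):
        self.pixels[item_id] = values

    def calc_stats(self, item_id):
        values = self.pixels[item_id]
        self.results[item_id] = {"mean": sum(values) / len(values)}


class ToyPoint:
    """Stand-in for a Survey Point that owns its stats object."""

    def __init__(self, x, y):
        self.x, self.y = x, y
        self.stats = ToyStats()


class ToyReader:
    """Stand-in for an Image Reader: hands pixel data to the point's stats."""

    def __init__(self, item_id, fake_pixels):
        self.item_id = item_id
        self.fake_pixels = fake_pixels  # (x, y) -> values; nothing is read from disk

    def read_data(self, point):
        point.stats.add_data(self.item_id, self.fake_pixels[(point.x, point.y)])


class ToyDriller:
    """Holds an item and its points; delegates reading and statistics."""

    def __init__(self, item_id, points, reader):
        self.item_id, self.points, self.reader = item_id, points, reader

    def drill(self):
        for point in self.points:
            self.reader.read_data(point)          # reading is delegated to the reader
            point.stats.calc_stats(self.item_id)  # statistics are delegated to the stats object


pt = ToyPoint(0, 0)
reader = ToyReader("item_a", {(0, 0): [1, 2, 3]})
ToyDriller("item_a", [pt], reader).drill()
print(pt.stats.results)
```

The driller never touches pixel values itself, yet after drill() returns the point's stats object holds both the data and the computed statistic.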
Statistics#
Each Point stores its pixel data and
statistics in a pixdrill.drillstats.PointStats object.
A Point might intersect multiple Items, so the
PointStats object stores the pixel data and
statistics for every Item.
Pixel data and statistics are stored in the PointStats.item_stats
dictionary, keyed by the Item's ID. Each item in the dictionary is
another dictionary containing elements for:
the raw pixel data, a masked array
information about the array of pixels read from the image, an instance of pixdrill.image_reader.ArrayInfo
the statistics, the data types of which are those returned from the functions used to compute the statistics
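As a rough picture of that nested layout, the sketch below builds a dictionary by hand. The key names RAW_KEY and INFO_KEY, the item id and the statistic name are hypothetical; pixdrill defines its own constants for the keys, and a SimpleNamespace stands in for the ArrayInfo object.

```python
from types import SimpleNamespace

import numpy as np

# Hypothetical key names, used only for this illustration.
RAW_KEY = "raw_pixels"
INFO_KEY = "array_info"

pixels = np.ma.masked_array(
    np.array([[[10, 20], [30, 0]]]),         # shape (1, 2, 2)
    mask=[[[False, False], [False, True]]],  # the last pixel is masked out
)

# Outer dictionary keyed by Item id; inner dictionary holds data and stats.
item_stats = {
    "example_item_id": {
        RAW_KEY: pixels,
        INFO_KEY: SimpleNamespace(data=pixels, asset_id="B02"),
        "my_mean": float(pixels.mean()),  # masked pixels are ignored
    }
}

entry = item_stats["example_item_id"]
print(entry["my_mean"], entry[INFO_KEY].asset_id)
```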
Standard statistics
pixdrill.drillstats contains built-in functions for computing
a suite of standard statistics. These functions take a list of
3D masked arrays. Each array contains the
pixels extracted for a Point for one image.
The standard stats functions assume that each image contains only one band.
So the shape of each array passed to the built-in functions must be
(1, nrows, ncols).
For an ImageItem, the array list passed to a
built-in function contains only one array. For a
pystac.Item the list contains an array for every
pystac.Asset drilled.
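To make the expected input concrete, here is a minimal sketch of a statistic that follows the same convention: it takes a list of masked arrays, each of shape (1, nrows, ncols), and returns one value per image. The function mean_per_image is hypothetical and is not one of the built-in functions in pixdrill.drillstats.

```python
import numpy as np


def mean_per_image(array_list):
    """Return the mean of the unmasked pixels in each single-band array."""
    means = []
    for arr in array_list:
        if arr.ndim != 3 or arr.shape[0] != 1:
            raise ValueError("expected an array of shape (1, nrows, ncols)")
        means.append(float(arr.mean()))
    return np.array(means)


# Two single-band images drilled for the same point.
band_a = np.ma.masked_array(
    np.array([[[1.0, 2.0], [3.0, 4.0]]]),
    mask=[[[False, False], [False, True]]],  # exclude the value 4.0
)
band_b = np.ma.masked_array(np.full((1, 2, 2), 5.0))

print(mean_per_image([band_a, band_b]))
```

For an ImageItem the list would hold a single array, so the result would hold a single value.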
User statistics
Users can write custom functions to calculate statistics. The
Tutorial describes the signature of a
user’s statistics function and the objects that Pixel Driller passes to it.
The user also provides a name for their function. Pixel Driller calls each
user function and stores the value returned from the function
in the PointStats object.
For example, from
PointStats.calc_stats():
stats = self.item_stats[item_id] # Dictionary of stats for the Item
...
# user_stats is a list of tuples as supplied by the user. Each tuple
# contains the name of the statistic (a string) and the function
# that calculates it.
for stat_name, stat_func in user_stats:
    stats[stat_name] = stat_func(stats[STATS_ARRAYINFO], item, self.pt)
Pixel Driller passes all the information it thinks the user needs to
calculate a statistic.
stats[STATS_ARRAYINFO] is the
pixdrill.image_reader.ArrayInfo object, which contains:
the pixel data, in the data attribute
the asset id, in the asset_id attribute
plus the location of the pixels within the image it was read from
item is the pystac.Item or
ImageItem. The user can inspect its
properties, such as its id attribute.
self.pt is the Point object,
so the user knows which point is being operated on. The user can pass
additional information to the user function as
Point attributes with Python’s
built-in setattr() and getattr() functions.
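The sketch below shows what a user function might look like, based only on the call shown in the calc_stats() excerpt above. The objects passed in are SimpleNamespace stand-ins, not real pixdrill or pystac objects, and the function name scaled_mean and the scale attribute are hypothetical. The Tutorial remains the reference for the exact signature.

```python
from types import SimpleNamespace

import numpy as np


def scaled_mean(array_info, item, pt):
    """Mean of the unmasked pixels, multiplied by a per-point scale factor."""
    scale = getattr(pt, "scale", 1.0)  # extra input attached to the point
    return float(array_info.data.mean()) * scale


# Stand-ins for the objects Pixel Driller would pass to the function.
array_info = SimpleNamespace(
    data=np.ma.masked_array(np.array([[[2.0, 4.0]]])), asset_id="B02"
)
item = SimpleNamespace(id="example_item")
pt = SimpleNamespace(x=140, y=-36.5)
setattr(pt, "scale", 0.5)  # pass additional information via the point

# Mimic the loop in the excerpt: a list of (name, function) tuples.
user_stats = [("scaled_mean", scaled_mean)]
stats = {name: func(array_info, item, pt) for name, func in user_stats}
print(stats)
```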
Reprojecting points#
When reading pixels from an image, the Point’s bounding box is calculated in the image’s coordinate reference system (CRS). There are three coordinate reference systems that must be considered:
The coordinate reference system of the image
The coordinate reference system of the Point, as specified by the user
The coordinate reference system of the Point’s buffer attribute, which defines the size of the region of interest
It’s straightforward to transform the point’s location to the same CRS as the image. The buffer requires more attention.
We want the buffer to be expressed in metres if the image’s CRS is projected, and in degrees if the image’s CRS is geographic. So we must convert the buffer to a length in metres if the user defines it in degrees and the image has a projected CRS, or to a length in degrees if the user defines it in metres (the default) and the image has a geographic CRS.
A complication arises when the buffer distance is defined in metres, the image’s CRS is geographic, and the point’s CRS is geographic. We don’t know which CRS the buffer distance is defined in. So we have to choose one.
The same complication arises when the buffer distance is defined in degrees, the image’s CRS is projected, and the point’s CRS is projected. Again, we don’t know which CRS the buffer distance is defined in and we have to choose one.
The details are in pixdrill.drillpoints.Point.change_buffer_units().
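The decision of when a conversion is needed can be sketched as follows. This is not how change_buffer_units() is implemented. The function buffer_in_image_units and its parameters are hypothetical, and the conversion uses a crude assumption: one degree is taken to be the length of one degree of arc along the equator of the WGS 84 ellipsoid, ignoring latitude and the actual coordinate reference systems involved.

```python
import math

WGS84_SEMI_MAJOR_AXIS_M = 6378137.0
# Length of one degree of arc along the equator, in metres (about 111 km).
METRES_PER_DEGREE = math.pi * WGS84_SEMI_MAJOR_AXIS_M / 180.0


def buffer_in_image_units(buffer, buffer_in_degrees, image_is_projected):
    """Express the buffer in metres for a projected image CRS, else in degrees."""
    if image_is_projected and buffer_in_degrees:
        return buffer * METRES_PER_DEGREE  # degrees -> metres (crude)
    if not image_is_projected and not buffer_in_degrees:
        return buffer / METRES_PER_DEGREE  # metres -> degrees (crude)
    return buffer  # units already match the image CRS


# A buffer in metres with a projected image needs no conversion.
print(buffer_in_image_units(50, buffer_in_degrees=False, image_is_projected=True))
# A buffer in metres with a geographic image is converted to degrees.
print(buffer_in_image_units(50, buffer_in_degrees=False, image_is_projected=False))
```

A more careful approach would transform the buffer through a suitable projected CRS near the point, because the ground length of a degree of longitude shrinks towards the poles.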
Contributing#
We welcome the community’s contributions.
We prefer to use the Fork and pull model for pull requests.
A suggested development environment#
The project’s Dockerfile is a good reference for creating the
development environment in which you can develop and run tests.
Use this along with the build-dev and run-dev
targets in the Makefile. Modify those targets for your own environment.
For example:
user@dev-host:~$ git clone https://github.com/cibolabs/pixeldriller.git
user@dev-host:~$ cd pixeldriller
user@dev-host:~/pixeldriller$ cp Makefile MyMakefile
# EDIT MyMakefile: update the build-dev and run-dev targets
user@dev-host:~/pixeldriller$ make -f MyMakefile build-dev
user@dev-host:~/pixeldriller$ make -f MyMakefile run-dev
# Then, from the running container, pip install an editable
# version of the package, and run the example
root@5d63691b9aa8:~/pixeldriller# source activate_dev
root@5d63691b9aa8:~/pixeldriller# python3 -m example
Stats for point: x=0, y=-1123600
Item ID=S2B_52LHP_20220730_0_L2A
Mean values: [443.80165289 219.33884298]
Item ID=S2A_52LHP_20220728_0_L2A
Mean values: [2543.60330579 2284.67768595]
Item ID=S2A_52LHP_20220725_0_L2A
Mean values: [492.32231405 403.69421488]
Stats for point: x=140, y=-36.5
Item ID=S2A_54HVE_20220730_0_L2A
Mean values: [3257.65289256 3140.01652893]
Item ID=S2B_54HVE_20220725_0_L2A
Mean values: [3945.52066116 3690.01652893]
Tests and coverage#
When contributing, please write a test for new features, and confirm that
all existing tests pass. Tests are located in the tests/ directory.
We use the pytest framework.
We also use coverage to show the test coverage.
From within the running development container, run tests using:
root@5d63691b9aa8:~/pixeldriller# python3 -m pytest -s tests
For coverage:
root@5d63691b9aa8:~/pixeldriller# python3 -m coverage run --source=pixdrill -m pytest tests
root@5d63691b9aa8:~/pixeldriller# python3 -m coverage report
# OR to generate a coverage report as HTML
root@5d63691b9aa8:~/pixeldriller# python3 -m coverage html
Documentation#
When contributing, please also update these docs.
Documentation is in the doc/ directory. Consider modifying the
tutorial or developer guide. Docs are written in
reStructuredText
and converted to HTML using Sphinx.
To generate the HTML on your development machine:
user@dev-host:~$ cd pixeldriller
user@dev-host:~/pixeldriller$ sudo apt-get install graphviz
user@dev-host:~/pixeldriller$ python3 -m venv .doc_venv
user@dev-host:~/pixeldriller$ source .doc_venv/bin/activate
(.doc_venv) user@dev-host:~/pixeldriller$ pip install -e .[docs]
(.doc_venv) user@dev-host:~/pixeldriller$ cd doc
(.doc_venv) user@dev-host:~/pixeldriller/doc$ make clean
(.doc_venv) user@dev-host:~/pixeldriller/doc$ make html
# To serve:
(.doc_venv) user@dev-host:~/pixeldriller/doc$ python3 -m http.server --directory build/html