Date created: June 10, 2022
Author: P. Alexander Burnham

Summary Here we explore the process of loading raster files into the spacetime environment by creating a spacetime file object. We will then run through some basic commands to query this data object and then create a spacetime cube object by specifying the intended dimensionality of the object.

Loading file names and creating a file object

Spacetime takes file names/paths as lists. Here we set up a list of geotif files. The initial read_data() function in spacetime uses GDAL’s python bindings at it’s base. It can read any file type that GDAL can read. For the full list of supported types, see the GDAL API. When a great number of files need to be read in, automated python functions like grep can pull lists of file paths from a given directory into the system based on file extension or some other feature of the file names. Here we pull in a few different tifs manually.

# create our list of file paths realtive to our directory
data = ["demoData/N0_AKestrel_1976_1980.tif", "demoData/N0_AKestrel_1977_1981.tif", "demoData/N0_AKestrel_1978_1982.tif"]

# spacetime function "read_data()" to create a file object
ds = read_data(data)

Querry the file object

We have now created a file object. This object may be queried using a series of methods documented in the API. These methods return a list of the length of the number of files read in by the read_data() function. For instance, we can look at the EPSG code for each file. This will return a list of length 3.

# examine the EPSG code for each file
ds.get_epsg_code()
## ['EPSG:4269', 'EPSG:4269', 'EPSG:4269']

Information like nodata values, upperleft corners, number of bands etc. can be queried at this point, too. A full list if methods can be found on the spacetime API


ds.get_nodata_value()
## [-3.3999999521443642e+38, -3.3999999521443642e+38, -3.3999999521443642e+38]
ds.get_UL_corner()
## [[1156212.8177116, 1200926.03642844], [1156212.8177116, 1200926.03642844], [1156212.8177116, 1200926.03642844]]
ds.get_band_number()
## [1, 1, 1]

Examine the actual data sets

We can extract a list of numpy arrays that contain the actual data from the files as well. Lets visualize them as an image rather than a large matrix.


# import matplotlib 
import matplotlib.pyplot as plt

# extract the data sets from the file object
plotData = ds.get_data_array()

# plot the first file removing values below 0 (i.e. nodata vals)
plt.imshow(plotData[0], vmin=0)
plt.show()

Here we look at the dimensions of the first dataset in the list

# what is the shape of the first file
ds.get_data_array()[0].shape
## (109, 93, 1)

Lets turn these files into a cube object

Each file, in the case of the kestral dataset, is a year (or rather a 4 year average). Each file as we saw above has a spatial dimension of 109 x 93 and a single band. We want a temporal spatial cube with the dimension of 109 x 93 x 3 (lat x lot x time).

When we create a cube, we can pass time vectors and variable names to the make_cube() function. Otherwise, the function will make time an integer vector of 0 to T and variables will be assigned as characters A, B, C… through N. In this case, we only have one variable contained in the dataset but we have a time dimension that we can make explicit. First let’s create a time vector of length 3 using the cube_time() function.

# make a time vec
yearObj = cube_time(start="1976", length=3, scale = "year")
print(yearObj)
## DatetimeIndex(['1976-12-31', '1977-12-31', '1978-12-31'], dtype='datetime64[ns]', freq=None)

Now we will make a cube object from this file object. When cubes are created, a .nc4 file is written to disk. This allows for virtualization through a number of third-party python libraries like dask and netCDF4. The arguments organizeFiles and organizeBands set up the structure of the cube based on the data files. “filestotime” and “bandstotime” together will take every file and every band within every file and treat them as time slices when compiling the cube. If variables are stored in different files or bands, the “filestovar” or “bandstovar” arguments would be used instead.

# make a cube
cube3d = make_cube(data = ds, fileName = "test1.nc4", organizeFiles = "filestotime", organizeBands="bandstotime", timeObj = yearObj)

Setting up a cube with multiple variables

Here we have a set of files names 2001 and 2002. Each has three bands that are RGB color bands. Lets read the data in, check the number of bands to make sure, create a time vector, pass the correct varaible names and output a cleaned cube object.

# read files
data1 = ["demoData/2001.tif", "demoData/2002.tif"]

# make file object
ds1 = read_data(data1)

# check bands
ds1.get_band_number()
## [3, 3]

We have the correct number of bands, 3. Lets assemble our cube. We use the “filestotime” and “bandstovar” arguments and pass R G and B to varNames as a list.

# make time vector
yearObj1 = cube_time(start="2001", length=2, scale = "year")

# make cube
cube4d = make_cube(data = ds1, fileName = "test2.nc4", organizeFiles = "filestotime", organizeBands="bandstovar", varNames=["R","G","B"], timeObj=yearObj1)

Querrying our cube object

Lets take a look at our cube object. All of the same commands work on a cube object as on a file object. However, what is returned is slightly different. Instead of a list of multiple returns, we have one unified cube now that returns a single value or object. Let’s look at the extracted data from the 3d and 4d cubes we just made.

# look at data
cube3d.get_data_array()
<xarray.DataArray (lat: 109, lon: 93, time: 3)>
array([[[-3.4e+38, -3.4e+38, -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38],
        ...,
        [-3.4e+38, -3.4e+38, -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38]],

       [[-3.4e+38, -3.4e+38, -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38],
        ...,
        [-3.4e+38, -3.4e+38, -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38]],

       [[-3.4e+38, -3.4e+38, -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38],
        ...,
...
        ...,
        [-3.4e+38, -3.4e+38, -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38]],

       [[-3.4e+38, -3.4e+38, -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38],
        ...,
        [-3.4e+38, -3.4e+38, -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38]],

       [[-3.4e+38, -3.4e+38, -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38],
        ...,
        [-3.4e+38, -3.4e+38, -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38]]], dtype=float32)
Coordinates:
  * lon      (lon) float32 1.201e+06 1.211e+06 1.221e+06 ... 2.111e+06 2.121e+06
  * lat      (lat) float32 1.156e+06 1.146e+06 1.136e+06 ... 8.621e+04 7.621e+04
  * time     (time) datetime64[ns] 1976-12-31 1977-12-31 1978-12-31
cube4d.get_data_array()
<xarray.DataArray (variables: 3, lat: 1001, lon: 1001, time: 2)>
array([[[[   0.,    0.],
         [ 799.,  799.],
         [ 788.,  788.],
         ...,
         [1061., 1061.],
         [1433., 1433.],
         [2430., 2430.]],

        [[   0.,    0.],
         [ 774.,  774.],
         [ 792.,  792.],
         ...,
         [1027., 1027.],
         [1125., 1125.],
         [1733., 1733.]],

        [[   0.,    0.],
         [ 797.,  797.],
         [ 801.,  801.],
         ...,
...
         ...,
         [1562., 1562.],
         [1415., 1415.],
         [1825., 1825.]],

        [[   0.,    0.],
         [ 392.,  392.],
         [ 408.,  408.],
         ...,
         [1573., 1573.],
         [1680., 1680.],
         [1780., 1780.]],

        [[   0.,    0.],
         [   0.,    0.],
         [   0.,    0.],
         ...,
         [   0.,    0.],
         [   0.,    0.],
         [   0.,    0.]]]], dtype=float32)
Coordinates:
  * variables  (variables) <U1 'R' 'G' 'B'
  * lon        (lon) float32 5.905e+05 5.905e+05 ... 6.005e+05 6.005e+05
  * lat        (lat) float32 5.791e+06 5.791e+06 ... 5.781e+06 5.781e+06
  * time       (time) datetime64[ns] 2001-12-31 2002-12-31

The xarray data structures that are returned can be operated upon like any numpy array. They also contains some useful metadata like the dimension of the cube and the names of those dimensions. Under coordinates for the 4d cube, we see the additional dimension named variables. This contains our 3 levels of R G and B. For a full list of cube object methods, see the spacetime API.