Raster Data

Data Introduction

Raster data is a spatial data model used in geographic information systems (GIS) to represent spatially continuous or discrete phenomena. It divides the spatial domain, meaning a geographic area, into a set of two-dimensional rectangular cells with a regular topological relationship. These cells are also called grid cells or pixels. Each cell is indexed by a unique coordinate position and stores one or more attribute values that describe the characteristics or state of the phenomenon within the space covered by that cell. Raster data can implement either a discrete coverage or a continuous coverage. Mathematically, it can be understood as samples of a spatial variable defined on a discrete grid. The spatial position of each cell is jointly determined by the spatial reference system, including the coordinate system and projection, and the raster geometry, including origin, resolution, and row-column indexing. Together, these make spatial analysis and map visualization possible.

Raster data has the following essential elements:

Spatial Domain (Domain Extent)
- Represents the geographic area covered by the raster, usually defined by a minimum bounding rectangle (bounding box).
Raster Geometry (Grid Geometry)
- Origin: defines the starting position of the raster, usually the coordinates of the upper-left or lower-left corner.
- Resolution or Cell Size: the ground size of each cell in the X and Y directions.
- Rows x Columns: the matrix dimensions.
- Grid Indexing: row and column numbers used to uniquely locate each cell.
Spatial Reference System (SRS)
- Defines the geographic position of the raster data in a geographic coordinate system or projected coordinate system.
Attribute Values (Range Values)
- Values stored in each cell, which may be:
  - Real-number values, such as elevation or temperature
  - Integer values, such as classification codes
  - Multiband values, such as multiple spectral bands in remote sensing imagery
NoData Value
- A cell value used to indicate invalid or missing data.
Metadata
- Describes the data source, generation method, sampling accuracy, processing history, and related information.
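
The elements listed above can be gathered into a small data structure. The following is a minimal sketch, not a reference implementation: the class name `RasterGrid` and its field names are hypothetical, and it assumes the origin is the upper-left corner of a north-up raster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RasterGrid:
    """Hypothetical container for the raster elements listed above."""
    origin_x: float     # X coordinate of the upper-left corner (assumed)
    origin_y: float     # Y coordinate of the upper-left corner (assumed)
    cell_width: float   # cell size in X, in SRS units
    cell_height: float  # cell size in Y, in SRS units (stored as positive)
    rows: int
    cols: int
    srs: str            # spatial reference identifier, e.g. an EPSG code
    nodata: float       # value marking invalid or missing cells

    def extent(self):
        """Return the bounding box as (min_x, min_y, max_x, max_y)."""
        max_x = self.origin_x + self.cols * self.cell_width
        min_y = self.origin_y - self.rows * self.cell_height
        return (self.origin_x, min_y, max_x, self.origin_y)


grid = RasterGrid(500000.0, 4100000.0, 30.0, 30.0,
                  rows=200, cols=300, srs="EPSG:32633", nodata=-9999.0)
print(grid.extent())  # (500000.0, 4094000.0, 509000.0, 4100000.0)
```

The spatial domain is derived here from the origin, cell size, and matrix dimensions rather than stored separately, which keeps the three from drifting out of sync.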

Data Characteristics

In a raster dataset, each cell, also called a pixel, has a value. This cell value represents the phenomenon described by the raster dataset, such as a category, magnitude, height, or spectral value. Categories may represent land-use classes such as grassland, forest, or roads. Magnitudes may represent gravity, noise pollution, or rainfall percentage. Height, or distance, may represent surface elevation above mean sea level and can be used to derive slope, aspect, and watershed attributes. Spectral values can represent light reflectance and color in satellite imagery and aerial photography.
Cell values can be positive or negative, and can be integer or floating-point values. Integer values are suitable for categorical, or discrete, data; floating-point values are suitable for continuous surfaces. A NoData value can also be used in cells to represent missing data.
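
As an illustration of why NoData matters, the sketch below averages a small floating-point grid while skipping NoData cells. The sentinel value -9999.0 and the function name are assumptions made for this example; real datasets declare their own NoData value.

```python
import math

NODATA = -9999.0  # assumed sentinel for this example only


def mean_ignoring_nodata(grid, nodata=NODATA):
    """Average the valid cells of a 2D list, skipping NoData cells."""
    valid = [v for row in grid for v in row if v != nodata]
    if not valid:
        return math.nan  # every cell was NoData
    return sum(valid) / len(valid)


elevation = [
    [120.5, 121.0, NODATA],
    [119.5, NODATA, 123.0],
]
print(mean_ignoring_nodata(elevation))  # 121.0
```

Without the filter, the two sentinel cells would drag the average far below any real elevation in the grid.
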
The area, or surface, represented by each cell has equal height and width, and each cell occupies an equal portion of the entire raster surface. For example, a raster representing elevation, that is, a digital elevation model, may cover an area of 100 square kilometers. If the raster contains 100 cells, each cell represents 1 square kilometer with equal height and width, that is, 1 km x 1 km.
Cell size can be large or small, depending on the surface described by the raster dataset and the representation requirements of features in that surface. It may be square kilometers, square feet, or even square centimeters. Cell size determines the level of detail at which patterns or features appear in the raster. The smaller the cell size, the smoother or more detailed the raster becomes. However, more cells require longer processing time and more storage space. If the cell size is too large, information may be lost or fine patterns may become blurred. For example, if the cell size exceeds the width of a road, the road will not be represented in the raster dataset.
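
The trade-off between cell size and data volume can be made concrete with a little arithmetic. The sketch below assumes a 10 km x 10 km area (100 square kilometers, as in the example above), a single band, and 4 bytes per value; the function names are hypothetical.

```python
import math


def cell_count(width_m, height_m, cell_size_m):
    """Rows, columns, and total cells needed to cover an extent."""
    cols = math.ceil(width_m / cell_size_m)
    rows = math.ceil(height_m / cell_size_m)
    return rows, cols, rows * cols


def storage_bytes(rows, cols, bands=1, bytes_per_value=4):
    """Uncompressed size, ignoring headers, compression, and overviews."""
    return rows * cols * bands * bytes_per_value


for size in (1000, 100, 10):
    rows, cols, total = cell_count(10_000, 10_000, size)
    print(f"{size} m cells: {total} cells, {storage_bytes(rows, cols)} bytes")
```

Each tenfold reduction in cell size multiplies the cell count, and therefore the uncompressed storage, by one hundred.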

Data types supported by raster data:

Data Type	Description	Value Range
Byte	8-bit unsigned	0 ~ 255
Int8	8-bit signed	-128 ~ 127
UInt16	16-bit unsigned	0 ~ 65,535
Int16	16-bit signed	-32,768 ~ 32,767
UInt32	32-bit unsigned	0 ~ 4,294,967,295
Int32	32-bit signed	-2,147,483,648 ~ 2,147,483,647
Float32	32-bit floating point	About ±3.4 × 10^38, roughly 7 significant decimal digits
Float64	64-bit floating point	About ±1.8 × 10^308, roughly 15 to 16 significant decimal digits
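
One practical use of the table is choosing the narrowest integer type that can hold a dataset's values, including its NoData value. The following sketch copies the integer ranges from the table; the function name and the selection order are assumptions for illustration.

```python
# Integer ranges copied from the table above, narrowest types first.
INTEGER_TYPES = [
    ("Byte", 0, 255),
    ("Int8", -128, 127),
    ("UInt16", 0, 65_535),
    ("Int16", -32_768, 32_767),
    ("UInt32", 0, 4_294_967_295),
    ("Int32", -2_147_483_648, 2_147_483_647),
]


def smallest_integer_type(min_value, max_value):
    """Return the first listed type whose range covers both values."""
    for name, low, high in INTEGER_TYPES:
        if low <= min_value and max_value <= high:
            return name
    return None  # no listed integer type fits


print(smallest_integer_type(0, 200))       # Byte
print(smallest_integer_type(-9999, 3000))  # Int16
print(smallest_integer_type(0, 70_000))    # UInt32
```

The second call shows a common case: a NoData value of -9999 forces a signed 16-bit type even when the valid values alone would fit in fewer bits.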

Raster Data Coordinate Systems

The coordinate system in raster data is the coordinate system used to locate and describe each pixel or cell in the raster. In geographic information systems (GIS), raster data is commonly used to represent phenomena on the Earth's surface, such as terrain, land-cover types, and climate. These data are organized as a grid or raster, and each raster cell has a specific position and value.

The following are two common coordinate systems in raster data:

Pixel Coordinate System:
- In the simplest raster data, the coordinate system may be only a pixel coordinate system. In this case, each pixel is represented by its row and column number in the raster image, usually starting from the upper-left corner, where the first pixel has the coordinate (0, 0).
- This coordinate system does not consider geospatial location. It simply divides the raster image into fixed-size pixel units. It is suitable for simple image-processing tasks, but it cannot provide geospatial position information.
Geographic Coordinate System:
- Many raster datasets are based on a geographic coordinate system, meaning each raster cell corresponds to an actual location on the Earth's surface.
- Positions are described either with longitude and latitude in a geographic coordinate system, or with planar X and Y coordinates in a projected coordinate system. Longitude represents east-west position, while latitude represents north-south position.
- A coordinate system is usually defined by a reference point, such as an origin, and a unit, such as degrees or meters. In a planar projection, the reference point may be the center point of the map projection, and the unit may be meters or another distance unit.
- Each pixel or cell in raster data is associated with a specific location in the geographic coordinate system. For example, a pixel in a raster image may correspond to a longitude-latitude coordinate on a map.
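
The link between the two coordinate systems can be sketched for the simplest case. The example below assumes a north-up raster with no rotation, an upper-left origin, and positive cell sizes; the function names are hypothetical.

```python
import math


def world_to_pixel(x, y, origin_x, origin_y, cell_w, cell_h):
    """Map a geographic coordinate to (row, col) in a north-up raster."""
    col = math.floor((x - origin_x) / cell_w)
    row = math.floor((origin_y - y) / cell_h)  # rows grow downward
    return row, col


def pixel_center(row, col, origin_x, origin_y, cell_w, cell_h):
    """Return the geographic coordinate of the center of cell (row, col)."""
    x = origin_x + (col + 0.5) * cell_w
    y = origin_y - (row + 0.5) * cell_h
    return x, y


# Raster with its upper-left corner at 100.0 E, 40.0 N and 0.25-degree cells.
print(world_to_pixel(100.6, 39.3, 100.0, 40.0, 0.25, 0.25))  # (2, 2)
print(pixel_center(2, 2, 100.0, 40.0, 0.25, 0.25))           # (100.625, 39.375)
```

Note that the pixel index starts at (0, 0) in the upper-left corner, so the row number increases as latitude decreases.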

Data Storage

Raster data is organized as a raster or grid and is commonly used to represent geospatial phenomena such as terrain, land cover, and climate. A raster data format describes how data is stored on a computer, including data structure, storage method, metadata, and related information.

Common data formats are described below.

GeoTIFF

GeoTIFF is a GIS raster data format based on TIFF (Tagged Image File Format). It allows geospatial information to be stored together with standard image data. The GeoTIFF format supports multiple projections and geographic coordinate systems, and can contain geospatial metadata such as geographic extent, resolution, and coordinate system information.

Because of its broad support and flexibility, GeoTIFF is one of the most commonly used raster data formats in many GIS software products and tools.

Main Features of GeoTIFF

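A GeoTIFF is a valid TIFF file, so standard TIFF tools and libraries can open it as an image even when they ignore the georeferencing tags.
Embeds coordinate and projection information directly in the file as TIFF tags, so no separate sidecar file is required.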
Supports multiband and multiresolution raster data. It can be used to store remote sensing imagery (satellite and aerial imagery), digital elevation models (DEMs), thermal infrared imagery, and more.
Supported by almost all GIS software, including GDAL, QGIS, ArcGIS, and ERDAS.

Core Content of GeoTIFF

GeoTIFF stores geographic information through TIFF tags and extension tags.

Coordinate reference information
- Geographic coordinate systems, such as longitude-latitude and WGS84
- Projected coordinate systems, such as UTM and Gauss-Kruger
- EPSG codes (SpatialRefSys)

Spatial positioning information

Image positioning is recorded through two key components:

A. Affine Transformation

It uses six parameters to describe the transformation from pixel coordinates to geographic coordinates:

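Xgeo = GT0 + GT1 * i + GT2 * j
Ygeo = GT3 + GT4 * i + GT5 * j

Here i is the pixel column index and j is the pixel row index.

GT0, GT3: X and Y coordinates of the upper-left corner of the upper-left pixel
GT1, GT5: pixel width and pixel height; GT5 is usually negative in a north-up image because row numbers increase downward
GT2, GT4: rotation components, usually 0

B. Tie Points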

Positioning is defined through pixel-geographic coordinate pairs.
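
The two positioning components can be related in a short sketch. It assumes the six-parameter order used in the equations above, with i as the column and j as the row, and it builds the parameters from a single tie point plus a pixel size for an unrotated, north-up image. All function names are hypothetical.

```python
def geotransform_from_tiepoint(pixel_i, pixel_j, x, y, scale_x, scale_y):
    """Build (GT0..GT5) from one tie point and positive pixel sizes.

    The tie point states that pixel (pixel_i, pixel_j) sits at (x, y).
    GT5 is negative because rows grow downward while Y grows upward.
    """
    gt0 = x - pixel_i * scale_x
    gt3 = y + pixel_j * scale_y
    return (gt0, scale_x, 0.0, gt3, 0.0, -scale_y)


def pixel_to_geo(gt, i, j):
    """Apply the six-parameter transform to column i and row j."""
    x = gt[0] + gt[1] * i + gt[2] * j
    y = gt[3] + gt[4] * i + gt[5] * j
    return x, y


def geo_to_pixel(gt, x, y):
    """Invert the transform; returns fractional (i, j)."""
    det = gt[1] * gt[5] - gt[2] * gt[4]
    if det == 0:
        raise ValueError("transform is not invertible")
    dx, dy = x - gt[0], y - gt[3]
    i = (gt[5] * dx - gt[2] * dy) / det
    j = (-gt[4] * dx + gt[1] * dy) / det
    return i, j


gt = geotransform_from_tiepoint(0, 0, 500000.0, 4100000.0, 30.0, 30.0)
print(pixel_to_geo(gt, 10, 20))                # (500300.0, 4099400.0)
print(geo_to_pixel(gt, 500300.0, 4099400.0))   # (10.0, 20.0)
```

The inverse is written for the general case, so it also works when the rotation components GT2 and GT4 are non-zero.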

Projection definitions. GeoTIFF can store:
- Projection type, such as UTM
- Ellipsoid, such as WGS84
- Datum
- Projection parameters, such as central meridian and scale factor

Cell values, supporting:
- 8-bit, 16-bit, or 32-bit integers or floating-point values
- Single-band or multiband data
- Compressed or uncompressed storage, such as LZW or JPEG
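
To make the tag-based storage more tangible, the sketch below lists the tag IDs found in the first image file directory (IFD) of a classic TIFF and picks out the ones defined by the GeoTIFF specification. It handles only the classic TIFF layout (magic number 42), not BigTIFF, and it runs on a synthetic byte string built by the hypothetical helper `fake_tiff` rather than on a real image.

```python
import struct

# Tag IDs defined by the GeoTIFF specification.
GEOTIFF_TAGS = {
    33550: "ModelPixelScaleTag",
    33922: "ModelTiepointTag",
    34264: "ModelTransformationTag",
    34735: "GeoKeyDirectoryTag",
    34736: "GeoDoubleParamsTag",
    34737: "GeoAsciiParamsTag",
}


def list_ifd_tags(data):
    """Return the tag IDs in the first IFD of a classic TIFF byte string."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("missing TIFF byte-order mark")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("only classic TIFF (magic 42) is handled here")
    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    tags = []
    for n in range(count):
        start = ifd_offset + 2 + n * 12  # each IFD entry is 12 bytes
        (tag,) = struct.unpack(order + "H", data[start:start + 2])
        tags.append(tag)
    return tags


def fake_tiff(tag_ids):
    """Build a header and one IFD with placeholder entries (test data only)."""
    header = struct.pack("<2sHI", b"II", 42, 8)
    entries = b"".join(struct.pack("<HHII", t, 3, 1, 0) for t in tag_ids)
    return header + struct.pack("<H", len(tag_ids)) + entries + struct.pack("<I", 0)


# 256 and 257 are the baseline ImageWidth and ImageLength tags.
tags = list_ifd_tags(fake_tiff([256, 257, 33550, 33922, 34735]))
geo_tags = [GEOTIFF_TAGS[t] for t in tags if t in GEOTIFF_TAGS]
print(geo_tags)
```

A reader that does not know the GeoTIFF tag IDs simply skips those entries, which is why a GeoTIFF still opens as an ordinary image in generic TIFF software.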

GeoTIFF and Cloud Optimized GeoTIFF

COG (Cloud Optimized GeoTIFF) is a GeoTIFF whose internal layout is organized for access over a network. It stores the image data as internal tiles and includes multilevel overview pyramids, so web clients can use HTTP range requests to load only the parts they need on demand.
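
The idea behind range requests can be sketched without any network access. In a tiled TIFF, the TileOffsets and TileByteCounts tags record where each tile's bytes start and how long they are; a client can turn one such pair into an HTTP Range header. The offsets and byte counts below are made-up values for illustration, and the function name is hypothetical.

```python
def range_header(offset, byte_count):
    """HTTP Range header covering one tile's bytes (the end is inclusive)."""
    if byte_count <= 0:
        raise ValueError("byte_count must be positive")
    return {"Range": f"bytes={offset}-{offset + byte_count - 1}"}


# Hypothetical values, as they might be read from the TileOffsets and
# TileByteCounts tags of a tiled file.
tile_offsets = [4096, 69632, 135168]
tile_byte_counts = [65536, 65536, 40000]

headers = [range_header(o, n) for o, n in zip(tile_offsets, tile_byte_counts)]
print(headers[0])  # {'Range': 'bytes=4096-69631'}
```

Only the requested tile's bytes travel over the network, which is what makes on-demand loading of large imagery practical.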

ERDAS Imagine IMG Format

IMG is a raster data format developed for ERDAS Imagine software, now a Hexagon Geospatial product (formerly Leica Geosystems). It is one of the most commonly used large-image formats in remote sensing, geographic information, and image analysis.

Basic Overview

ERDAS IMG

Full name: ERDAS Imagine Raster Format
File extension: .img
Type: raster imagery
Supports: single-band, multiband, and multiresolution data
Compatible with major GIS platforms such as GDAL, ArcGIS, ENVI, and QGIS

Core Features

Supports any number of bands.
Can store very large imagery, including terabyte-scale datasets.
Embeds georeferencing information, including coordinate systems and affine transforms.
Supports pyramid layers.
Supports compression, including lossless and lossy compression.
Can store statistics such as minimum value, maximum value, and histograms.
Can carry rich metadata, such as sensor information and processing records.

File Components

An IMG dataset can consist of the following files:

File	Description
.img	Main data file, including raster data, bands, and metadata
.ige	Large-file extension, generated automatically when the dataset exceeds 2 GB
.aux	Auxiliary file, compatible with ArcGIS
.rrd	Pyramid layer file, used as thumbnail cache
.xml	Metadata document

Data Type Support

The IMG format supports multiple data precisions:

Data Type	Example Uses
8-bit integer	Standard imagery, such as visible-light imagery
16-bit integer	Remote sensing multispectral imagery and DEMs
32-bit floating point	Continuous variables, such as slope and reflectance
64-bit floating point	High-precision analysis results

Coordinates and Projection

IMG files can store:

Geographic coordinate systems, such as WGS84 EPSG:4326
Projected coordinate systems, such as UTM and Gauss-Kruger
Affine transformation matrices
Projection metadata

Performance Characteristics

Why is IMG commonly selected for large remote sensing projects?

Multiband support: a single file can contain dozens of bands, such as hyperspectral data.
Block storage: enables fast random access and often provides better performance than strip-organized GeoTIFF.
Pyramid layers: built-in multiresolution thumbnails enable fast zooming and browsing.
Large-file support: supports imagery larger than 2 GB through the .ige extension.
Flexible compression: supports RLE and JPEG compression.

Application Scenarios

Application Area	Examples
Satellite remote sensing imagery storage	Landsat, Sentinel, and MODIS data
Digital elevation models (DEMs)	SRTM and ASTER
Multiband raster analysis	Vegetation indices and soil moisture
Large-scale imagery mosaicking outputs	Orthophoto atlases

Other Common Formats

ASCII Grid:
- ASCII Grid is a simple text format for storing raster data. Each pixel value is represented as text and separated by spaces or other delimiters.
- The ASCII Grid format is simple, easy to understand, and highly readable, so it is widely used in some cases. However, because it is a text format, file sizes are usually large and it is not suitable for storing large-scale raster data.
NetCDF:
- NetCDF (Network Common Data Form) is a multidimensional array data format commonly used to store scientific data, including raster data.
- The NetCDF format supports metadata and multidimensional array storage, and provides flexible data access and subsetting capabilities.
- Because of its flexibility and efficiency, NetCDF is widely used for raster data storage and sharing in meteorology, oceanography, Earth science, and related fields.
GRIB:
- GRIB (GRIdded Binary, formally General Regularly-distributed Information in Binary form) is a binary format commonly used to store meteorological and environmental data. It is standardized by the World Meteorological Organization (WMO) and used to exchange meteorological data.
- The GRIB format provides good compression performance and efficient storage, making it suitable for large-scale raster data.
- The GRIB format also supports metadata and multiple data layers.
HDF5:
- HDF5 (Hierarchical Data Format version 5) is a flexible data storage format commonly used to store scientific data, including raster data.
- The HDF5 format supports multidimensional arrays, metadata, and hierarchical relationships between datasets, providing efficient data storage and access.
- The HDF5 format is suitable for storing large-scale raster data and also supports data compression, parallel I/O, and related capabilities.
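
Of the formats above, ASCII Grid is simple enough to parse by hand. The sketch below assumes the common Esri-style layout, in which a few `key value` header lines (ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value) precede the rows of cell values. The function name and the sample data are made up for illustration.

```python
def parse_ascii_grid(text):
    """Parse an Esri-style ASCII Grid string into (header, rows)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = {}
    index = 0
    # Header lines start with a letter; data lines start with a number.
    while index < len(lines) and lines[index].lstrip()[0].isalpha():
        key, value = lines[index].split()[:2]
        header[key.lower()] = float(value)
        index += 1
    rows = [[float(v) for v in ln.split()] for ln in lines[index:]]
    if len(rows) != int(header["nrows"]):
        raise ValueError("row count does not match nrows")
    if any(len(row) != int(header["ncols"]) for row in rows):
        raise ValueError("column count does not match ncols")
    return header, rows


SAMPLE = """
ncols 3
nrows 2
xllcorner 500000
yllcorner 4100000
cellsize 30
NODATA_value -9999
1 2 3
4 -9999 6
"""

header, rows = parse_ascii_grid(SAMPLE)
print(header["cellsize"], rows)
```

Every value is stored as text, which is why the same grid takes far more space in this format than in a binary one.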

Concept Explanations

Blocks and Tiles

What are blocks?

In GeoTIFF and other raster formats, a block is a storage unit for image data. It is the smallest unit used to store data inside a raster file.

Blocks can be row-based strips, meaning one or more complete rows of pixels, or tile-based blocks, meaning small square or rectangular regions of the image. Strips extend across the full image width, while tiles are smaller rectangles.

Blocks are used to optimize data read and write efficiency. By dividing an image into small blocks, software can read or write only the image portion of interest instead of the entire image, improving efficiency when processing large image files.

What are tiles?

The term tile is commonly used to describe the division of a large image into smaller, regular square or rectangular parts. This is especially common in pyramid levels, web map tile services such as WMTS and TMS, and some GIS applications. The main purpose is to improve network transfer and rendering efficiency.

Each tile is an independent part of the image and can be accessed and processed separately. Tiling makes it more efficient to view large images at different zoom levels because only the tiles in the user's current view need to be loaded.
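
The benefit of tiling can be illustrated by working out which tiles a requested pixel window touches. This is a minimal sketch that assumes 256 x 256 pixel tiles numbered from the upper-left corner; the function name is hypothetical.

```python
def tiles_for_window(col_off, row_off, width, height, tile_w=256, tile_h=256):
    """List (tile_row, tile_col) indices that intersect a pixel window."""
    first_col = col_off // tile_w
    last_col = (col_off + width - 1) // tile_w
    first_row = row_off // tile_h
    last_row = (row_off + height - 1) // tile_h
    return [(r, c)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


# A 100 x 50 pixel window starting at column 200, row 500.
print(tiles_for_window(200, 500, 100, 50))  # [(1, 0), (1, 1), (2, 0), (2, 1)]
```

Only these four tiles need to be read, however large the full image is.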

Raster Pyramids (.ovr)

A raster pyramid, also called an image pyramid, is multiresolution image data generated to improve multilevel display performance for large images.

Its basic principle is:

The original raster image is progressively reduced to generate multiple copies. Each level has half the resolution of the previous level and is used for fast display at lower zoom levels.

In this way, when users zoom imagery in GIS software:

The software can directly read a copy at the corresponding resolution.
It does not need to load the full-resolution data all at once.
Display speed is significantly improved and memory usage is reduced.

For raster pyramid levels, assume the original image size is 8192 x 8192 pixels.

Pyramid levels usually decrease by a factor of 2 in resolution:

Level 0: 8192 x 8192 (original)
Level 1: 4096 x 4096
Level 2: 2048 x 2048
Level 3: 1024 x 1024
Level 4: 512 x 512
...
Down to the smallest thumbnail
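
The level list above can be generated by repeated halving. In the sketch below, the stopping size of 256 pixels is an assumption, since the smallest level depends on the software, and both function names are hypothetical.

```python
import math


def pyramid_levels(width, height, min_size=256):
    """Halve the image until the larger side is at most min_size."""
    levels = [(width, height)]
    while max(width, height) > min_size:
        width = math.ceil(width / 2)
        height = math.ceil(height / 2)
        levels.append((width, height))
    return levels


def level_for_display(levels, pixels_needed):
    """Pick the coarsest level that still has at least pixels_needed columns."""
    for index in range(len(levels) - 1, -1, -1):
        if levels[index][0] >= pixels_needed:
            return index
    return 0  # fall back to the original image


levels = pyramid_levels(8192, 8192)
print(levels)
print(level_for_display(levels, 1000))  # 3
```

For a view about 1000 pixels wide, level 3 (1024 x 1024) is enough, so the software reads roughly 1/64 of the pixels in the original image.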

There are generally two pyramid storage methods. iXGIS uses external pyramids.

Storage Method	Description
Internal pyramids	Pyramid data is written directly into the image file, such as inside a GeoTIFF or IMG file.
External pyramids	Pyramid data is stored as a separate file, commonly using the `.ovr` extension for overviews.
