Earth Engine Satellite Features Reference

This notebook is a compact reference for building weekly satellite features with Google Earth Engine.

It focuses on a small example workflow:

Load local GeoJSON polygons from data/geojson_sample.geojson
Initialize Earth Engine with a Google Cloud project
Build short weekly windows
Extract Sentinel-1 SAR backscatter features
Explain the dB and linear ratio pattern
Build Sentinel-2 NDBI features with cloud masking
Save small pandas outputs under outputs/earth_engine_satellite_features_reference/

This is not a large scale export notebook. The goal is to keep the pattern easy to copy into larger projects.

Install packages

Run this once in the notebook environment.

In [ ]:

# pip install earthengine-api pandas numpy

Imports and paths

The notebook expects the GeoJSON file to exist at data/geojson_sample.geojson.

The output files are written to outputs/earth_engine_satellite_features_reference/.

In [2]:

from pathlib import Path
import json
import time

import numpy as np
import pandas as pd

DATA_DIR = Path("data")
OUTPUT_DIR = Path("outputs") / "earth_engine_satellite_features_reference"

DATA_DIR.mkdir(parents=True, exist_ok=True)
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

GEOJSON_PATH = DATA_DIR / "geojson_sample.geojson"

# print(GEOJSON_PATH)
# print(OUTPUT_DIR)

Create the sample GeoJSON if needed

If you downloaded geojson_sample.geojson separately and placed it in data/, this cell will leave it unchanged.

If the file is missing, this cell creates a small three-region sample.

In [3]:

sample_geojson = {'type': 'FeatureCollection', 'features': [{'type': 'Feature', 'geometry': {'type': 'Polygon', 'coordinates': [[[55.21003905177415, 25.076774258770524], [55.19664946437181, 25.060991951646685], [55.19501868129075, 25.05275013595706], [55.19785109401049, 25.045129851049683], [55.19965353846849, 25.043496871235302], [55.209524067643294, 25.045129851049687], [55.220424565079824, 25.050184174546025], [55.22282782435716, 25.05523828966504], [55.21982375026048, 25.065890126643815], [55.21458807826341, 25.07358686309247], [55.21003905177415, 25.076774258770524]]]}, 'id': '0', 'properties': {'name': 'jumeirah_village_circle'}}, {'type': 'Feature', 'geometry': {'type': 'Polygon', 'coordinates': [[[55.249635385146775, 25.18003250445051], [55.2674881683499, 25.16830306645511], [55.27641455995146, 25.176847801781438], [55.28345267640654, 25.180964596806056], [55.294009851089164, 25.183605486431023], [55.293065713515915, 25.191294809657663], [55.289289163222946, 25.194867461359095], [55.2831951843411, 25.191372477113294], [55.277873681655564, 25.18469289493505], [55.26903312074248, 25.1902851282279], [55.26765982972685, 25.19432380373002], [55.26422660218779, 25.196265426975778], [55.249635385146775, 25.18003250445051]]]}, 'id': '1', 'properties': {'name': 'business_bay'}}, {'type': 'Feature', 'geometry': {'type': 'Polygon', 'coordinates': [[[55.123528842419546, 25.070940045705534], [55.13082445094006, 25.065808830926738], [55.134515170544546, 25.066858415072858], [55.15502870509045, 25.08726520786672], [55.14884889552013, 25.091851409287244], [55.1451152605714, 25.08780934244539], [55.14292657801525, 25.08699313967015], [55.1378196520509, 25.091579351286892], [55.13447225520031, 25.092084601377486], [55.13408601710216, 25.090452247413005], [55.1346868319215, 25.0890142032599], [55.136703853100705, 25.089247400757078], [55.13734758326428, 25.087848209108426], [55.13696134516612, 25.086371267235386], [55.141810779065054, 25.082834307672364], [55.13043821284191, 25.071678611953597], [55.12824953028575, 25.07198958588346], [55.12558877894298, 25.07393315505499], [55.123528842419546, 25.070940045705534]]]}, 'id': '2', 'properties': {'name': 'sample_region_west'}}]}

if not GEOJSON_PATH.exists():
    GEOJSON_PATH.write_text(json.dumps(sample_geojson, indent=2), encoding="utf-8")

with open(GEOJSON_PATH, encoding="utf-8") as f:
    geojson_data = json.load(f)

region_names = [feature["properties"]["name"] for feature in geojson_data["features"]]

print(region_names)

['jumeirah_village_circle', 'business_bay', 'sample_region_west']

Authenticate and initialize Earth Engine

Use a Google Cloud project that has Earth Engine enabled.

For local setup, authenticate once from the shell:

earthengine authenticate

Then set your project id below.

In [21]:

from dotenv import load_dotenv # pip install python-dotenv
import os

load_dotenv()

PROJECT_ID = os.getenv("EE_PROJECT_ID")

In [22]:

import ee

try:
    ee.Initialize(project=PROJECT_ID)
    EE_READY = True
    print("Earth Engine initialized")
except Exception as exc:
    EE_READY = False
    print("Earth Engine is not initialized")
    print(exc)

Earth Engine initialized

Convert GeoJSON polygons to an Earth Engine FeatureCollection

Each GeoJSON feature becomes one Earth Engine feature.

The name property is kept because it becomes the region label in later outputs.

In [6]:

if EE_READY:
    ee_features = []

    for feature in geojson_data["features"]:
        ee_feature = ee.Feature(
            ee.Geometry(feature["geometry"]),
            {"name": feature["properties"]["name"]}
        )
        ee_features.append(ee_feature)

    region_fc = ee.FeatureCollection(ee_features)

    print("Feature count:", region_fc.size().getInfo())
else:
    region_fc = None
    print("Skipped because Earth Engine is not initialized")

Feature count: 3

Build weekly date windows

This helper creates a short list of weekly start dates on the Earth Engine server.

The date range is intentionally short so the example remains small.

In [7]:

def weekly_dates(start, end):
    s = ee.Date(start)
    e = ee.Date(end)
    n_weeks = e.difference(s, "week").subtract(1)
    return ee.List.sequence(0, n_weeks).map(lambda k: s.advance(k, "week"))

START_DATE = "2024-01-01"
END_DATE = "2024-02-12"

if EE_READY:
    week_starts = weekly_dates(START_DATE, END_DATE)
    print(week_starts.map(lambda d: ee.Date(d).format("YYYY-MM-dd")).getInfo())
else:
    week_starts = None

['2024-01-01', '2024-01-08', '2024-01-15', '2024-01-22', '2024-01-29', '2024-02-05']

Sentinel-1 SAR collection

Sentinel-1 is radar imagery. It does not need optical cloud masking.

The COPERNICUS/S1_GRD collection contains processed SAR backscatter bands such as VV and VH in dB.

In [8]:

if EE_READY:
    s1 = (
        ee.ImageCollection("COPERNICUS/S1_GRD")
        .filterDate(START_DATE, END_DATE)
        .filterBounds(region_fc.geometry())
        .filter(ee.Filter.eq("instrumentMode", "IW"))
        .filter(ee.Filter.listContains("transmitterReceiverPolarisation", "VV"))
        .filter(ee.Filter.listContains("transmitterReceiverPolarisation", "VH"))
    )

    print("Sentinel-1 image count:", s1.size().getInfo())
else:
    s1 = None

Sentinel-1 image count: 6

dB and linear ratio pattern

Sentinel-1 values are commonly represented in dB.

For ratios, calculate in linear scale first. Then convert the ratio back to dB if the final feature set is dB-based.

In [9]:

def add_sar_ratio_bands(img):
    vv = img.select("VV")
    vh = img.select("VH")

    ten = ee.Image.constant(10)

    vv_lin = ten.pow(vv.divide(10)).rename("VV_LIN")
    vh_lin = ten.pow(vh.divide(10)).rename("VH_LIN")

    ratio_lin = vv_lin.divide(vh_lin).rename("RATIO_LIN")
    ratio_db = ratio_lin.log10().multiply(10).rename("RATIO_DB")

    return img.addBands([vv_lin, vh_lin, ratio_lin, ratio_db])

Weekly Sentinel-1 region summary

This section extracts a small region-week table directly to pandas.

For large date ranges or many regions, use Earth Engine table exports instead of getInfo.

In [10]:

def one_week_s1_summary(date, region_feature):
    d_start = ee.Date(date)
    d_end = d_start.advance(1, "week")

    region_name = ee.String(region_feature.get("name"))
    region_geom = region_feature.geometry()

    weekly = s1.filterDate(d_start, d_end)
    count = weekly.size()

    empty = ee.Image.constant([0, 0, 0]).rename(["VV", "VH", "RATIO_DB"]).updateMask(ee.Image(0))

    composite = ee.Image(
        ee.Algorithms.If(
            count.gt(0),
            add_sar_ratio_bands(weekly.median()).select(["VV", "VH", "RATIO_DB"]),
            empty
        )
    )

    reducer = ee.Reducer.mean().combine(
        reducer2=ee.Reducer.median(),
        sharedInputs=True
    )

    stats = composite.reduceRegion(
        reducer=reducer,
        geometry=region_geom,
        scale=10,
        bestEffort=True
    )

    stats = ee.Dictionary(stats)

    return ee.Feature(None, {
        "region": region_name,
        "week_start": d_start.format("YYYY-MM-dd"),
        "vv_mean_db": ee.Algorithms.If(stats.contains("VV_mean"), stats.get("VV_mean"), None),
        "vv_median_db": ee.Algorithms.If(stats.contains("VV_median"), stats.get("VV_median"), None),
        "vh_mean_db": ee.Algorithms.If(stats.contains("VH_mean"), stats.get("VH_mean"), None),
        "vh_median_db": ee.Algorithms.If(stats.contains("VH_median"), stats.get("VH_median"), None),
        "vv_vh_ratio_mean_db": ee.Algorithms.If(stats.contains("RATIO_DB_mean"), stats.get("RATIO_DB_mean"), None),
        "n_obs": count
    })


if EE_READY:
    region_list = region_fc.toList(region_fc.size())

    output_collections = []

    for region_index in range(region_fc.size().getInfo()):
        region_feature = ee.Feature(region_list.get(region_index))

        region_weekly = ee.FeatureCollection(
            week_starts.map(lambda d: one_week_s1_summary(d, region_feature))
        )

        output_collections.append(region_weekly)

    s1_summary_fc = ee.FeatureCollection(output_collections).flatten()
    s1_rows = [feature["properties"] for feature in s1_summary_fc.getInfo()["features"]]

    df_s1 = pd.DataFrame(s1_rows)
    df_s1["week_start"] = pd.to_datetime(df_s1["week_start"])
    df_s1 = df_s1.sort_values(["region", "week_start"]).reset_index(drop=True)

    s1_path = OUTPUT_DIR / "sample_weekly_sentinel1_features.csv"
    df_s1.to_csv(s1_path, index=False)

    print(s1_path)
    display(df_s1.head())
else:
    df_s1 = pd.DataFrame()
    print("Skipped because Earth Engine is not initialized")

outputs/earth_engine_satellite_features_reference/sample_weekly_sentinel1_features.csv

	n_obs	region	vh_mean_db	vh_median_db	vv_mean_db	vv_median_db	vv_vh_ratio_mean_db	week_start
0	1	business_bay	-9.883389	-11.628311	-2.775690	-4.372767	7.107699	2024-01-01
1	1	business_bay	-10.846963	-12.626596	-2.825999	-4.373862	8.020964	2024-01-08
2	1	business_bay	-9.778811	-11.377201	-3.332173	-4.877985	6.446638	2024-01-15
3	1	business_bay	-10.543415	-12.377161	-2.710485	-4.372925	7.832930	2024-01-22
4	2	business_bay	-10.154594	-11.125310	-2.934926	-3.874947	7.219669	2024-01-29

Optional Sentinel-1 Drive export pattern

Use this pattern when the table is too large to pull into the notebook.

The flag is off by default so the notebook does not submit tasks accidentally.

In [11]:

RUN_DRIVE_EXPORT = False
SLEEP_BETWEEN_TASKS = 2

if EE_READY and RUN_DRIVE_EXPORT:
    region_list = region_fc.toList(region_fc.size())

    for region_index in range(region_fc.size().getInfo()):
        region_feature = ee.Feature(region_list.get(region_index))
        region_name = region_feature.get("name").getInfo()

        region_weekly = ee.FeatureCollection(
            week_starts.map(lambda d: one_week_s1_summary(d, region_feature))
        )

        task = ee.batch.Export.table.toDrive(
            collection=region_weekly,
            description=f"{region_name}_weekly_s1sar_sample",
            fileNamePrefix=f"{region_name}_weekly_s1sar_sample",
            fileFormat="CSV"
        )

        task.start()
        print("Started task:", region_name)
        time.sleep(SLEEP_BETWEEN_TASKS)
else:
    print("Drive export not started")

Drive export not started

Sentinel-2 cloud mask

Sentinel-2 is optical imagery. Cloud handling is required.

The QA60 band marks cloud and cirrus bits. The mask keeps pixels where both bits are zero.

In [12]:

def mask_s2(img):
    qa = img.select("QA60")

    cloud = 1 << 10
    cirrus = 1 << 11

    mask = qa.bitwiseAnd(cloud).eq(0).And(
        qa.bitwiseAnd(cirrus).eq(0)
    )

    return img.updateMask(mask)

Sentinel-2 NDBI

NDBI is the Normalized Difference Built-up Index.

It compares SWIR and NIR reflectance:

NDBI = (SWIR - NIR) / (SWIR + NIR)

In [13]:

def add_ndbi(img):
    ndbi = img.normalizedDifference(["B11", "B8"]).rename("NDBI")
    return ndbi.copyProperties(img, ["system:time_start"])

Sentinel-2 collection and small NDBI summary

This example keeps the same short date range.

For larger extraction, use table export tasks.

In [14]:

CLOUD_THRESHOLD = 20

if EE_READY:
    s2_ndbi = (
        ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
        .filterDate(START_DATE, END_DATE)
        .filterBounds(region_fc.geometry())
        .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", CLOUD_THRESHOLD))
        .map(mask_s2)
        .map(add_ndbi)
    )

    print("Sentinel-2 image count:", s2_ndbi.size().getInfo())
else:
    s2_ndbi = None

Sentinel-2 image count: 5

In [15]:

def one_week_s2_ndbi_summary(date, region_feature):
    d_start = ee.Date(date)
    d_end = d_start.advance(1, "week")

    region_name = ee.String(region_feature.get("name"))
    region_geom = region_feature.geometry()

    weekly = s2_ndbi.filterDate(d_start, d_end)
    count = weekly.size()

    empty = ee.Image.constant(0).rename("NDBI").updateMask(ee.Image(0))

    composite = ee.Image(
        ee.Algorithms.If(
            count.gt(0),
            weekly.median().select("NDBI"),
            empty
        )
    )

    stats = composite.reduceRegion(
        reducer=ee.Reducer.mean().combine(
            reducer2=ee.Reducer.median(),
            sharedInputs=True
        ),
        geometry=region_geom,
        scale=20,
        bestEffort=True
    )

    stats = ee.Dictionary(stats)

    return ee.Feature(None, {
        "region": region_name,
        "week_start": d_start.format("YYYY-MM-dd"),
        "ndbi_mean": ee.Algorithms.If(stats.contains("NDBI_mean"), stats.get("NDBI_mean"), None),
        "ndbi_median": ee.Algorithms.If(stats.contains("NDBI_median"), stats.get("NDBI_median"), None),
        "n_obs": count
    })


if EE_READY:
    region_list = region_fc.toList(region_fc.size())

    output_collections = []

    for region_index in range(region_fc.size().getInfo()):
        region_feature = ee.Feature(region_list.get(region_index))

        region_weekly = ee.FeatureCollection(
            week_starts.map(lambda d: one_week_s2_ndbi_summary(d, region_feature))
        )

        output_collections.append(region_weekly)

    s2_summary_fc = ee.FeatureCollection(output_collections).flatten()
    s2_rows = [feature["properties"] for feature in s2_summary_fc.getInfo()["features"]]

    df_s2 = pd.DataFrame(s2_rows)
    df_s2["week_start"] = pd.to_datetime(df_s2["week_start"])
    df_s2 = df_s2.sort_values(["region", "week_start"]).reset_index(drop=True)

    s2_path = OUTPUT_DIR / "sample_weekly_sentinel2_ndbi.csv"
    df_s2.to_csv(s2_path, index=False)

    print(s2_path)
    display(df_s2.head())
else:
    df_s2 = pd.DataFrame()
    print("Skipped because Earth Engine is not initialized")

outputs/earth_engine_satellite_features_reference/sample_weekly_sentinel2_ndbi.csv

	n_obs	ndbi_mean	ndbi_median	region	week_start
0	1	0.044363	0.035317	business_bay	2024-01-01
1	2	0.019250	0.035276	business_bay	2024-01-08
2	1	0.016320	0.029295	business_bay	2024-01-15
3	1	0.028978	0.035348	business_bay	2024-01-22
4	0	NaN	NaN	business_bay	2024-01-29

Build current and lagged features in pandas

The exported CSVs can be aligned and lagged in pandas.

This section works with the Sentinel-1 sample output if it was created.

In [17]:

base_features = [
    "vv_mean_db",
    "vh_mean_db",
    "vv_vh_ratio_mean_db"
]

lags = [1, 2, 4]


def add_lags(df_base, lags_list):
    lag_frames = []

    for lag in lags_list:
        shifted = df_base.shift(lag)
        shifted.columns = [f"{col}_lag{lag}" for col in shifted.columns]
        lag_frames.append(shifted)

    if not lag_frames:
        return df_base.copy()

    return pd.concat([df_base] + lag_frames, axis=1)


if not df_s1.empty:
    region_name = df_s1["region"].iloc[0]
    df_region = df_s1[df_s1["region"] == region_name].copy()

    df_base = (
        df_region
        .set_index("week_start")[base_features]
        .astype(float)
        .sort_index()
        .resample("W-SUN")
        .mean()
        .rolling(2, min_periods=1)
        .mean()
    )

    df_lagged = add_lags(df_base, lags)

    lagged_path = OUTPUT_DIR / f"{region_name}_sar_lagged_features.csv"
    df_lagged.to_csv(lagged_path)

    print(lagged_path)
    display(df_lagged.head())
else:
    df_lagged = pd.DataFrame()
    print("No Sentinel-1 output available")

outputs/earth_engine_satellite_features_reference/business_bay_sar_lagged_features.csv

	vv_mean_db	vh_mean_db	vv_vh_ratio_mean_db	vv_mean_db_lag1	vh_mean_db_lag1	vv_vh_ratio_mean_db_lag1	vv_mean_db_lag2	vh_mean_db_lag2	vv_vh_ratio_mean_db_lag2	vv_mean_db_lag4	vh_mean_db_lag4	vv_vh_ratio_mean_db_lag4
week_start
2024-01-07	-2.775690	-9.883389	7.107699	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2024-01-14	-2.800844	-10.365176	7.564332	-2.775690	-9.883389	7.107699	NaN	NaN	NaN	NaN	NaN	NaN
2024-01-21	-3.079086	-10.312887	7.233801	-2.800844	-10.365176	7.564332	-2.775690	-9.883389	7.107699	NaN	NaN	NaN
2024-01-28	-3.021329	-10.161113	7.139784	-3.079086	-10.312887	7.233801	-2.800844	-10.365176	7.564332	NaN	NaN	NaN
2024-02-04	-2.822705	-10.349005	7.526299	-3.021329	-10.161113	7.139784	-3.079086	-10.312887	7.233801	-2.77569	-9.883389	7.107699

Feature count checklist

This shows how the feature count grows when current and lagged features are used.

In [18]:

feature_count = len(base_features) * (1 + len(lags))

feature_count_table = pd.DataFrame({
    "scope": ["one_region"],
    "base_features": [len(base_features)],
    "lags": [len(lags)],
    "current_plus_lag_versions": [1 + len(lags)],
    "feature_count": [feature_count]
})

feature_count_path = OUTPUT_DIR / "feature_count_checklist.csv"
feature_count_table.to_csv(feature_count_path, index=False)

display(feature_count_table)

	scope	base_features	lags	current_plus_lag_versions	feature_count
0	one_region	3	3	4	12