Ctrl K

Earth Engine Satellite Features Reference

This notebook is a compact reference for building weekly satellite features with Google Earth Engine.

It focuses on a small example workflow:

  • Load local GeoJSON polygons from data/geojson_sample.geojson
  • Initialize Earth Engine with a Google Cloud project
  • Build short weekly windows
  • Extract Sentinel-1 SAR backscatter features
  • Explain the dB and linear ratio pattern
  • Build Sentinel-2 NDBI features with cloud masking
  • Save small pandas outputs under outputs/earth_engine_satellite_features_reference/

This is not a large scale export notebook. The goal is to keep the pattern easy to copy into larger projects.

Install packages

Run this once in the notebook environment.

In [ ]:
# pip install earthengine-api pandas numpy

Imports and paths

The notebook expects the GeoJSON file to exist at data/geojson_sample.geojson.

The output files are written to outputs/earth_engine_satellite_features_reference/.

In [2]:
from pathlib import Path
import json
import time

import numpy as np
import pandas as pd

DATA_DIR = Path("data")
OUTPUT_DIR = Path("outputs") / "earth_engine_satellite_features_reference"

DATA_DIR.mkdir(parents=True, exist_ok=True)
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

GEOJSON_PATH = DATA_DIR / "geojson_sample.geojson"

# print(GEOJSON_PATH)
# print(OUTPUT_DIR)

Create the sample GeoJSON if needed

If you downloaded geojson_sample.geojson separately and placed it in data/, this cell will leave it unchanged.

If the file is missing, this cell creates a small three-region sample.

In [3]:
sample_geojson = {'type': 'FeatureCollection', 'features': [{'type': 'Feature', 'geometry': {'type': 'Polygon', 'coordinates': [[[55.21003905177415, 25.076774258770524], [55.19664946437181, 25.060991951646685], [55.19501868129075, 25.05275013595706], [55.19785109401049, 25.045129851049683], [55.19965353846849, 25.043496871235302], [55.209524067643294, 25.045129851049687], [55.220424565079824, 25.050184174546025], [55.22282782435716, 25.05523828966504], [55.21982375026048, 25.065890126643815], [55.21458807826341, 25.07358686309247], [55.21003905177415, 25.076774258770524]]]}, 'id': '0', 'properties': {'name': 'jumeirah_village_circle'}}, {'type': 'Feature', 'geometry': {'type': 'Polygon', 'coordinates': [[[55.249635385146775, 25.18003250445051], [55.2674881683499, 25.16830306645511], [55.27641455995146, 25.176847801781438], [55.28345267640654, 25.180964596806056], [55.294009851089164, 25.183605486431023], [55.293065713515915, 25.191294809657663], [55.289289163222946, 25.194867461359095], [55.2831951843411, 25.191372477113294], [55.277873681655564, 25.18469289493505], [55.26903312074248, 25.1902851282279], [55.26765982972685, 25.19432380373002], [55.26422660218779, 25.196265426975778], [55.249635385146775, 25.18003250445051]]]}, 'id': '1', 'properties': {'name': 'business_bay'}}, {'type': 'Feature', 'geometry': {'type': 'Polygon', 'coordinates': [[[55.123528842419546, 25.070940045705534], [55.13082445094006, 25.065808830926738], [55.134515170544546, 25.066858415072858], [55.15502870509045, 25.08726520786672], [55.14884889552013, 25.091851409287244], [55.1451152605714, 25.08780934244539], [55.14292657801525, 25.08699313967015], [55.1378196520509, 25.091579351286892], [55.13447225520031, 25.092084601377486], [55.13408601710216, 25.090452247413005], [55.1346868319215, 25.0890142032599], [55.136703853100705, 25.089247400757078], [55.13734758326428, 25.087848209108426], [55.13696134516612, 25.086371267235386], [55.141810779065054, 25.082834307672364], [55.13043821284191, 25.071678611953597], [55.12824953028575, 25.07198958588346], [55.12558877894298, 25.07393315505499], [55.123528842419546, 25.070940045705534]]]}, 'id': '2', 'properties': {'name': 'sample_region_west'}}]}

if not GEOJSON_PATH.exists():
    GEOJSON_PATH.write_text(json.dumps(sample_geojson, indent=2), encoding="utf-8")

with open(GEOJSON_PATH, encoding="utf-8") as f:
    geojson_data = json.load(f)

region_names = [feature["properties"]["name"] for feature in geojson_data["features"]]

print(region_names)
['jumeirah_village_circle', 'business_bay', 'sample_region_west']

Authenticate and initialize Earth Engine

Use a Google Cloud project that has Earth Engine enabled.

For local setup, authenticate once from the shell:

earthengine authenticate

Then set your project id below.

In [21]:
from dotenv import load_dotenv # pip install python-dotenv
import os

load_dotenv()

PROJECT_ID = os.getenv("EE_PROJECT_ID")
In [22]:
import ee

try:
    ee.Initialize(project=PROJECT_ID)
    EE_READY = True
    print("Earth Engine initialized")
except Exception as exc:
    EE_READY = False
    print("Earth Engine is not initialized")
    print(exc)
Earth Engine initialized

Convert GeoJSON polygons to an Earth Engine FeatureCollection

Each GeoJSON feature becomes one Earth Engine feature.

The name property is kept because it becomes the region label in later outputs.

In [6]:
if EE_READY:
    ee_features = []

    for feature in geojson_data["features"]:
        ee_feature = ee.Feature(
            ee.Geometry(feature["geometry"]),
            {"name": feature["properties"]["name"]}
        )
        ee_features.append(ee_feature)

    region_fc = ee.FeatureCollection(ee_features)

    print("Feature count:", region_fc.size().getInfo())
else:
    region_fc = None
    print("Skipped because Earth Engine is not initialized")
Feature count: 3

Build weekly date windows

This helper creates a short list of weekly start dates on the Earth Engine server.

The date range is intentionally short so the example remains small.

In [7]:
def weekly_dates(start, end):
    s = ee.Date(start)
    e = ee.Date(end)
    n_weeks = e.difference(s, "week").subtract(1)
    return ee.List.sequence(0, n_weeks).map(lambda k: s.advance(k, "week"))

START_DATE = "2024-01-01"
END_DATE = "2024-02-12"

if EE_READY:
    week_starts = weekly_dates(START_DATE, END_DATE)
    print(week_starts.map(lambda d: ee.Date(d).format("YYYY-MM-dd")).getInfo())
else:
    week_starts = None
['2024-01-01', '2024-01-08', '2024-01-15', '2024-01-22', '2024-01-29', '2024-02-05']

Sentinel-1 SAR collection

Sentinel-1 is radar imagery. It does not need optical cloud masking.

The COPERNICUS/S1_GRD collection contains processed SAR backscatter bands such as VV and VH in dB.

In [8]:
if EE_READY:
    s1 = (
        ee.ImageCollection("COPERNICUS/S1_GRD")
        .filterDate(START_DATE, END_DATE)
        .filterBounds(region_fc.geometry())
        .filter(ee.Filter.eq("instrumentMode", "IW"))
        .filter(ee.Filter.listContains("transmitterReceiverPolarisation", "VV"))
        .filter(ee.Filter.listContains("transmitterReceiverPolarisation", "VH"))
    )

    print("Sentinel-1 image count:", s1.size().getInfo())
else:
    s1 = None
Sentinel-1 image count: 6

dB and linear ratio pattern

Sentinel-1 values are commonly represented in dB.

For ratios, calculate in linear scale first. Then convert the ratio back to dB if the final feature set is dB-based.

In [9]:
def add_sar_ratio_bands(img):
    vv = img.select("VV")
    vh = img.select("VH")

    ten = ee.Image.constant(10)

    vv_lin = ten.pow(vv.divide(10)).rename("VV_LIN")
    vh_lin = ten.pow(vh.divide(10)).rename("VH_LIN")

    ratio_lin = vv_lin.divide(vh_lin).rename("RATIO_LIN")
    ratio_db = ratio_lin.log10().multiply(10).rename("RATIO_DB")

    return img.addBands([vv_lin, vh_lin, ratio_lin, ratio_db])

Weekly Sentinel-1 region summary

This section extracts a small region-week table directly to pandas.

For large date ranges or many regions, use Earth Engine table exports instead of getInfo.

In [10]:
def one_week_s1_summary(date, region_feature):
    d_start = ee.Date(date)
    d_end = d_start.advance(1, "week")

    region_name = ee.String(region_feature.get("name"))
    region_geom = region_feature.geometry()

    weekly = s1.filterDate(d_start, d_end)
    count = weekly.size()

    empty = ee.Image.constant([0, 0, 0]).rename(["VV", "VH", "RATIO_DB"]).updateMask(ee.Image(0))

    composite = ee.Image(
        ee.Algorithms.If(
            count.gt(0),
            add_sar_ratio_bands(weekly.median()).select(["VV", "VH", "RATIO_DB"]),
            empty
        )
    )

    reducer = ee.Reducer.mean().combine(
        reducer2=ee.Reducer.median(),
        sharedInputs=True
    )

    stats = composite.reduceRegion(
        reducer=reducer,
        geometry=region_geom,
        scale=10,
        bestEffort=True
    )

    stats = ee.Dictionary(stats)

    return ee.Feature(None, {
        "region": region_name,
        "week_start": d_start.format("YYYY-MM-dd"),
        "vv_mean_db": ee.Algorithms.If(stats.contains("VV_mean"), stats.get("VV_mean"), None),
        "vv_median_db": ee.Algorithms.If(stats.contains("VV_median"), stats.get("VV_median"), None),
        "vh_mean_db": ee.Algorithms.If(stats.contains("VH_mean"), stats.get("VH_mean"), None),
        "vh_median_db": ee.Algorithms.If(stats.contains("VH_median"), stats.get("VH_median"), None),
        "vv_vh_ratio_mean_db": ee.Algorithms.If(stats.contains("RATIO_DB_mean"), stats.get("RATIO_DB_mean"), None),
        "n_obs": count
    })


if EE_READY:
    region_list = region_fc.toList(region_fc.size())

    output_collections = []

    for region_index in range(region_fc.size().getInfo()):
        region_feature = ee.Feature(region_list.get(region_index))

        region_weekly = ee.FeatureCollection(
            week_starts.map(lambda d: one_week_s1_summary(d, region_feature))
        )

        output_collections.append(region_weekly)

    s1_summary_fc = ee.FeatureCollection(output_collections).flatten()
    s1_rows = [feature["properties"] for feature in s1_summary_fc.getInfo()["features"]]

    df_s1 = pd.DataFrame(s1_rows)
    df_s1["week_start"] = pd.to_datetime(df_s1["week_start"])
    df_s1 = df_s1.sort_values(["region", "week_start"]).reset_index(drop=True)

    s1_path = OUTPUT_DIR / "sample_weekly_sentinel1_features.csv"
    df_s1.to_csv(s1_path, index=False)

    print(s1_path)
    display(df_s1.head())
else:
    df_s1 = pd.DataFrame()
    print("Skipped because Earth Engine is not initialized")
outputs/earth_engine_satellite_features_reference/sample_weekly_sentinel1_features.csv
n_obs region vh_mean_db vh_median_db vv_mean_db vv_median_db vv_vh_ratio_mean_db week_start
0 1 business_bay -9.883389 -11.628311 -2.775690 -4.372767 7.107699 2024-01-01
1 1 business_bay -10.846963 -12.626596 -2.825999 -4.373862 8.020964 2024-01-08
2 1 business_bay -9.778811 -11.377201 -3.332173 -4.877985 6.446638 2024-01-15
3 1 business_bay -10.543415 -12.377161 -2.710485 -4.372925 7.832930 2024-01-22
4 2 business_bay -10.154594 -11.125310 -2.934926 -3.874947 7.219669 2024-01-29

Optional Sentinel-1 Drive export pattern

Use this pattern when the table is too large to pull into the notebook.

The flag is off by default so the notebook does not submit tasks accidentally.

In [11]:
RUN_DRIVE_EXPORT = False
SLEEP_BETWEEN_TASKS = 2

if EE_READY and RUN_DRIVE_EXPORT:
    region_list = region_fc.toList(region_fc.size())

    for region_index in range(region_fc.size().getInfo()):
        region_feature = ee.Feature(region_list.get(region_index))
        region_name = region_feature.get("name").getInfo()

        region_weekly = ee.FeatureCollection(
            week_starts.map(lambda d: one_week_s1_summary(d, region_feature))
        )

        task = ee.batch.Export.table.toDrive(
            collection=region_weekly,
            description=f"{region_name}_weekly_s1sar_sample",
            fileNamePrefix=f"{region_name}_weekly_s1sar_sample",
            fileFormat="CSV"
        )

        task.start()
        print("Started task:", region_name)
        time.sleep(SLEEP_BETWEEN_TASKS)
else:
    print("Drive export not started")
Drive export not started

Sentinel-2 cloud mask

Sentinel-2 is optical imagery. Cloud handling is required.

The QA60 band marks cloud and cirrus bits. The mask keeps pixels where both bits are zero.

In [12]:
def mask_s2(img):
    qa = img.select("QA60")

    cloud = 1 << 10
    cirrus = 1 << 11

    mask = qa.bitwiseAnd(cloud).eq(0).And(
        qa.bitwiseAnd(cirrus).eq(0)
    )

    return img.updateMask(mask)

Sentinel-2 NDBI

NDBI is the Normalized Difference Built-up Index.

It compares SWIR and NIR reflectance:

NDBI = (SWIR - NIR) / (SWIR + NIR)
In [13]:
def add_ndbi(img):
    ndbi = img.normalizedDifference(["B11", "B8"]).rename("NDBI")
    return ndbi.copyProperties(img, ["system:time_start"])

Sentinel-2 collection and small NDBI summary

This example keeps the same short date range.

For larger extraction, use table export tasks.

In [14]:
CLOUD_THRESHOLD = 20

if EE_READY:
    s2_ndbi = (
        ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
        .filterDate(START_DATE, END_DATE)
        .filterBounds(region_fc.geometry())
        .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", CLOUD_THRESHOLD))
        .map(mask_s2)
        .map(add_ndbi)
    )

    print("Sentinel-2 image count:", s2_ndbi.size().getInfo())
else:
    s2_ndbi = None
Sentinel-2 image count: 5
In [15]:
def one_week_s2_ndbi_summary(date, region_feature):
    d_start = ee.Date(date)
    d_end = d_start.advance(1, "week")

    region_name = ee.String(region_feature.get("name"))
    region_geom = region_feature.geometry()

    weekly = s2_ndbi.filterDate(d_start, d_end)
    count = weekly.size()

    empty = ee.Image.constant(0).rename("NDBI").updateMask(ee.Image(0))

    composite = ee.Image(
        ee.Algorithms.If(
            count.gt(0),
            weekly.median().select("NDBI"),
            empty
        )
    )

    stats = composite.reduceRegion(
        reducer=ee.Reducer.mean().combine(
            reducer2=ee.Reducer.median(),
            sharedInputs=True
        ),
        geometry=region_geom,
        scale=20,
        bestEffort=True
    )

    stats = ee.Dictionary(stats)

    return ee.Feature(None, {
        "region": region_name,
        "week_start": d_start.format("YYYY-MM-dd"),
        "ndbi_mean": ee.Algorithms.If(stats.contains("NDBI_mean"), stats.get("NDBI_mean"), None),
        "ndbi_median": ee.Algorithms.If(stats.contains("NDBI_median"), stats.get("NDBI_median"), None),
        "n_obs": count
    })


if EE_READY:
    region_list = region_fc.toList(region_fc.size())

    output_collections = []

    for region_index in range(region_fc.size().getInfo()):
        region_feature = ee.Feature(region_list.get(region_index))

        region_weekly = ee.FeatureCollection(
            week_starts.map(lambda d: one_week_s2_ndbi_summary(d, region_feature))
        )

        output_collections.append(region_weekly)

    s2_summary_fc = ee.FeatureCollection(output_collections).flatten()
    s2_rows = [feature["properties"] for feature in s2_summary_fc.getInfo()["features"]]

    df_s2 = pd.DataFrame(s2_rows)
    df_s2["week_start"] = pd.to_datetime(df_s2["week_start"])
    df_s2 = df_s2.sort_values(["region", "week_start"]).reset_index(drop=True)

    s2_path = OUTPUT_DIR / "sample_weekly_sentinel2_ndbi.csv"
    df_s2.to_csv(s2_path, index=False)

    print(s2_path)
    display(df_s2.head())
else:
    df_s2 = pd.DataFrame()
    print("Skipped because Earth Engine is not initialized")
outputs/earth_engine_satellite_features_reference/sample_weekly_sentinel2_ndbi.csv
n_obs ndbi_mean ndbi_median region week_start
0 1 0.044363 0.035317 business_bay 2024-01-01
1 2 0.019250 0.035276 business_bay 2024-01-08
2 1 0.016320 0.029295 business_bay 2024-01-15
3 1 0.028978 0.035348 business_bay 2024-01-22
4 0 NaN NaN business_bay 2024-01-29

Build current and lagged features in pandas

The exported CSVs can be aligned and lagged in pandas.

This section works with the Sentinel-1 sample output if it was created.

In [17]:
base_features = [
    "vv_mean_db",
    "vh_mean_db",
    "vv_vh_ratio_mean_db"
]

lags = [1, 2, 4]


def add_lags(df_base, lags_list):
    lag_frames = []

    for lag in lags_list:
        shifted = df_base.shift(lag)
        shifted.columns = [f"{col}_lag{lag}" for col in shifted.columns]
        lag_frames.append(shifted)

    if not lag_frames:
        return df_base.copy()

    return pd.concat([df_base] + lag_frames, axis=1)


if not df_s1.empty:
    region_name = df_s1["region"].iloc[0]
    df_region = df_s1[df_s1["region"] == region_name].copy()

    df_base = (
        df_region
        .set_index("week_start")[base_features]
        .astype(float)
        .sort_index()
        .resample("W-SUN")
        .mean()
        .rolling(2, min_periods=1)
        .mean()
    )

    df_lagged = add_lags(df_base, lags)

    lagged_path = OUTPUT_DIR / f"{region_name}_sar_lagged_features.csv"
    df_lagged.to_csv(lagged_path)

    print(lagged_path)
    display(df_lagged.head())
else:
    df_lagged = pd.DataFrame()
    print("No Sentinel-1 output available")
outputs/earth_engine_satellite_features_reference/business_bay_sar_lagged_features.csv
vv_mean_db vh_mean_db vv_vh_ratio_mean_db vv_mean_db_lag1 vh_mean_db_lag1 vv_vh_ratio_mean_db_lag1 vv_mean_db_lag2 vh_mean_db_lag2 vv_vh_ratio_mean_db_lag2 vv_mean_db_lag4 vh_mean_db_lag4 vv_vh_ratio_mean_db_lag4
week_start
2024-01-07 -2.775690 -9.883389 7.107699 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2024-01-14 -2.800844 -10.365176 7.564332 -2.775690 -9.883389 7.107699 NaN NaN NaN NaN NaN NaN
2024-01-21 -3.079086 -10.312887 7.233801 -2.800844 -10.365176 7.564332 -2.775690 -9.883389 7.107699 NaN NaN NaN
2024-01-28 -3.021329 -10.161113 7.139784 -3.079086 -10.312887 7.233801 -2.800844 -10.365176 7.564332 NaN NaN NaN
2024-02-04 -2.822705 -10.349005 7.526299 -3.021329 -10.161113 7.139784 -3.079086 -10.312887 7.233801 -2.77569 -9.883389 7.107699

Feature count checklist

This shows how the feature count grows when current and lagged features are used.

In [18]:
feature_count = len(base_features) * (1 + len(lags))

feature_count_table = pd.DataFrame({
    "scope": ["one_region"],
    "base_features": [len(base_features)],
    "lags": [len(lags)],
    "current_plus_lag_versions": [1 + len(lags)],
    "feature_count": [feature_count]
})

feature_count_path = OUTPUT_DIR / "feature_count_checklist.csv"
feature_count_table.to_csv(feature_count_path, index=False)

display(feature_count_table)
scope base_features lags current_plus_lag_versions feature_count
0 one_region 3 3 4 12