Earth Engine Satellite Features Reference
This notebook is a compact reference for building weekly satellite features with Google Earth Engine.
It focuses on a small example workflow:
- Load local GeoJSON polygons from
data/geojson_sample.geojson - Initialize Earth Engine with a Google Cloud project
- Build short weekly windows
- Extract Sentinel-1 SAR backscatter features
- Explain the dB and linear ratio pattern
- Build Sentinel-2 NDBI features with cloud masking
- Save small pandas outputs under
outputs/earth_engine_satellite_features_reference/
This is not a large scale export notebook. The goal is to keep the pattern easy to copy into larger projects.
Install packages
Run this once in the notebook environment.
# pip install earthengine-api pandas numpy
Imports and paths
The notebook expects the GeoJSON file to exist at data/geojson_sample.geojson.
The output files are written to outputs/earth_engine_satellite_features_reference/.
from pathlib import Path
import json
import time
import numpy as np
import pandas as pd
DATA_DIR = Path("data")
OUTPUT_DIR = Path("outputs") / "earth_engine_satellite_features_reference"
DATA_DIR.mkdir(parents=True, exist_ok=True)
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
GEOJSON_PATH = DATA_DIR / "geojson_sample.geojson"
# print(GEOJSON_PATH)
# print(OUTPUT_DIR)
Create the sample GeoJSON if needed
If you downloaded geojson_sample.geojson separately and placed it in data/, this cell will leave it unchanged.
If the file is missing, this cell creates a small three-region sample.
sample_geojson = {'type': 'FeatureCollection', 'features': [{'type': 'Feature', 'geometry': {'type': 'Polygon', 'coordinates': [[[55.21003905177415, 25.076774258770524], [55.19664946437181, 25.060991951646685], [55.19501868129075, 25.05275013595706], [55.19785109401049, 25.045129851049683], [55.19965353846849, 25.043496871235302], [55.209524067643294, 25.045129851049687], [55.220424565079824, 25.050184174546025], [55.22282782435716, 25.05523828966504], [55.21982375026048, 25.065890126643815], [55.21458807826341, 25.07358686309247], [55.21003905177415, 25.076774258770524]]]}, 'id': '0', 'properties': {'name': 'jumeirah_village_circle'}}, {'type': 'Feature', 'geometry': {'type': 'Polygon', 'coordinates': [[[55.249635385146775, 25.18003250445051], [55.2674881683499, 25.16830306645511], [55.27641455995146, 25.176847801781438], [55.28345267640654, 25.180964596806056], [55.294009851089164, 25.183605486431023], [55.293065713515915, 25.191294809657663], [55.289289163222946, 25.194867461359095], [55.2831951843411, 25.191372477113294], [55.277873681655564, 25.18469289493505], [55.26903312074248, 25.1902851282279], [55.26765982972685, 25.19432380373002], [55.26422660218779, 25.196265426975778], [55.249635385146775, 25.18003250445051]]]}, 'id': '1', 'properties': {'name': 'business_bay'}}, {'type': 'Feature', 'geometry': {'type': 'Polygon', 'coordinates': [[[55.123528842419546, 25.070940045705534], [55.13082445094006, 25.065808830926738], [55.134515170544546, 25.066858415072858], [55.15502870509045, 25.08726520786672], [55.14884889552013, 25.091851409287244], [55.1451152605714, 25.08780934244539], [55.14292657801525, 25.08699313967015], [55.1378196520509, 25.091579351286892], [55.13447225520031, 25.092084601377486], [55.13408601710216, 25.090452247413005], [55.1346868319215, 25.0890142032599], [55.136703853100705, 25.089247400757078], [55.13734758326428, 25.087848209108426], [55.13696134516612, 25.086371267235386], [55.141810779065054, 25.082834307672364], [55.13043821284191, 25.071678611953597], [55.12824953028575, 25.07198958588346], [55.12558877894298, 25.07393315505499], [55.123528842419546, 25.070940045705534]]]}, 'id': '2', 'properties': {'name': 'sample_region_west'}}]}
if not GEOJSON_PATH.exists():
GEOJSON_PATH.write_text(json.dumps(sample_geojson, indent=2), encoding="utf-8")
with open(GEOJSON_PATH, encoding="utf-8") as f:
geojson_data = json.load(f)
region_names = [feature["properties"]["name"] for feature in geojson_data["features"]]
print(region_names)
['jumeirah_village_circle', 'business_bay', 'sample_region_west']
Authenticate and initialize Earth Engine
Use a Google Cloud project that has Earth Engine enabled.
For local setup, authenticate once from the shell:
earthengine authenticate
Then set your project id below.
from dotenv import load_dotenv # pip install python-dotenv
import os
load_dotenv()
PROJECT_ID = os.getenv("EE_PROJECT_ID")
import ee
try:
ee.Initialize(project=PROJECT_ID)
EE_READY = True
print("Earth Engine initialized")
except Exception as exc:
EE_READY = False
print("Earth Engine is not initialized")
print(exc)
Earth Engine initialized
Convert GeoJSON polygons to an Earth Engine FeatureCollection
Each GeoJSON feature becomes one Earth Engine feature.
The name property is kept because it becomes the region label in later outputs.
if EE_READY:
ee_features = []
for feature in geojson_data["features"]:
ee_feature = ee.Feature(
ee.Geometry(feature["geometry"]),
{"name": feature["properties"]["name"]}
)
ee_features.append(ee_feature)
region_fc = ee.FeatureCollection(ee_features)
print("Feature count:", region_fc.size().getInfo())
else:
region_fc = None
print("Skipped because Earth Engine is not initialized")
Feature count: 3
Build weekly date windows
This helper creates a short list of weekly start dates on the Earth Engine server.
The date range is intentionally short so the example remains small.
def weekly_dates(start, end):
s = ee.Date(start)
e = ee.Date(end)
n_weeks = e.difference(s, "week").subtract(1)
return ee.List.sequence(0, n_weeks).map(lambda k: s.advance(k, "week"))
START_DATE = "2024-01-01"
END_DATE = "2024-02-12"
if EE_READY:
week_starts = weekly_dates(START_DATE, END_DATE)
print(week_starts.map(lambda d: ee.Date(d).format("YYYY-MM-dd")).getInfo())
else:
week_starts = None
['2024-01-01', '2024-01-08', '2024-01-15', '2024-01-22', '2024-01-29', '2024-02-05']
Sentinel-1 SAR collection
Sentinel-1 is radar imagery. It does not need optical cloud masking.
The COPERNICUS/S1_GRD collection contains processed SAR backscatter bands such as VV and VH in dB.
if EE_READY:
s1 = (
ee.ImageCollection("COPERNICUS/S1_GRD")
.filterDate(START_DATE, END_DATE)
.filterBounds(region_fc.geometry())
.filter(ee.Filter.eq("instrumentMode", "IW"))
.filter(ee.Filter.listContains("transmitterReceiverPolarisation", "VV"))
.filter(ee.Filter.listContains("transmitterReceiverPolarisation", "VH"))
)
print("Sentinel-1 image count:", s1.size().getInfo())
else:
s1 = None
Sentinel-1 image count: 6
dB and linear ratio pattern
Sentinel-1 values are commonly represented in dB.
For ratios, calculate in linear scale first. Then convert the ratio back to dB if the final feature set is dB-based.
def add_sar_ratio_bands(img):
vv = img.select("VV")
vh = img.select("VH")
ten = ee.Image.constant(10)
vv_lin = ten.pow(vv.divide(10)).rename("VV_LIN")
vh_lin = ten.pow(vh.divide(10)).rename("VH_LIN")
ratio_lin = vv_lin.divide(vh_lin).rename("RATIO_LIN")
ratio_db = ratio_lin.log10().multiply(10).rename("RATIO_DB")
return img.addBands([vv_lin, vh_lin, ratio_lin, ratio_db])
Weekly Sentinel-1 region summary
This section extracts a small region-week table directly to pandas.
For large date ranges or many regions, use Earth Engine table exports instead of getInfo.
def one_week_s1_summary(date, region_feature):
d_start = ee.Date(date)
d_end = d_start.advance(1, "week")
region_name = ee.String(region_feature.get("name"))
region_geom = region_feature.geometry()
weekly = s1.filterDate(d_start, d_end)
count = weekly.size()
empty = ee.Image.constant([0, 0, 0]).rename(["VV", "VH", "RATIO_DB"]).updateMask(ee.Image(0))
composite = ee.Image(
ee.Algorithms.If(
count.gt(0),
add_sar_ratio_bands(weekly.median()).select(["VV", "VH", "RATIO_DB"]),
empty
)
)
reducer = ee.Reducer.mean().combine(
reducer2=ee.Reducer.median(),
sharedInputs=True
)
stats = composite.reduceRegion(
reducer=reducer,
geometry=region_geom,
scale=10,
bestEffort=True
)
stats = ee.Dictionary(stats)
return ee.Feature(None, {
"region": region_name,
"week_start": d_start.format("YYYY-MM-dd"),
"vv_mean_db": ee.Algorithms.If(stats.contains("VV_mean"), stats.get("VV_mean"), None),
"vv_median_db": ee.Algorithms.If(stats.contains("VV_median"), stats.get("VV_median"), None),
"vh_mean_db": ee.Algorithms.If(stats.contains("VH_mean"), stats.get("VH_mean"), None),
"vh_median_db": ee.Algorithms.If(stats.contains("VH_median"), stats.get("VH_median"), None),
"vv_vh_ratio_mean_db": ee.Algorithms.If(stats.contains("RATIO_DB_mean"), stats.get("RATIO_DB_mean"), None),
"n_obs": count
})
if EE_READY:
region_list = region_fc.toList(region_fc.size())
output_collections = []
for region_index in range(region_fc.size().getInfo()):
region_feature = ee.Feature(region_list.get(region_index))
region_weekly = ee.FeatureCollection(
week_starts.map(lambda d: one_week_s1_summary(d, region_feature))
)
output_collections.append(region_weekly)
s1_summary_fc = ee.FeatureCollection(output_collections).flatten()
s1_rows = [feature["properties"] for feature in s1_summary_fc.getInfo()["features"]]
df_s1 = pd.DataFrame(s1_rows)
df_s1["week_start"] = pd.to_datetime(df_s1["week_start"])
df_s1 = df_s1.sort_values(["region", "week_start"]).reset_index(drop=True)
s1_path = OUTPUT_DIR / "sample_weekly_sentinel1_features.csv"
df_s1.to_csv(s1_path, index=False)
print(s1_path)
display(df_s1.head())
else:
df_s1 = pd.DataFrame()
print("Skipped because Earth Engine is not initialized")
outputs/earth_engine_satellite_features_reference/sample_weekly_sentinel1_features.csv
| n_obs | region | vh_mean_db | vh_median_db | vv_mean_db | vv_median_db | vv_vh_ratio_mean_db | week_start | |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | business_bay | -9.883389 | -11.628311 | -2.775690 | -4.372767 | 7.107699 | 2024-01-01 |
| 1 | 1 | business_bay | -10.846963 | -12.626596 | -2.825999 | -4.373862 | 8.020964 | 2024-01-08 |
| 2 | 1 | business_bay | -9.778811 | -11.377201 | -3.332173 | -4.877985 | 6.446638 | 2024-01-15 |
| 3 | 1 | business_bay | -10.543415 | -12.377161 | -2.710485 | -4.372925 | 7.832930 | 2024-01-22 |
| 4 | 2 | business_bay | -10.154594 | -11.125310 | -2.934926 | -3.874947 | 7.219669 | 2024-01-29 |
Optional Sentinel-1 Drive export pattern
Use this pattern when the table is too large to pull into the notebook.
The flag is off by default so the notebook does not submit tasks accidentally.
RUN_DRIVE_EXPORT = False
SLEEP_BETWEEN_TASKS = 2
if EE_READY and RUN_DRIVE_EXPORT:
region_list = region_fc.toList(region_fc.size())
for region_index in range(region_fc.size().getInfo()):
region_feature = ee.Feature(region_list.get(region_index))
region_name = region_feature.get("name").getInfo()
region_weekly = ee.FeatureCollection(
week_starts.map(lambda d: one_week_s1_summary(d, region_feature))
)
task = ee.batch.Export.table.toDrive(
collection=region_weekly,
description=f"{region_name}_weekly_s1sar_sample",
fileNamePrefix=f"{region_name}_weekly_s1sar_sample",
fileFormat="CSV"
)
task.start()
print("Started task:", region_name)
time.sleep(SLEEP_BETWEEN_TASKS)
else:
print("Drive export not started")
Drive export not started
Sentinel-2 cloud mask
Sentinel-2 is optical imagery. Cloud handling is required.
The QA60 band marks cloud and cirrus bits. The mask keeps pixels where both bits are zero.
def mask_s2(img):
qa = img.select("QA60")
cloud = 1 << 10
cirrus = 1 << 11
mask = qa.bitwiseAnd(cloud).eq(0).And(
qa.bitwiseAnd(cirrus).eq(0)
)
return img.updateMask(mask)
Sentinel-2 NDBI
NDBI is the Normalized Difference Built-up Index.
It compares SWIR and NIR reflectance:
NDBI = (SWIR - NIR) / (SWIR + NIR)
def add_ndbi(img):
ndbi = img.normalizedDifference(["B11", "B8"]).rename("NDBI")
return ndbi.copyProperties(img, ["system:time_start"])
Sentinel-2 collection and small NDBI summary
This example keeps the same short date range.
For larger extraction, use table export tasks.
CLOUD_THRESHOLD = 20
if EE_READY:
s2_ndbi = (
ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
.filterDate(START_DATE, END_DATE)
.filterBounds(region_fc.geometry())
.filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", CLOUD_THRESHOLD))
.map(mask_s2)
.map(add_ndbi)
)
print("Sentinel-2 image count:", s2_ndbi.size().getInfo())
else:
s2_ndbi = None
Sentinel-2 image count: 5
def one_week_s2_ndbi_summary(date, region_feature):
d_start = ee.Date(date)
d_end = d_start.advance(1, "week")
region_name = ee.String(region_feature.get("name"))
region_geom = region_feature.geometry()
weekly = s2_ndbi.filterDate(d_start, d_end)
count = weekly.size()
empty = ee.Image.constant(0).rename("NDBI").updateMask(ee.Image(0))
composite = ee.Image(
ee.Algorithms.If(
count.gt(0),
weekly.median().select("NDBI"),
empty
)
)
stats = composite.reduceRegion(
reducer=ee.Reducer.mean().combine(
reducer2=ee.Reducer.median(),
sharedInputs=True
),
geometry=region_geom,
scale=20,
bestEffort=True
)
stats = ee.Dictionary(stats)
return ee.Feature(None, {
"region": region_name,
"week_start": d_start.format("YYYY-MM-dd"),
"ndbi_mean": ee.Algorithms.If(stats.contains("NDBI_mean"), stats.get("NDBI_mean"), None),
"ndbi_median": ee.Algorithms.If(stats.contains("NDBI_median"), stats.get("NDBI_median"), None),
"n_obs": count
})
if EE_READY:
region_list = region_fc.toList(region_fc.size())
output_collections = []
for region_index in range(region_fc.size().getInfo()):
region_feature = ee.Feature(region_list.get(region_index))
region_weekly = ee.FeatureCollection(
week_starts.map(lambda d: one_week_s2_ndbi_summary(d, region_feature))
)
output_collections.append(region_weekly)
s2_summary_fc = ee.FeatureCollection(output_collections).flatten()
s2_rows = [feature["properties"] for feature in s2_summary_fc.getInfo()["features"]]
df_s2 = pd.DataFrame(s2_rows)
df_s2["week_start"] = pd.to_datetime(df_s2["week_start"])
df_s2 = df_s2.sort_values(["region", "week_start"]).reset_index(drop=True)
s2_path = OUTPUT_DIR / "sample_weekly_sentinel2_ndbi.csv"
df_s2.to_csv(s2_path, index=False)
print(s2_path)
display(df_s2.head())
else:
df_s2 = pd.DataFrame()
print("Skipped because Earth Engine is not initialized")
outputs/earth_engine_satellite_features_reference/sample_weekly_sentinel2_ndbi.csv
| n_obs | ndbi_mean | ndbi_median | region | week_start | |
|---|---|---|---|---|---|
| 0 | 1 | 0.044363 | 0.035317 | business_bay | 2024-01-01 |
| 1 | 2 | 0.019250 | 0.035276 | business_bay | 2024-01-08 |
| 2 | 1 | 0.016320 | 0.029295 | business_bay | 2024-01-15 |
| 3 | 1 | 0.028978 | 0.035348 | business_bay | 2024-01-22 |
| 4 | 0 | NaN | NaN | business_bay | 2024-01-29 |
Build current and lagged features in pandas
The exported CSVs can be aligned and lagged in pandas.
This section works with the Sentinel-1 sample output if it was created.
base_features = [
"vv_mean_db",
"vh_mean_db",
"vv_vh_ratio_mean_db"
]
lags = [1, 2, 4]
def add_lags(df_base, lags_list):
lag_frames = []
for lag in lags_list:
shifted = df_base.shift(lag)
shifted.columns = [f"{col}_lag{lag}" for col in shifted.columns]
lag_frames.append(shifted)
if not lag_frames:
return df_base.copy()
return pd.concat([df_base] + lag_frames, axis=1)
if not df_s1.empty:
region_name = df_s1["region"].iloc[0]
df_region = df_s1[df_s1["region"] == region_name].copy()
df_base = (
df_region
.set_index("week_start")[base_features]
.astype(float)
.sort_index()
.resample("W-SUN")
.mean()
.rolling(2, min_periods=1)
.mean()
)
df_lagged = add_lags(df_base, lags)
lagged_path = OUTPUT_DIR / f"{region_name}_sar_lagged_features.csv"
df_lagged.to_csv(lagged_path)
print(lagged_path)
display(df_lagged.head())
else:
df_lagged = pd.DataFrame()
print("No Sentinel-1 output available")
outputs/earth_engine_satellite_features_reference/business_bay_sar_lagged_features.csv
| vv_mean_db | vh_mean_db | vv_vh_ratio_mean_db | vv_mean_db_lag1 | vh_mean_db_lag1 | vv_vh_ratio_mean_db_lag1 | vv_mean_db_lag2 | vh_mean_db_lag2 | vv_vh_ratio_mean_db_lag2 | vv_mean_db_lag4 | vh_mean_db_lag4 | vv_vh_ratio_mean_db_lag4 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| week_start | ||||||||||||
| 2024-01-07 | -2.775690 | -9.883389 | 7.107699 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2024-01-14 | -2.800844 | -10.365176 | 7.564332 | -2.775690 | -9.883389 | 7.107699 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2024-01-21 | -3.079086 | -10.312887 | 7.233801 | -2.800844 | -10.365176 | 7.564332 | -2.775690 | -9.883389 | 7.107699 | NaN | NaN | NaN |
| 2024-01-28 | -3.021329 | -10.161113 | 7.139784 | -3.079086 | -10.312887 | 7.233801 | -2.800844 | -10.365176 | 7.564332 | NaN | NaN | NaN |
| 2024-02-04 | -2.822705 | -10.349005 | 7.526299 | -3.021329 | -10.161113 | 7.139784 | -3.079086 | -10.312887 | 7.233801 | -2.77569 | -9.883389 | 7.107699 |
Feature count checklist
This shows how the feature count grows when current and lagged features are used.
feature_count = len(base_features) * (1 + len(lags))
feature_count_table = pd.DataFrame({
"scope": ["one_region"],
"base_features": [len(base_features)],
"lags": [len(lags)],
"current_plus_lag_versions": [1 + len(lags)],
"feature_count": [feature_count]
})
feature_count_path = OUTPUT_DIR / "feature_count_checklist.csv"
feature_count_table.to_csv(feature_count_path, index=False)
display(feature_count_table)
| scope | base_features | lags | current_plus_lag_versions | feature_count | |
|---|---|---|---|---|---|
| 0 | one_region | 3 | 3 | 4 | 12 |