
Complete API reference

This page lists the modules documented through docstrings. To learn the package or choose the main functions, start with Recommended API.

The finer-grained functions are mainly intended for production notebooks, table-by-table checks and debugging.

Recommended pipeline

xyt_gps.pipeline

Recommended end-to-end entry point for the mobility pipeline.

MobilityPipelineResult dataclass

Named result returned by run_mobility_pipeline.

Attributes:

Name Type Description
raw RawGpsData

Raw GPS tables after loading or validation.

dataset MobilityDataset

Structured mobility tables.

indicators IndicatorResult | None

Optional mobility indicators. None when compute_indicators=False.

__iter__()

Allows tuple unpacking: raw, dataset, indicators = run_mobility_pipeline(...).
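Tuple unpacking works because `__iter__` yields the three fields in a fixed order. The following is a minimal sketch of that pattern. `PipelineResultSketch` is a hypothetical stand-in, not the package source, and plain strings take the place of RawGpsData and MobilityDataset:

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional


@dataclass
class PipelineResultSketch:
    """Hypothetical stand-in for MobilityPipelineResult."""

    raw: Any
    dataset: Any
    indicators: Optional[Any] = None  # None when compute_indicators=False

    def __iter__(self) -> Iterator[Any]:
        # Yield the fields in a fixed order so tuple unpacking works.
        yield self.raw
        yield self.dataset
        yield self.indicators


result = PipelineResultSketch(raw="raw tables", dataset="mobility tables")
raw, dataset, indicators = result  # unpacking goes through __iter__
print(result.dataset, indicators)  # named attribute access still works
```

Keeping a named dataclass while also supporting unpacking lets notebooks pick whichever style reads better.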

run_mobility_pipeline(config, *, raw=None, sample=None, sociodemo=None, weights=None, weight_col='weight', default_weight=1.0, validate=True, must_exist=True, resample_missing_days=False, clean_leg_geometries=True, add_length_outlier_flags=True, add_signal_quality_flags=True, compute_indicators=True, mode_col='mode_niv1', trips_mode_col=None, distance_col=None, include_zero_days=True, include_excursions=True, include_airplane=False, use_weights=True, default_phase_name='All')

Run the recommended single-source GPS-to-indicators workflow.

This is the simplest entry point for analysts discovering the package: load raw GPS tables, prepare mobility tables and optionally compute the generic indicators. Pass raw when tables have already been loaded in a notebook; otherwise the function calls load_gps_export(config).

Parameters:

Name Type Description Default
config ProjectConfig

Project configuration.

required
raw RawGpsData | None

Optional preloaded raw GPS tables. When omitted, files are loaded from config with load_gps_export.

None
sample RawSampleConfig | None

Optional sampling strategy used only when raw is omitted.

None
sociodemo DataFrame | None

Optional user-level sociodemographic table.

None
weights DataFrame | None

Optional user-level weighting table.

None
weight_col str

Name of the user weight column.

'weight'
default_weight float

Default user weight when no weight is available.

1.0
validate bool

Whether to validate raw tables during loading.

True
must_exist bool

Whether expected files must exist when loading from disk.

True
resample_missing_days bool

Add explicit Resampled_stay rows for missing tracked days.

False
clean_leg_geometries bool

Normalize continuous leg geometries.

True
add_length_outlier_flags bool

Add per-mode length outlier flags.

True
add_signal_quality_flags bool

Compute GPS signal-loss quality columns.

True
compute_indicators bool

Whether to compute IndicatorResult.

True
mode_col str

Leg mode column used for indicators.

'mode_niv1'
trips_mode_col str | None

Optional trip mode column override.

None
distance_col str | None

Optional leg distance column override.

None
include_zero_days bool

Include tracked days without movement in daily means.

True
include_excursions bool

Include legs/trips flagged as excursions.

True
include_airplane bool

Include airplane legs/trips in indicators. Defaults to false because airplane rows can dominate distances and CO2.

False
use_weights bool

Use user weights in population indicators.

True
default_phase_name str

Period label when no phase split exists.

'All'

Returns:

Type Description
MobilityPipelineResult

MobilityPipelineResult containing raw tables, mobility tables and optional indicators.

Configuration

xyt_gps.config

Configuration objects for project-specific GPS transformations.

Phase dataclass

Named experimental phase used to tag and filter mobility records.

Parameters:

Name Type Description Default
name str

Stable phase name used in output columns and reports.

required
start str | Timestamp

Inclusive phase start date.

required
end str | Timestamp

Inclusive phase end date.

required

TimeSlice dataclass

Named daily time interval used for reusable temporal aggregation.

Parameters:

Name Type Description Default
name str

Stable output label, for example HPM, HC or HPS.

required
start str

Inclusive start time formatted as HH:MM.

required
end str

Exclusive end time formatted as HH:MM.

required

TrackingThresholds dataclass

Minimum tracking duration expected for analysis.

Parameters:

Name Type Description Default
min_days_by_phase Mapping[str, int]

Minimum number of tracked days required by phase name. Missing phases default to one day in the filtering helpers.

dict()
min_total_tracked_days int

Minimum active tracking days across the full observation period.

7
round_to_full_weeks bool

If true, phase durations are rounded down to complete weeks after applying the minimum-day threshold.

True
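The two threshold rules can be sketched as a per-user check. This is an illustrative mirror, not the package's filtering helper. `ThresholdsSketch`, `meets_tracking_thresholds` and the phase names `T0`/`T1` are hypothetical, and round_to_full_weeks is not modeled:

```python
from dataclasses import dataclass, field
from typing import Mapping


@dataclass(frozen=True)
class ThresholdsSketch:
    """Illustrative mirror of TrackingThresholds, not the package class."""

    min_days_by_phase: Mapping[str, int] = field(default_factory=dict)
    min_total_tracked_days: int = 7


def meets_tracking_thresholds(days_by_phase: Mapping[str, int], t: ThresholdsSketch) -> bool:
    """Return True when a user's tracked days satisfy every threshold."""
    phases = set(days_by_phase) | set(t.min_days_by_phase)
    for phase in phases:
        # Phases without an explicit threshold default to one day.
        required = t.min_days_by_phase.get(phase, 1)
        if days_by_phase.get(phase, 0) < required:
            return False
    return sum(days_by_phase.values()) >= t.min_total_tracked_days


t = ThresholdsSketch(min_days_by_phase={"T0": 5, "T1": 5})
print(meets_tracking_thresholds({"T0": 6, "T1": 5}, t))
print(meets_tracking_thresholds({"T0": 6, "T1": 3}, t))
```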

SignalLossThreshold dataclass

Mode-specific signal-loss thresholds.

max_gap_m is the absolute largest distance allowed between two consecutive points of a leg geometry, in meters. max_relative_gap is the same gap divided by total leg length.
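Both quantities can be computed from a leg's vertices in a metric CRS, such as the default operations_crs EPSG:2056. The sketch below uses plain coordinate tuples and is not the package implementation. It flags a leg only when both thresholds are exceeded. That AND rule is an assumption, inferred from the note under default_signal_loss_thresholds that very low absolute thresholds make the relative threshold the effective criterion:

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in meters, metric CRS assumed


def signal_loss_metrics(coords: Sequence[Point]) -> Tuple[float, float]:
    """Return (max_gap_m, max_relative_gap) for one leg geometry."""
    gaps = [math.dist(a, b) for a, b in zip(coords, coords[1:])]
    total = sum(gaps)  # total leg length
    max_gap = max(gaps, default=0.0)
    relative = max_gap / total if total > 0 else 0.0
    return max_gap, relative


def is_low_quality(coords: Sequence[Point], max_gap_m: float, max_relative_gap: float) -> bool:
    # Assumption: a leg is flagged only when BOTH thresholds are exceeded.
    gap, rel = signal_loss_metrics(coords)
    return gap > max_gap_m and rel > max_relative_gap


leg = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0), (1000.0, 0.0)]
print(signal_loss_metrics(leg))
print(is_low_quality(leg, max_gap_m=1.0, max_relative_gap=0.5))
```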

SpatialQualityThresholds dataclass

Spatial quality thresholds that may vary by project.

Parameters:

Name Type Description Default
max_consecutive_point_distance_m float | None

Reserved threshold for future point-jump filters.

None
max_relative_signal_loss float | None

Reserved global threshold for future simple signal-loss filters.

None
outlier_quantiles_by_mode tuple[float, ...]

Quantiles used to flag unusually long legs within each mode.

(0.98, 0.99)
signal_loss_thresholds_by_level Mapping[int, Mapping[str, SignalLossThreshold]]

Mode-specific thresholds used to create low_quality_legs_* flags.

default_signal_loss_thresholds()
bad_signal_user_quantile float

Quantile used to identify users with very high average signal loss.

0.995
signal_loss_mode_column str

Column used to match mode-specific signal thresholds.

'mode'

MatchingThresholds dataclass

Temporal matching and future map-matching parameters.

Parameters:

Name Type Description Default
leg_trip_journey_tolerance str

Maximum temporal tolerance used when matching legs, trips and journeys.

'5s'
osrm_max_points_per_chunk int

Reserved chunk size for future OSRM requests.

99
google_directions_fallback bool

Reserved flag for a future Google Directions fallback.

False

ProjectConfig dataclass

Project parameters that should remain explicit in the workflow.

ProjectConfig centralizes import paths, project names, periods, coordinate systems, phases, thresholds and mappings. The goal is to avoid hiding methodological choices inside large transformation functions.

Parameters:

Name Type Description Default
experiment_name str | None

Optional analytical experiment name, for example declic-prefig or declic-ziplo. Leave empty for generic GPS processing without experiment grouping.

None
motiontag_project_name str | None

Optional provider project name used in export file names. Required only when loading files by inferred provider names.

None
period str | None

Optional export period string used in provider file names. Required only when loading files by inferred provider names.

None
raw_data_dir str | Path

Directory containing raw GPS CSV exports.

'.'
export_dir str | Path | None

Optional output directory.

None
target_crs str

CRS attached to parsed GPS geometries.

'EPSG:4326'
operations_crs str

Metric CRS used for distance calculations.

'EPSG:2056'
csv_sep str

CSV separator used by GPS exports.

';'
timezone str

Local timezone label.

'Europe/Zurich'
phases tuple[Phase, ...]

Optional analytical phases used for phase assignment and tracking filters. Leave empty for analyses without phase split.

()
tracking_thresholds TrackingThresholds

User-level tracking-quality thresholds.

TrackingThresholds()
spatial_quality_thresholds SpatialQualityThresholds

Leg and user GPS-quality thresholds.

SpatialQualityThresholds()
matching_thresholds MatchingThresholds

Temporal matching thresholds.

MatchingThresholds()
mappings MobilityMappings

Mode and purpose mappings.

MobilityMappings()
time_slices tuple[TimeSlice, ...]

Daily time slices used for reusable temporal aggregations. Values outside the configured intervals are labelled HC by default.

default_time_slices()
reference_year int | None

Optional year used to derive age from sociodemographic data.

None
storyline_prefix_candidates tuple[str, ...]

Accepted storyline file prefixes.

('StorylineWithTripId', 'StorylineWithUserAnnotations')

default_time_slices()

Return default mobility-dashboard time slices.

HC is intentionally not defined as an interval: it is the fallback label for observations outside the configured peak periods.
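The labelling rule (inclusive start, exclusive end, HC fallback) can be sketched as follows. The HPM and HPS intervals here are hypothetical values chosen for illustration and are not the package defaults. `Slice` and `label_time` are illustrative names, and intervals crossing midnight are not handled:

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Sequence


@dataclass(frozen=True)
class Slice:
    """Illustrative mirror of TimeSlice: start inclusive, end exclusive."""

    name: str
    start: str  # "HH:MM"
    end: str    # "HH:MM"


def label_time(ts: datetime, slices: Sequence[Slice], fallback: str = "HC") -> str:
    t = ts.time()
    for s in slices:
        if time.fromisoformat(s.start) <= t < time.fromisoformat(s.end):
            return s.name
    return fallback  # outside every configured peak period


# Hypothetical peak intervals, NOT the package defaults.
slices = (Slice("HPM", "07:00", "09:00"), Slice("HPS", "16:30", "18:30"))
print(label_time(datetime(2024, 3, 5, 8, 15), slices))
print(label_time(datetime(2024, 3, 5, 9, 0), slices))  # end is exclusive
```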

default_signal_loss_thresholds()

Return the default signal-loss thresholds.

These values reproduce the convention of the current spatial-quality notebook. Some absolute thresholds are intentionally very low in that notebook, which makes the relative signal-loss threshold the effective criterion. They should remain configurable per project.

Import

xyt_gps.io

Input helpers for structured GPS exports and project side tables.

GpsExportPaths dataclass

Resolved file paths for one structured GPS export period.

Attributes:

Name Type Description
storyline Path

Path to the storyline CSV.

trips Path

Path to the trips CSV.

journeys Path

Path to the journeys CSV.

user_statistics Path | None

Optional path to the user statistics CSV.

RawSampleConfig dataclass

Sampling options for loading large raw GPS exports.

Use RawSampleConfig.by_users(n) when checking a large export: it keeps all rows for a small number of selected users and preserves the relation between storyline, trips, journeys and user statistics. Use RawSampleConfig.random_rows(n) only for quick schema inspection.

Attributes:

Name Type Description
mode str

Sampling strategy, either users or rows.

n int

Number of users or rows to keep.

random_state int | None

Optional random seed.

chunksize int

CSV chunk size for user-based sampling.

user_id_column str

User identifier column in storyline, trips and journeys.

user_statistics_id_column str

User identifier column in user statistics.

storyline_type_column str

Column used to identify track rows.

track_type_value str

Value used for track rows in storyline.

by_users(n, *, random_state=42, chunksize=100000, user_id_column='user_id') classmethod

Keep all raw rows for n randomly selected users.

random_rows(n, *, random_state=42, chunksize=100000, user_id_column='user_id') classmethod

Keep n random rows per table; for the storyline, Track rows are preferred.
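User-based sampling preserves table relations because the same user set filters every table. The in-memory sketch below illustrates this with lists of dicts. The package reads CSV files by chunks instead, and user statistics may use a different id column. `sample_by_users` is a hypothetical helper, and drawing candidates from the storyline is an assumption:

```python
import random
from typing import Dict, List


def sample_by_users(tables: Dict[str, List[dict]], n: int, random_state: int = 42,
                    user_id_column: str = "user_id") -> Dict[str, List[dict]]:
    """Keep every row of n randomly selected users in all tables."""
    # Draw candidates from the storyline, sorted so the seed is reproducible.
    users = sorted({row[user_id_column] for row in tables["storyline"]})
    rng = random.Random(random_state)
    keep = set(rng.sample(users, min(n, len(users))))
    # Filtering every table on the same user set keeps storyline, trips and
    # journeys consistent with each other.
    return {name: [row for row in rows if row[user_id_column] in keep]
            for name, rows in tables.items()}


tables = {
    "storyline": [{"user_id": u, "type": "Track"} for u in ["a", "b", "c", "d", "e"]],
    "trips": [{"user_id": u} for u in ["a", "b", "c", "d", "e"]],
}
sampled = sample_by_users(tables, 2)
```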

infer_gps_export_paths(config, *, must_exist=True)

Infer expected raw file paths from project parameters.

Parameters:

Name Type Description Default
config ProjectConfig

Project configuration containing raw_data_dir, motiontag_project_name, period and accepted storyline prefixes.

required
must_exist bool

When true, raise an error if any expected file is absent.

True

Returns:

Type Description
GpsExportPaths

A GpsExportPaths object with resolved CSV paths.

Raises:

Type Description
FileNotFoundError

If must_exist is true and at least one expected CSV file is missing.

source_id_for_config(config)

Build a stable source id for multi-project imports.

The source id is used to namespace identifiers when several projects or periods are loaded in one pass.
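Namespacing can be sketched as prefixing each identifier with the source id while keeping the original in a raw_* column, which matches the xyt_source_id and raw_* columns documented for load_gps_source. The `source:id` separator and the source ids below are hypothetical, and `namespace_ids` is an illustrative helper:

```python
from typing import Iterable, List


def namespace_ids(rows: Iterable[dict], source_id: str, id_columns: Iterable[str]) -> List[dict]:
    """Prefix identifier columns with a source id, keeping the originals."""
    cols = list(id_columns)
    out = []
    for row in rows:
        new = dict(row, xyt_source_id=source_id)
        for col in cols:
            if new.get(col) is not None:
                new[f"raw_{col}"] = new[col]          # original identifier
                new[col] = f"{source_id}:{new[col]}"  # hypothetical separator
        out.append(new)
    return out


a = namespace_ids([{"user_id": "u1"}], "prefig_2023", ["user_id"])
b = namespace_ids([{"user_id": "u1"}], "ziplo_2024", ["user_id"])
print(a[0]["user_id"], b[0]["user_id"])  # same raw id, distinct namespaced ids
```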

sample_raw_gps_data(raw, sample)

Apply a raw-data sample before validation or transformation.

Parameters:

Name Type Description Default
raw RawGpsData

Raw GPS tables already loaded in memory.

required
sample RawSampleConfig | None

Sampling configuration. If None, the raw object is returned unchanged.

required

Returns:

Type Description
RawGpsData

A new RawGpsData object containing sampled tables.

Raises:

Type Description
KeyError

If user-based sampling is requested and the user id column is missing.

ValueError

If the sampling mode is unsupported.

load_gps_export(config, *, sample=None, validate=True, must_exist=True)

Load raw GPS CSV exports without transforming their schema.

Parameters:

Name Type Description Default
config ProjectConfig

Project configuration used to infer file paths and CSV separator.

required
sample RawSampleConfig | None

Optional sampling strategy. RawSampleConfig.by_users reads CSV files by chunks and keeps all records for selected users.

None
validate bool

When true, attach schema-validation reports to the returned object.

True
must_exist bool

When true, fail if expected CSV files are missing.

True

Returns:

Type Description
RawGpsData

Raw storyline, trips, journeys and optional user-statistics tables.

Raises:

Type Description
FileNotFoundError

If required files are missing.

KeyError

If user-based sampling references a missing user column.

load_gps_source(config, *, source_id=None, sample=None, validate=True, must_exist=True, namespace_ids=True)

Load one GPS source and add source metadata columns.

Parameters:

Name Type Description Default
config ProjectConfig

Project configuration for one project or period.

required
source_id str | None

Optional source identifier. If absent, it is built from the configuration.

None
sample RawSampleConfig | None

Optional raw sampling strategy.

None
validate bool

Whether to attach validation reports.

True
must_exist bool

Whether expected files must exist.

True
namespace_ids bool

When true, prefix ids with source_id to avoid collisions in multi-source imports.

True

Returns:

Type Description
RawGpsData

Raw GPS tables with xyt_source_id, project metadata and optional raw_* identifier columns.

concat_raw_gps_data(raws, *, validate=True)

Concatenate several already-loaded raw GPS datasets.

Parameters:

Name Type Description Default
raws Iterable[RawGpsData]

Raw datasets to concatenate. They should already contain source metadata if they come from different projects or periods.

required
validate bool

Whether to validate the concatenated raw tables.

True

Returns:

Type Description
RawGpsData

One RawGpsData object containing all rows.

Raises:

Type Description
ValueError

If raws is empty.

load_gps_sources(configs, *, sample=None, validate=True, must_exist=True, namespace_ids=True)

Load and concatenate raw GPS exports from several sources.

Parameters:

Name Type Description Default
configs Iterable[ProjectConfig]

Project configurations, one per project or period.

required
sample RawSampleConfig | None

Optional sampling strategy applied to each source.

None
validate bool

Whether to attach validation reports.

True
must_exist bool

Whether expected files must exist.

True
namespace_ids bool

Whether to prefix identifiers by source id.

True

Returns:

Type Description
RawGpsData

A concatenated RawGpsData object ready for transformation.

load_sociodemo(path, *, user_id_column='Id')

Load a sociodemographic side table and standardize the user id column.

Parameters:

Name Type Description Default
path str | Path

CSV or Excel file path.

required
user_id_column str

Column to rename to user_id.

'Id'

Returns:

Type Description
DataFrame

A pandas table that can be passed to prepare_mobility_dataset.

Test-set data

xyt_gps.sample_data

Demo sample-data helpers.

The default sample is a personal GPS storyline explicitly authorized for this package demo. It is loaded only when requested.

find_sample_gps_path(start=None)

Find the authorized demo pickle in the current repository layout.

load_sample_gps(path=None, *, user_id='sample_user', max_rows=None, validate=True)

Load the authorized personal demo storyline as a structured GPS raw dataset.

The source pickle contains a storyline table only. The function pseudonymizes user_id by default and derives minimal Trips, Journeys and UserStatistics tables so that tutorial transformations can run without a full raw GPS export.

Synthetic data

xyt_gps.synthetic

Synthetic GPS testset generation from the authorized sample.

The generator is intentionally explicit and parameter-driven. With one authorized source user, the package cannot learn a defensible population model; it can, however, bootstrap realistic structured GPS days and inject controlled tracking anomalies for testing the transformation pipeline.

SyntheticExperiment dataclass

Experiment window used by the synthetic Declic generator.

Parameters:

Name Type Description Default
experiment_name str

Stable experiment identifier added to all generated tables, for example declic-mobility-vague1.

required
phases tuple[Phase, ...]

Optional experiment phases. Empty phases are allowed for a generic GPS testset without analytical phase split.

()
motiontag_project_name str | None

Optional provider project name. Defaults to experiment_name.

None

SyntheticAnomalyRates dataclass

Controlled anomaly rates injected into synthetic raw tables.

Parameters:

Name Type Description Default
missing_geometry_rate float

Share of storyline rows with a missing geometry. Keep this below the raw schema tolerance when the generated dataset must pass prepare_mobility_dataset() without manual cleaning.

0.002
unconfirmed_rate float

Share of rows where confirmation timestamps are cleared.

0.2
mode_mismatch_rate float

Share of track rows where detected_mode differs from the final mode.

0.06
extreme_length_rate float

Share of track rows with inflated length values.

0.01

SyntheticGpsDataset dataclass

Generated structured GPS testset and companion construction tables.

Attributes:

Name Type Description
raw RawGpsData

Raw GPS-like tables ready for validation or transformation.

user_presence DataFrame

User-level construction table with experiment and phase windows.

generation_manifest DataFrame

Summary of generated rows, users and active days.

tables()

Return all generated tables keyed by export name.

default_declic_synthetic_experiments()

Return the five Declic experiment windows used for synthetic tests.

generate_synthetic_declic_gps(*, sample_path=None, experiments=None, users_per_experiment=50, random_state=42, anomaly_rates=None, validate=True)

Generate a Declic-like synthetic GPS dataset from the sample.

The method uses bootstrap resampling of observed sample days, shifts them to the requested experiment phases, perturbs geometries and injects controlled anomalies. It is suited for tests, tutorials and pipeline validation. It is not a trained behavioral model.

Parameters:

Name Type Description Default
sample_path str | Path | None

Optional path to the authorized sample pickle. If omitted, the package searches the repository layout.

None
experiments Iterable[SyntheticExperiment] | None

Experiment definitions. Defaults to prefiguration, waves 1-3 and ZIPLO.

None
users_per_experiment int

Number of synthetic users generated for each experiment.

50
random_state int | None

Seed used for reproducible generation.

42
anomaly_rates SyntheticAnomalyRates | None

Rates for missing geometry, unconfirmed rows, mode mismatch and extreme lengths.

None
validate bool

If true, attach raw schema validation reports to the output.

True

Returns:

Type Description
SyntheticGpsDataset

A SyntheticGpsDataset containing raw GPS-like tables, user_presence and a generation manifest.
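The core bootstrap step fills each target date with an observed day drawn with replacement, shifted by whole days so that times of day are preserved. This sketch covers only that step. Geometry perturbation and anomaly injection are omitted, and `bootstrap_days` is a hypothetical helper, not the package generator:

```python
import random
from datetime import date, datetime, timedelta
from typing import Dict, List


def bootstrap_days(observed: Dict[date, List[dict]], phase_start: date, phase_end: date,
                   rng: random.Random) -> List[dict]:
    """Fill each date of an inclusive window with a resampled observed day."""
    source_days = sorted(observed)
    rows: List[dict] = []
    current = phase_start
    while current <= phase_end:
        picked = rng.choice(source_days)  # sampling with replacement
        shift = current - picked          # whole-day offset onto the target date
        for row in observed[picked]:
            rows.append(dict(row, started_at=row["started_at"] + shift))
        current += timedelta(days=1)
    return rows


observed = {
    date(2023, 5, 2): [{"type": "Track", "started_at": datetime(2023, 5, 2, 8, 0)}],
    date(2023, 5, 3): [{"type": "Stay", "started_at": datetime(2023, 5, 3, 12, 30)}],
}
synthetic = bootstrap_days(observed, date(2024, 1, 8), date(2024, 1, 10), random.Random(42))
```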

write_synthetic_gps_dataset(dataset, output_dir, *, formats=('parquet',), overwrite=True)

Write a generated synthetic GPS dataset as landing-ready files.

Parameters:

Name Type Description Default
dataset SyntheticGpsDataset

Synthetic dataset produced by generate_synthetic_declic_gps().

required
output_dir str | Path

Destination directory.

required
formats Iterable[str]

Formats for large event tables. Supported values are csv and parquet. Defaults to Parquet to avoid very large CSV files. Construction tables are always written as CSV.

('parquet',)
overwrite bool

If false, fail when a target file already exists.

True

Returns:

Type Description
DataFrame

A manifest with one row per written file.

Schemas

xyt_gps.schema

Schema validation for raw GPS mobility exports.

SchemaSpec dataclass

Column contract for one raw GPS input table.

expected_gps_schema()

Return the expected GPS import schema as an inspectable table.

The current schema describes the structured GPS multi-table contract used by the package: storyline, trips, journeys and user statistics. It is intentionally generic at the public API level because the landing step may adapt source-specific files and column names before data loading.

check_raw_import_columns(storyline, user_statistics=None, *, trips=None, journeys=None, include_recommended=True, raise_on_error=False)

Check the raw column structure before building RawGpsData.

This is a lightweight, notebook-friendly check. It focuses on the column names expected at the raw import stage. It does not parse dates, geometries or modes.

Parameters:

Name Type Description Default
storyline DataFrame

Raw storyline table to check.

required
user_statistics DataFrame | None

Optional raw user statistics table to check.

None
trips DataFrame | None

Optional raw trips table to check.

None
journeys DataFrame | None

Optional raw journeys table to check.

None
include_recommended bool

Include non-blocking recommended columns in the report.

True
raise_on_error bool

Raise a ValueError when a required column is missing.

False

Returns:

Type Description
DataFrame

A DataFrame with one row per expected column. Important columns are: table, expected_column, present, status and message.
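The report shape can be sketched with plain dicts: one row per expected column, carrying the documented columns table, expected_column, present, status and message. The status values ok, error and warning, and the column lists in the example, are hypothetical placeholders rather than the package's actual contract:

```python
from typing import Dict, Iterable, List


def check_columns(table: str, columns: Iterable[str], required: Iterable[str],
                  recommended: Iterable[str] = ()) -> List[Dict[str, object]]:
    """Return one report row per expected column, without touching the data."""
    present_cols = set(columns)
    expected = [(c, True) for c in required] + [(c, False) for c in recommended]
    report = []
    for col, is_required in expected:
        present = col in present_cols
        if present:
            status, message = "ok", ""
        elif is_required:
            status, message = "error", f"missing required column {col!r}"
        else:
            status, message = "warning", f"missing recommended column {col!r}"
        report.append({"table": table, "expected_column": col, "present": present,
                       "status": status, "message": message})
    return report


report = check_columns("storyline", ["user_id", "type", "started_at"],
                       required=["user_id", "type", "started_at", "geometry"],
                       recommended=["confirmed_at"])
```

A raise_on_error behaviour would then amount to raising ValueError when any row has the error status.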

validate_schema(df, spec, *, allow_extra_columns=True)

Validate column presence and basic null rates without changing data.

validate_gps_raw(raw)

Validate all raw GPS tables that are present.

Transformations

xyt_gps.transform

Orchestration from structured GPS exports to mobility tables.

prepare_mobility_dataset(raw, config, *, sociodemo=None, weights=None, weight_col='weight', default_weight=1.0, resample_missing_days=False, clean_leg_geometries=True, add_length_outlier_flags=True, add_signal_quality_flags=True, validation=None)

Run the transparent preparation workflow.

The function orchestrates validation, storyline parsing, trip and journey preparation, mappings, split into staypoints and legs, GPS quality flags, user tracking stats and relation tables. Each step is also exposed as a smaller public function so notebooks can inspect intermediate states.

Parameters:

Name Type Description Default
raw RawGpsData

Raw GPS tables loaded with load_gps_export or load_gps_sources.

required
config ProjectConfig

Project configuration controlling mappings, phases, thresholds and CRS.

required
sociodemo DataFrame | None

Optional user-level side table with a user_id column.

None
weights DataFrame | None

Optional user-level weighting table with user_id and weight_col.

None
weight_col str

Name of the weight column to merge/create.

'weight'
default_weight float

Weight used when no weighting table is provided, or when a user has no matching weight.

1.0
resample_missing_days bool

When true, insert transparent Resampled_stay rows for missing user-days.

False
clean_leg_geometries bool

When true, convert continuous MultiLineString legs to LineString and drop discontinuous geometries.

True
add_length_outlier_flags bool

When true, add per-mode length quantile flags to legs.

True
add_signal_quality_flags bool

When true, compute GPS signal-loss metrics on legs and user-level signal-quality flags.

True
validation dict[str, SchemaValidationResult] | None

Optional validation reports. If absent, reports attached to raw are reused or recomputed.

None

Returns:

Type Description
MobilityDataset

A MobilityDataset containing parsed storyline, staypoints, legs, trips, journeys, user stats and mapping tables.

Raises:

Type Description
ValueError

If raw schema validation contains blocking errors.

concat_mobility_datasets(datasets)

Concatenate transformed mobility datasets from several sources.

Parameters:

Name Type Description Default
datasets Iterable[MobilityDataset]

Already transformed datasets, usually one per project or period.

required

Returns:

Type Description
MobilityDataset

One MobilityDataset with concatenated tables and namespaced validation reports.

Raises:

Type Description
ValueError

If no dataset is provided.

prepare_mobility_datasets(configs, *, sociodemo_by_source=None, weights_by_source=None, weight_col='weight', default_weight=1.0, sample=None, resample_missing_days=False, clean_leg_geometries=True, add_length_outlier_flags=True, add_signal_quality_flags=True, validate=True, must_exist=True, namespace_ids=True)

Load, transform and concatenate several GPS sources or periods.

Parameters:

Name Type Description Default
configs Iterable[ProjectConfig]

Project configurations, one per project or period.

required
sociodemo_by_source DataFrame | Mapping[str, DataFrame] | None

Optional sociodemographic table or mapping from source id/project name to a sociodemographic table.

None
weights_by_source DataFrame | Mapping[str, DataFrame] | None

Optional weighting table or mapping from source id/project name to a user-level weighting table.

None
weight_col str

Name of the weight column to merge/create.

'weight'
default_weight float

Weight used when no weighting table is provided, or when a user has no matching weight.

1.0
sample RawSampleConfig | None

Optional raw sampling strategy applied to each source.

None
resample_missing_days bool

Whether to add transparent missing-day stays.

False
clean_leg_geometries bool

Whether to normalize/drop problematic leg geometries before length and quality calculations.

True
add_length_outlier_flags bool

Whether to add per-mode length outlier flags.

True
add_signal_quality_flags bool

Whether to compute GPS signal-loss flags.

True
validate bool

Whether to validate raw exports before transformation.

True
must_exist bool

Whether expected raw CSV files must exist.

True
namespace_ids bool

Whether to prefix ids by source id before concatenation.

True

Returns:

Type Description
MobilityDataset

A concatenated MobilityDataset.

Parsing

xyt_gps.parsing

Parsing helpers for structured GPS exports.

drop_nans_if_low_rate(df, column, *, threshold=0.01)

Drop nulls only when the null rate is explicitly below the threshold.
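The rule can be sketched on a list of row dicts. What happens when the null rate reaches the threshold is not stated here. The sketch returns the rows untouched, which is an assumption; the package helper may warn or raise instead:

```python
from typing import List


def drop_nans_if_low_rate(rows: List[dict], column: str, threshold: float = 0.01) -> List[dict]:
    """Drop rows with a null value in column only if nulls are rare enough."""
    if not rows:
        return rows
    null_rate = sum(row.get(column) is None for row in rows) / len(rows)
    if null_rate < threshold:  # strictly below the threshold
        return [row for row in rows if row.get(column) is not None]
    # Assumption: above the threshold the rows are returned untouched so the
    # problem stays visible; the package helper may warn or raise instead.
    return rows
```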

parse_ewkb(value)

Parse an EWKB hex string into a shapely geometry.
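EWKB extends standard WKB with flag bits in the geometry type code. The 0x20000000 flag means a 4-byte SRID follows the type. The sketch below decodes only the 2D point case with the standard struct module. It returns a plain (srid, x, y) tuple rather than a shapely geometry and is not the package implementation:

```python
import struct
from typing import Optional, Tuple

EWKB_Z, EWKB_M, EWKB_SRID = 0x80000000, 0x40000000, 0x20000000


def parse_ewkb_point(hex_str: str) -> Tuple[Optional[int], float, float]:
    """Decode a 2D EWKB point into (srid, x, y); srid is None when absent."""
    data = bytes.fromhex(hex_str)
    endian = "<" if data[0] == 1 else ">"  # first byte: 1 = little-endian
    (geom_type,) = struct.unpack_from(endian + "I", data, 1)
    offset = 5
    srid = None
    if geom_type & EWKB_SRID:
        (srid,) = struct.unpack_from(endian + "I", data, offset)
        offset += 4
    if (geom_type & (EWKB_Z | EWKB_M)) or (geom_type & 0xFFFF) != 1:
        raise ValueError("this sketch only handles 2D points")
    x, y = struct.unpack_from(endian + "2d", data, offset)
    return srid, x, y
```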

parse_date_columns(df, columns, *, utc=True)

Parse existing date columns and leave absent columns untouched.

assign_phase(value, phases, *, default='Other')

Assign a date or timestamp to the configured experimental phase.
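Because Phase bounds are inclusive, a timestamp late on the end date still belongs to the phase. A minimal sketch of the assignment follows. `PhaseSketch` and the phase names are hypothetical, the first matching phase wins, and timezone conversion to config.timezone is ignored:

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Sequence, Union


@dataclass(frozen=True)
class PhaseSketch:
    """Illustrative mirror of Phase with both bounds inclusive."""

    name: str
    start: date
    end: date


def assign_phase(value: Union[date, datetime], phases: Sequence[PhaseSketch],
                 default: str = "Other") -> str:
    # datetime is a subclass of date, so test for it first and keep the day only.
    day = value.date() if isinstance(value, datetime) else value
    for phase in phases:
        if phase.start <= day <= phase.end:
            return phase.name
    return default


phases = (PhaseSketch("T0", date(2024, 1, 8), date(2024, 1, 21)),
          PhaseSketch("T1", date(2024, 2, 5), date(2024, 2, 18)))
print(assign_phase(datetime(2024, 1, 21, 23, 59), phases))
```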

Table preparation

xyt_gps.prepare_tables

Prepare raw GPS tables before mobility object construction.

apply_storyline_mappings(storyline, mappings=None)

Add purpose and mode aggregation columns to storyline rows.

apply_trip_journey_mappings(trips, journeys, mappings=None)

Add purpose and mode aggregation columns to trips and journeys.

prepare_storyline(storyline, config, *, drop_nan_threshold=0.01)

Validate, parse geometry/dates, assign phases and map modes/purposes.

prepare_trips(trips, config)

Validate and parse raw trips.

prepare_journeys(journeys, config)

Validate and parse raw journeys.

Mobility tables

xyt_gps.mobility_tables

Small mobility-table transformations used by the preparation workflow.

split_storyline(storyline)

Split a parsed storyline into staypoints and legs.

add_user_id_day(legs)

Add the person-day id used by downstream indicators and notebooks.
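The split and the person-day id can be sketched on row dicts. Two choices here are assumptions: treating every non-Track row (including Resampled_stay) as a staypoint, and the `<user_id>_<YYYY-MM-DD>` id format. The package may use a different rule, format, or local-date convention:

```python
from datetime import datetime
from typing import List, Tuple


def split_storyline(rows: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Return (staypoints, legs) from parsed storyline rows."""
    # Assumption: legs are Track rows; every other type is a staypoint.
    legs = [r for r in rows if r["type"] == "Track"]
    staypoints = [r for r in rows if r["type"] != "Track"]
    return staypoints, legs


def add_user_id_day(legs: List[dict]) -> List[dict]:
    # Hypothetical id format built from the leg start date.
    return [dict(leg, user_id_day=f"{leg['user_id']}_{leg['started_at'].date().isoformat()}")
            for leg in legs]


storyline = [
    {"user_id": "u1", "type": "Stay", "started_at": datetime(2024, 3, 4, 7, 0)},
    {"user_id": "u1", "type": "Track", "started_at": datetime(2024, 3, 4, 8, 0)},
    {"user_id": "u1", "type": "Resampled_stay", "started_at": datetime(2024, 3, 5, 0, 0)},
]
staypoints, legs = split_storyline(storyline)
legs = add_user_id_day(legs)
```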

add_length_quantile_flags(legs, *, group_col='mode', length_col='length', quantiles=(0.98, 0.99))

Flag unusually long legs within each mode group.
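Per-mode flagging computes one cutoff per mode and quantile, then compares each leg to the cutoffs of its own mode. The sketch uses linear interpolation between order statistics, which is the pandas and NumPy default. The flag column names (for example `length_gt_q98`) and the strict `>` comparison are hypothetical:

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def linear_quantile(values: Sequence[float], q: float) -> float:
    # Linear interpolation between order statistics (pandas/NumPy default).
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def add_length_quantile_flags(legs: List[dict], group_col: str = "mode",
                              length_col: str = "length",
                              quantiles: Sequence[float] = (0.98, 0.99)) -> List[dict]:
    lengths: Dict[str, List[float]] = defaultdict(list)
    for leg in legs:
        lengths[leg[group_col]].append(leg[length_col])
    # One cutoff per mode and quantile, so long car legs do not mask long walks.
    cutoffs = {mode: {q: linear_quantile(v, q) for q in quantiles} for mode, v in lengths.items()}
    out = []
    for leg in legs:
        flags = {f"length_gt_q{round(q * 100)}": leg[length_col] > cutoffs[leg[group_col]][q]
                 for q in quantiles}
        out.append({**leg, **flags})
    return out


legs = [{"mode": "walk", "length": float(i)} for i in range(1, 101)]
legs.append({"mode": "car", "length": 5000.0})
flagged = add_length_quantile_flags(legs)
```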

Table relations

xyt_gps.relations

Relation tables linking legs, staypoints, trips and journeys.

build_track_trip_journey_map(legs, trips, journeys, *, tolerance='5s')

Map legs to trips and journeys.

If legs already contain a valid trip_id, the function uses it directly. Otherwise it matches each leg midpoint to trips from the same user, then matches each trip midpoint to journeys from the same user. Temporal joins are vectorized by user to avoid per-leg nested loops.
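The midpoint matching rule can be sketched with a plain loop. The package vectorizes the join by user instead, and the trip-to-journey step would apply the same logic to trip midpoints. The `leg_id` column and `match_legs_to_trips` are hypothetical:

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def match_legs_to_trips(legs: List[dict], trips: List[dict],
                        tolerance: timedelta = timedelta(seconds=5)) -> Dict[str, Optional[str]]:
    """Return {leg_id: trip_id or None} using leg midpoints."""
    by_user: Dict[str, List[dict]] = {}
    for trip in trips:
        by_user.setdefault(trip["user_id"], []).append(trip)
    result: Dict[str, Optional[str]] = {}
    for leg in legs:
        if leg.get("trip_id"):  # a valid trip_id is used directly
            result[leg["leg_id"]] = leg["trip_id"]
            continue
        mid = leg["started_at"] + (leg["finished_at"] - leg["started_at"]) / 2
        match = None
        for trip in by_user.get(leg["user_id"], []):  # same user only
            if trip["started_at"] - tolerance <= mid <= trip["finished_at"] + tolerance:
                match = trip["trip_id"]
                break
        result[leg["leg_id"]] = match
    return result


d = datetime(2024, 3, 4)
trips = [
    {"trip_id": "t1", "user_id": "u1", "started_at": d.replace(hour=8), "finished_at": d.replace(hour=9)},
    {"trip_id": "t2", "user_id": "u1", "started_at": d.replace(hour=12), "finished_at": d.replace(hour=12, minute=30)},
]
legs = [
    {"leg_id": "l1", "user_id": "u1", "started_at": d.replace(hour=8, minute=10), "finished_at": d.replace(hour=8, minute=40)},
    {"leg_id": "l2", "user_id": "u1", "started_at": d.replace(hour=12, minute=5), "finished_at": d.replace(hour=12, minute=20)},
    {"leg_id": "l3", "user_id": "u1", "trip_id": "t9", "started_at": d, "finished_at": d},
    {"leg_id": "l4", "user_id": "u2", "started_at": d.replace(hour=8, minute=10), "finished_at": d.replace(hour=8, minute=40)},
]
mapping = match_legs_to_trips(legs, trips)
```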

build_legs_staypoints_map(legs, staypoints, *, tolerance='5s')

Map each staypoint to previous and next legs using near-identical timestamps.
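The rule reads naturally as two timestamp comparisons: the previous leg ends when the stay starts, and the next leg starts when the stay ends, both within the tolerance. A loop-based sketch follows. `staypoint_id`, `leg_id` and `map_staypoints_to_legs` are hypothetical names:

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple


def map_staypoints_to_legs(staypoints: List[dict], legs: List[dict],
                           tolerance: timedelta = timedelta(seconds=5)
                           ) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {staypoint_id: (previous_leg_id, next_leg_id)}."""
    mapping = {}
    for sp in staypoints:
        same_user = [leg for leg in legs if leg["user_id"] == sp["user_id"]]
        # Previous leg ends when the stay starts; next leg starts when it ends.
        prev_leg = next((leg["leg_id"] for leg in same_user
                         if abs(leg["finished_at"] - sp["started_at"]) <= tolerance), None)
        next_leg = next((leg["leg_id"] for leg in same_user
                         if abs(leg["started_at"] - sp["finished_at"]) <= tolerance), None)
        mapping[sp["staypoint_id"]] = (prev_leg, next_leg)
    return mapping


legs = [
    {"leg_id": "l1", "user_id": "u1", "started_at": datetime(2024, 3, 4, 8, 0), "finished_at": datetime(2024, 3, 4, 8, 30)},
    {"leg_id": "l2", "user_id": "u1", "started_at": datetime(2024, 3, 4, 10, 0), "finished_at": datetime(2024, 3, 4, 10, 20)},
]
staypoints = [
    {"staypoint_id": "s1", "user_id": "u1", "started_at": datetime(2024, 3, 4, 8, 30, 3), "finished_at": datetime(2024, 3, 4, 9, 59, 58)},
    {"staypoint_id": "s2", "user_id": "u1", "started_at": datetime(2024, 3, 4, 10, 20), "finished_at": datetime(2024, 3, 4, 18, 0)},
]
links = map_staypoints_to_legs(staypoints, legs)
```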

add_journey_to_trips(trips, journeys, mapping)

Add journey_id and journey purpose to trips.

add_trip_destination_activity(trips, map_track_trip_journey, map_legs_staypoints)

Add to trips the id of the destination activity, i.e. the activity each trip leads to, using the last leg of each trip.

add_excursion_flags_to_trips_journeys(trips, journeys, legs, map_track_trip_journey, *, excursion_col='excursion')

Propagate leg-level excursion flags to trips and journeys.

User statistics

xyt_gps.user_stats

User-level statistics derived from prepared GPS tables.

add_excursion_stats_to_user_stats(user_stats, legs, *, excursion_col='excursion')

Merge user-level excursion counts from leg-level flags.

Parameters:

Name Type Description Default
user_stats DataFrame

User-level table containing user_id.

required
legs DataFrame

Leg table containing user_id and an excursion flag column.

required
excursion_col str

Name of the leg-level excursion flag.

'excursion'

Returns:

Type Description
DataFrame

A copy of user_stats with excursion_legs_count, total_legs_count and excursion_leg_ratio.
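The three output columns follow from two per-user counts. The sketch below works on row dicts and is not the package code. Returning None for the ratio of a user without legs is an assumption, and a pandas implementation would more likely produce NaN:

```python
from collections import Counter
from typing import List


def add_excursion_stats(user_stats: List[dict], legs: List[dict],
                        excursion_col: str = "excursion") -> List[dict]:
    total = Counter(leg["user_id"] for leg in legs)
    excursions = Counter(leg["user_id"] for leg in legs if leg.get(excursion_col))
    out = []
    for row in user_stats:
        uid = row["user_id"]
        n_total, n_exc = total.get(uid, 0), excursions.get(uid, 0)
        out.append(dict(row, excursion_legs_count=n_exc, total_legs_count=n_total,
                        # Assumption: the ratio is undefined (None) without legs.
                        excursion_leg_ratio=n_exc / n_total if n_total else None))
    return out


stats = add_excursion_stats(
    [{"user_id": "u1"}, {"user_id": "u2"}],
    [{"user_id": "u1", "excursion": True}, {"user_id": "u1", "excursion": False},
     {"user_id": "u1", "excursion": False}],
)
```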

build_user_stats(storyline, config, *, user_statistics=None, sociodemo=None, weights=None, weight_col='weight', default_weight=1.0)

Build user-level tracking stats and merge optional side tables.

The output keeps the GPS database readable at user level. When the storyline contains artificial Resampled_stay rows, the table reports both the continuous record (active_days_count) and the observed tracking days before resampling (observed_active_days_count). If no weights are provided, weight_col is set to default_weight.

Tracking quality

xyt_gps.quality

Public quality facade for the data-preparation workflow.

build_daily_tracking_presence(storyline, config=None, *, user_id_column='user_id', date_column='started_at', exclude_resampled=True)

Build one observed tracking row per user and date.

This table is intentionally based on observed rows before temporal resampling. Otherwise, artificial Resampled_stay rows would make missing days look like tracked days in participation analyses.

Parameters:

Name Type Description Default
storyline DataFrame

Parsed storyline table.

required
config ProjectConfig | None

Optional project configuration used to assign phases.

None
user_id_column str

User identifier column.

'user_id'
date_column str

Datetime column used to define active tracking days.

'started_at'
exclude_resampled bool

When true, rows where type == "Resampled_stay" are ignored.

True

Returns:

Type Description
DataFrame

A table with user_id, tracking_date, active_day and, when a config with phases is provided, Phase.

Raises:

Type Description
KeyError

If the requested user or date column is missing.
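The effect of exclude_resampled is shown by the sketch below: artificial rows must not create tracked days. It deduplicates (user, date) pairs and omits phase assignment. The boolean active_day value and `build_daily_presence` are assumptions for illustration:

```python
from datetime import datetime
from typing import List


def build_daily_presence(storyline: List[dict], exclude_resampled: bool = True) -> List[dict]:
    """One row per observed (user_id, tracking_date) pair."""
    seen = set()
    for row in storyline:
        if exclude_resampled and row.get("type") == "Resampled_stay":
            continue  # artificial rows must not count as tracked days
        seen.add((row["user_id"], row["started_at"].date()))
    return [{"user_id": u, "tracking_date": d, "active_day": True} for u, d in sorted(seen)]


storyline = [
    {"user_id": "u1", "type": "Stay", "started_at": datetime(2024, 3, 4, 7, 0)},
    {"user_id": "u1", "type": "Track", "started_at": datetime(2024, 3, 4, 8, 0)},
    {"user_id": "u1", "type": "Stay", "started_at": datetime(2024, 3, 4, 9, 0)},
    {"user_id": "u1", "type": "Resampled_stay", "started_at": datetime(2024, 3, 5, 0, 0)},
]
presence = build_daily_presence(storyline)
```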

build_weekly_participation_grid(storyline, config, *, user_ids=None, user_id_column='user_id', date_column='started_at', exclude_resampled=True, default_period_name='All')

Count observed active tracking days by user and protocol week.

When phases are configured, weeks are relative to these phases rather than ISO calendar weeks. Without configured phases, the grid uses one analytical period named default_period_name and spans the observed tracking dates.

Parameters:

Name Type Description Default
storyline DataFrame

Parsed storyline table, preferably before resampling.

required
config ProjectConfig

Project configuration containing phase dates.

required
user_ids Iterable | None

Optional full list of expected users. If omitted, users are taken from storyline.

None
user_id_column str

User identifier column in storyline.

'user_id'
date_column str

Datetime column used to define active tracking days.

'started_at'
exclude_resampled bool

When true, artificial Resampled_stay rows are ignored.

True
default_period_name str

Name used in the output when no phase is configured.

'All'

Returns:

Type Description
DataFrame

A complete user x phase-week table. active_days_count and participation_score are integers from 0 to 7.

Raises:

Type Description
ValueError

If no phase is configured and no tracking date is available to infer a default analysis period.

calculate_user_tracking_stats(storyline)

Calculate observed tracking windows and missing days per user.

Parameters:

Name Type Description Default
storyline DataFrame

Parsed storyline table. It must contain user_id and started_at. The function expects one row per stay or track segment.

required

Returns:

Type Description
DataFrame

A user-level table with the first and latest tracked dates, the numbers of active and inactive days, the maximum gap between tracked days and the tracking completeness.

Raises:

Type Description
KeyError

If user_id or started_at is missing.

summarize_participation_grid(participation_grid, *, good_week_min_days=5)

Summarize weekly participation coverage.

Parameters:

Name Type Description Default
participation_grid DataFrame

Table produced by build_weekly_participation_grid.

required
good_week_min_days int

Minimum active days for a week to be considered good or complete.

5

Returns:

Type Description
DataFrame

One summary row for all phases and one row per phase.

build_tracking_gap_report(storyline, config=None, *, user_ids=None, user_id_column='user_id', date_column='started_at', exclude_resampled=True, default_period_name='All')

Summarize observed, missing and consecutive tracking days.

This packages the generic part of the historical tracking-quality notebook: one row per user and analytical period, with active days, missing days, the maximum gap and the maximum number of consecutive observed days.

Parameters:

Name Type Description Default
storyline DataFrame

Parsed storyline table.

required
config ProjectConfig | None

Optional project configuration. When phases are configured, the report is computed by phase; otherwise it uses the observed period named default_period_name.

None
user_ids Iterable | None

Optional full list of expected users.

None
user_id_column str

User identifier column in storyline.

'user_id'
date_column str

Datetime column used to define active tracking days.

'started_at'
exclude_resampled bool

Whether to ignore artificial Resampled_stay rows.

True
default_period_name str

Period name used when no phase is configured.

'All'

Returns:

Type Description
DataFrame

A user-period table with tracking coverage and gap metrics.

build_tracking_quality_report(user_stats)

Build a compact one-row report for user tracking quality.

Parameters:

Name Type Description Default
user_stats DataFrame

User-level table with tracking-quality columns.

required

Returns:

Type Description
DataFrame

A one-row table with user counts, valid-user share and median tracking-duration indicators. The output is meant for quick notebook checks before applying filters.

calculate_tracking_periods(user_stats, config, *, min_days_by_phase=None)

Calculate effective tracked days per configured phase.

Parameters:

Name Type Description Default
user_stats DataFrame

User-level tracking table, usually produced by calculate_user_tracking_stats.

required
config ProjectConfig

Project configuration containing Phase definitions and tracking thresholds.

required
min_days_by_phase Mapping[str, int] | None

Optional override for minimum tracked days by phase name.

None

Returns:

Type Description
DataFrame

A copy of user_stats with n_days_phase_*, phase*_start and phase*_end columns. If no phase is configured, the input schema is preserved.

flag_tracking_quality(user_stats, config)

Add transparent tracking-quality flags to a user table.

Parameters:

Name Type Description Default
user_stats DataFrame

User-level table with at least active_days_count.

required
config ProjectConfig

Project configuration containing total and phase-level tracking thresholds.

required

Returns:

Type Description
DataFrame

A copy of user_stats with boolean flags such as tracked_days_ok, phase*_tracked_days_ok, tracking_quality_ok and the categorical reason tracking_quality_reason.

Raises:

Type Description
KeyError

If active_days_count is missing.

summarize_phase_tracking(user_stats, config, *, day_thresholds=(7, 14, 21))

Summarize user tracking coverage by configured phase.

Parameters:

Name Type Description Default
user_stats DataFrame

User-level table containing n_days_phase_1, n_days_phase_2, etc.

required
config ProjectConfig

Project configuration containing phase dates.

required
day_thresholds Iterable[int]

Day-count thresholds to report, for example 7, 14 and 21 days.

(7, 14, 21)

Returns:

Type Description
DataFrame

One row per phase with user counts above each threshold and mean coverage over the theoretical phase duration.

build_mode_detection_precision(storyline, *, mode_col='mode', detected_mode_col='detected_mode', type_col='type', confirmed_at_col='confirmed_at', confirmed_only=True, group_cols=('mode', 'mode_niv1', 'mode_mrmt'))

Compare detected and confirmed transport modes.

Parameters:

Name Type Description Default
storyline DataFrame

Parsed storyline table.

required
mode_col str

Confirmed or corrected mode column.

'mode'
detected_mode_col str

Detected mode column.

'detected_mode'
type_col str

Column used to keep only track rows.

'type'
confirmed_at_col str

Confirmation timestamp column.

'confirmed_at'
confirmed_only bool

When true and the column exists, keep only rows with a confirmation timestamp.

True
group_cols Iterable[str]

Mode columns to summarize when present.

('mode', 'mode_niv1', 'mode_mrmt')

Returns:

Type Description
DataFrame

A long table with one precision row per available grouping column and label.

build_user_confirmation_rates(storyline, *, user_id_col='user_id', type_col='type', confirmed_at_col='confirmed_at')

Calculate user-level confirmation rates for stays and tracks.

Parameters:

Name Type Description Default
storyline DataFrame

Parsed storyline table.

required
user_id_col str

User identifier column.

'user_id'
type_col str

Column distinguishing Stay and Track.

'type'
confirmed_at_col str

Confirmation timestamp column.

'confirmed_at'

Returns:

Type Description
DataFrame

One row per user with stay and track confirmation counts and rates.

get_extreme_legs_by_mode(legs, *, mode_col='mode_niv1', length_col='length', duration_col='duration', top_n=5)

Return the longest legs within each mode.

Parameters:

Name Type Description Default
legs DataFrame

Leg table.

required
mode_col str

Mode column used for grouping.

'mode_niv1'
length_col str

Length column, expected in meters.

'length'
duration_col str

Optional duration column in seconds.

'duration'
top_n int

Number of longest legs retained per mode.

5

Returns:

Type Description
DataFrame

A table sorted by mode and descending distance, with distance in kilometers and speed when duration is available.

summarize_leg_lengths_by_mode(legs, *, mode_col='mode_niv2', length_col='length', quantiles=(0.95, 0.98, 0.99))

Summarize leg-distance distributions by mode.

Parameters:

Name Type Description Default
legs DataFrame

Leg table.

required
mode_col str

Mode column used for grouping.

'mode_niv2'
length_col str

Length column, expected in meters.

'length'
quantiles Iterable[float]

Quantiles to add to the summary.

(0.95, 0.98, 0.99)

Returns:

Type Description
DataFrame

One row per mode with distances expressed in kilometers.

resample_missing_stays(storyline, config)

Create transparent placeholder stays for missing tracking dates.

The generated rows are labelled Resampled_stay. They are useful when the analysis requires a continuous user calendar, but they remain identifiable through their type and comment_feedback values.

Parameters:

Name Type Description Default
storyline GeoDataFrame

Parsed storyline table with user_id, started_at and geometry columns.

required
config ProjectConfig

Project configuration used to assign experimental phases to inserted days.

required

Returns:

Type Description
tuple[GeoDataFrame, DataFrame]

A tuple (storyline, missing_days) where storyline includes the optional placeholder rows and missing_days lists one row per inserted user-day.

build_user_selection_table(user_stats, *, require_tracking_quality=False, exclude_bad_signal_users=True, max_low_quality_legs_share=None)

Build an explicit user-level table for selecting analysis users.

The function does not remove rows. It adds one boolean column per rule, a final analysis_user_ok flag and an analysis_user_reason column. This keeps exclusions inspectable before any table is filtered. The name deliberately uses selection_table rather than filter_matrix because the object is meant to be read by humans before being applied.

Parameters:

Name Type Description Default
user_stats DataFrame

User-level table produced by prepare_mobility_dataset. Expected columns depend on enabled rules: tracking_quality_ok, bad_signal_user and low_quality_legs_share.

required
require_tracking_quality bool

When true, keep only users with tracking_quality_ok == True. Leave false when tracking quality is used as a diagnostic rather than an exclusion rule, for example when the current dump does not cover all configured phases.

False
exclude_bad_signal_users bool

When true, exclude users flagged by the signal-quality step.

True
max_low_quality_legs_share float | None

Optional maximum share of level-1 low-quality legs tolerated per user.

None

Returns:

Type Description
DataFrame

A copy of user_stats with *_filter_ok, analysis_user_ok and analysis_user_reason columns.

Raises:

Type Description
KeyError

If a column required by an enabled rule is missing.

ValueError

If max_low_quality_legs_share is outside [0, 1].

filter_mobility_dataset_by_users(dataset, user_ids)

Filter every user-indexed table of a MobilityDataset.

Mapping tables are filtered after the core tables so they only reference retained legs, trips, journeys and staypoints. The function does not recompute indicators; it only preserves relational consistency after a user selection.

Parameters:

Name Type Description Default
dataset MobilityDataset

Transformed mobility dataset.

required
user_ids Iterable

Iterable of user identifiers to keep.

required

Returns:

Type Description
MobilityDataset

A new MobilityDataset with filtered tables and unchanged validation reports.

filter_table_by_users(df, user_ids, *, user_id_column='user_id')

Filter any mobility table by selected users while keeping its schema.

Parameters:

Name Type Description Default
df DataFrame

Table to filter. If user_id_column is absent, the table is returned unchanged.

required
user_ids Iterable

Iterable of user identifiers to keep.

required
user_id_column str

Name of the user identifier column in df.

'user_id'

Returns:

Type Description
DataFrame

A copy of df restricted to selected users, with the index reset.

select_analysis_users(selection_table, *, quality_column='analysis_user_ok')

Return user ids selected by a user selection table.

Parameters:

Name Type Description Default
selection_table DataFrame

Table produced by build_user_selection_table.

required
quality_column str

Boolean column used as the final selection flag.

'analysis_user_ok'

Returns:

Type Description
Index

User ids for which quality_column is true.

Raises:

Type Description
KeyError

If user_id or quality_column is missing.

select_valid_tracking_users(user_stats, *, quality_column='tracking_quality_ok')

Return user ids that pass one quality column.

This helper is intentionally narrow and remains useful for quick checks. For analysis filters that combine tracking quality and GPS signal flags, prefer build_user_selection_table followed by select_analysis_users.

Parameters:

Name Type Description Default
user_stats DataFrame

User-level table containing user_id and the requested quality column.

required
quality_column str

Boolean column used as the selection criterion.

'tracking_quality_ok'

Returns:

Type Description
Index

User ids for which quality_column is true.

Raises:

Type Description
KeyError

If the requested quality column is missing.

Presence and participation

xyt_gps.quality_tracking

Daily presence and weekly participation helpers.

calculate_user_tracking_stats(storyline)

Calculate observed tracking windows and missing days per user.

Parameters:

Name Type Description Default
storyline DataFrame

Parsed storyline table. It must contain user_id and started_at. The function expects one row per stay or track segment.

required

Returns:

Type Description
DataFrame

A user-level table with the first and latest tracked dates, the numbers of active and inactive days, the maximum gap between tracked days and the tracking completeness.

Raises:

Type Description
KeyError

If user_id or started_at is missing.
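The statistics described above can be derived with a few pandas operations. The following is a minimal sketch, not the package's implementation. The output column names first_date, last_date, inactive_days_count, max_gap_days and completeness are hypothetical (only active_days_count is named elsewhere on this page), and completeness is assumed to be active days divided by the calendar window between the first and latest tracked days.

```python
import pandas as pd


def user_tracking_stats_sketch(storyline: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch of per-user tracking statistics (not the xyt_gps code)."""
    missing = {"user_id", "started_at"} - set(storyline.columns)
    if missing:
        raise KeyError(f"missing columns: {sorted(missing)}")
    # Reduce each stay or track segment to its calendar day
    days = storyline.assign(day=pd.to_datetime(storyline["started_at"]).dt.normalize())
    rows = []
    for user_id, group in days.groupby("user_id"):
        tracked = group["day"].drop_duplicates().sort_values()
        first, last = tracked.iloc[0], tracked.iloc[-1]
        window = (last - first).days + 1  # calendar days from first to latest tracked day
        active = len(tracked)
        # Untracked days between two consecutive tracked days
        gaps = tracked.diff().dt.days.dropna() - 1
        rows.append({
            "user_id": user_id,
            "first_date": first,
            "last_date": last,
            "active_days_count": active,
            "inactive_days_count": window - active,
            "max_gap_days": int(gaps.max()) if len(gaps) else 0,
            "completeness": active / window,
        })
    return pd.DataFrame(rows)


storyline = pd.DataFrame({
    "user_id": ["a", "a", "a", "b"],
    "started_at": ["2024-03-01 08:00", "2024-03-01 18:00", "2024-03-04 09:00", "2024-03-02 07:30"],
})
stats = user_tracking_stats_sketch(storyline)
```

User a is tracked on 1 and 4 March, so the sketch reports two active days, two inactive days, a maximum gap of two days and a completeness of 0.5.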

build_daily_tracking_presence(storyline, config=None, *, user_id_column='user_id', date_column='started_at', exclude_resampled=True)

Build one observed tracking row per user and date.

This table is intentionally based on observed rows before temporal resampling. Otherwise, artificial Resampled_stay rows would make missing days look like tracked days in participation analyses.

Parameters:

Name Type Description Default
storyline DataFrame

Parsed storyline table.

required
config ProjectConfig | None

Optional project configuration used to assign phases.

None
user_id_column str

User identifier column.

'user_id'
date_column str

Datetime column used to define active tracking days.

'started_at'
exclude_resampled bool

When true, rows where type == "Resampled_stay" are ignored.

True

Returns:

Type Description
DataFrame

A table with user_id, tracking_date, active_day and, when a config with phases is provided, Phase.

Raises:

Type Description
KeyError

If the requested user or date column is missing.
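A minimal pandas sketch of the same idea, for illustration only. It assumes placeholder rows are marked in a type column, and it leaves out phase assignment, which needs the project configuration.

```python
import pandas as pd


def daily_presence_sketch(storyline, user_id_column="user_id", date_column="started_at",
                          exclude_resampled=True):
    """Hypothetical sketch: one row per user and observed calendar date."""
    for col in (user_id_column, date_column):
        if col not in storyline.columns:
            raise KeyError(col)
    rows = storyline
    if exclude_resampled and "type" in rows.columns:
        # Placeholder stays must not count as observed tracking days
        rows = rows[rows["type"] != "Resampled_stay"]
    return (
        rows.assign(tracking_date=pd.to_datetime(rows[date_column]).dt.date)
        [[user_id_column, "tracking_date"]]
        .drop_duplicates()
        .rename(columns={user_id_column: "user_id"})
        .assign(active_day=True)
        .sort_values(["user_id", "tracking_date"])
        .reset_index(drop=True)
    )


storyline = pd.DataFrame({
    "user_id": ["a", "a", "a", "a"],
    "type": ["Stay", "Track", "Resampled_stay", "Stay"],
    "started_at": ["2024-03-01 08:00", "2024-03-01 09:00", "2024-03-02 00:00", "2024-03-03 10:00"],
})
presence = daily_presence_sketch(storyline)
```

The Resampled_stay row on 2 March is dropped, so only 1 and 3 March appear as active days.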

build_weekly_participation_grid(storyline, config, *, user_ids=None, user_id_column='user_id', date_column='started_at', exclude_resampled=True, default_period_name='All')

Count observed active tracking days by user and protocol week.

When phases are configured, weeks are relative to these phases rather than ISO calendar weeks. Without configured phases, the grid uses one analytical period named default_period_name and spans the observed tracking dates.

Parameters:

Name Type Description Default
storyline DataFrame

Parsed storyline table, preferably before resampling.

required
config ProjectConfig

Project configuration containing phase dates.

required
user_ids Iterable | None

Optional full list of expected users. If omitted, users are taken from storyline.

None
user_id_column str

User identifier column in storyline.

'user_id'
date_column str

Datetime column used to define active tracking days.

'started_at'
exclude_resampled bool

When true, artificial Resampled_stay rows are ignored.

True
default_period_name str

Name used in the output when no phase is configured.

'All'

Returns:

Type Description
DataFrame

A complete user x phase-week table. active_days_count and participation_score are integers from 0 to 7.

Raises:

Type Description
ValueError

If no phase is configured and no tracking date is available to infer a default analysis period.
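The phase-relative week numbering and the zero-filled grid can be sketched as follows. This assumes a single phase given by explicit start and end dates (the real function reads phases from ProjectConfig) and an input shaped like the output of build_daily_tracking_presence; the phase_week column name is hypothetical.

```python
import pandas as pd


def weekly_grid_sketch(presence, phase_start, phase_end, user_ids=None):
    """Hypothetical sketch: presence has one row per user_id and tracking_date."""
    start, end = pd.Timestamp(phase_start), pd.Timestamp(phase_end)
    n_weeks = (end - start).days // 7 + 1
    days = pd.to_datetime(presence["tracking_date"])
    in_phase = presence[(days >= start) & (days <= end)]
    # Week 1 starts on the phase start date, not on an ISO Monday
    week = (pd.to_datetime(in_phase["tracking_date"]) - start).dt.days // 7 + 1
    counts = in_phase.assign(phase_week=week).groupby(["user_id", "phase_week"]).size()
    users = sorted(user_ids) if user_ids is not None else sorted(presence["user_id"].unique())
    # Every expected user gets every week, with 0 where nothing was observed
    full_index = pd.MultiIndex.from_product(
        [users, range(1, n_weeks + 1)], names=["user_id", "phase_week"]
    )
    return counts.reindex(full_index, fill_value=0).rename("active_days_count").reset_index()


presence = pd.DataFrame({
    "user_id": ["a", "a", "a", "b"],
    "tracking_date": ["2024-03-04", "2024-03-05", "2024-03-12", "2024-03-06"],
})
grid = weekly_grid_sketch(presence, "2024-03-04", "2024-03-17", user_ids=["a", "b", "c"])
```

Passing user_ids keeps user c in the grid with zero active days, which is what makes non-participation visible.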

summarize_participation_grid(participation_grid, *, good_week_min_days=5)

Summarize weekly participation coverage.

Parameters:

Name Type Description Default
participation_grid DataFrame

Table produced by build_weekly_participation_grid.

required
good_week_min_days int

Minimum active days for a week to be considered good or complete.

5

Returns:

Type Description
DataFrame

One summary row for all phases and one row per phase.
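A sketch of such a summary, assuming the grid carries Phase and active_days_count columns. The summary column names and the "All phases" label are hypothetical, not the package's output schema.

```python
import pandas as pd


def summarize_grid_sketch(grid, good_week_min_days=5):
    """Hypothetical sketch: one overall row, then one row per phase."""
    def summarize(frame, label):
        good = frame["active_days_count"] >= good_week_min_days
        return {
            "Phase": label,
            "n_user_weeks": len(frame),
            "good_weeks": int(good.sum()),
            "good_week_share": float(good.mean()) if len(frame) else float("nan"),
            "mean_active_days": float(frame["active_days_count"].mean()),
        }

    rows = [summarize(grid, "All phases")]
    rows += [summarize(group, phase) for phase, group in grid.groupby("Phase")]
    return pd.DataFrame(rows)


grid = pd.DataFrame({
    "user_id": ["a", "a", "b", "b"],
    "Phase": ["P1", "P2", "P1", "P2"],
    "phase_week": [1, 1, 1, 1],
    "active_days_count": [7, 3, 5, 0],
})
summary = summarize_grid_sketch(grid)
```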

Quality reports

xyt_gps.quality_reports

Tracking quality reports and user-level tracking flags.

build_tracking_gap_report(storyline, config=None, *, user_ids=None, user_id_column='user_id', date_column='started_at', exclude_resampled=True, default_period_name='All')

Summarize observed, missing and consecutive tracking days.

This packages the generic part of the historical tracking-quality notebook: one row per user and analytical period, with active days, missing days, the maximum gap and the maximum number of consecutive observed days.

Parameters:

Name Type Description Default
storyline DataFrame

Parsed storyline table.

required
config ProjectConfig | None

Optional project configuration. When phases are configured, the report is computed by phase; otherwise it uses the observed period named default_period_name.

None
user_ids Iterable | None

Optional full list of expected users.

None
user_id_column str

User identifier column in storyline.

'user_id'
date_column str

Datetime column used to define active tracking days.

'started_at'
exclude_resampled bool

Whether to ignore artificial Resampled_stay rows.

True
default_period_name str

Period name used when no phase is configured.

'All'

Returns:

Type Description
DataFrame

A user-period table with tracking coverage and gap metrics.
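The gap and run metrics for one user and one period can be computed with a single pass over the calendar. This plain-Python sketch assumes that missing days at the edges of the period count as gaps; the package may define gaps differently (calculate_user_tracking_stats, for instance, describes gaps between tracked days).

```python
from datetime import date, timedelta


def gap_metrics_sketch(observed_days, period_start, period_end):
    """Hypothetical sketch of per-period coverage metrics for one user."""
    observed = {d for d in observed_days if period_start <= d <= period_end}
    n_days = (period_end - period_start).days + 1
    max_gap = max_run = gap = run = 0
    for offset in range(n_days):
        day = period_start + timedelta(days=offset)
        if day in observed:
            run, gap = run + 1, 0  # extend the current run of observed days
        else:
            gap, run = gap + 1, 0  # extend the current run of missing days
        max_run, max_gap = max(max_run, run), max(max_gap, gap)
    return {
        "active_days": len(observed),
        "missing_days": n_days - len(observed),
        "max_gap_days": max_gap,
        "max_consecutive_days": max_run,
    }


report = gap_metrics_sketch(
    [date(2024, 3, d) for d in (1, 2, 3, 6, 7)], date(2024, 3, 1), date(2024, 3, 10)
)
```

Here the longest observed run is 1 to 3 March and the longest gap is the trailing 8 to 10 March.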

summarize_phase_tracking(user_stats, config, *, day_thresholds=(7, 14, 21))

Summarize user tracking coverage by configured phase.

Parameters:

Name Type Description Default
user_stats DataFrame

User-level table containing n_days_phase_1, n_days_phase_2, etc.

required
config ProjectConfig

Project configuration containing phase dates.

required
day_thresholds Iterable[int]

Day-count thresholds to report, for example 7, 14 and 21 days.

(7, 14, 21)

Returns:

Type Description
DataFrame

One row per phase with user counts above each threshold and mean coverage over the theoretical phase duration.

calculate_tracking_periods(user_stats, config, *, min_days_by_phase=None)

Calculate effective tracked days per configured phase.

Parameters:

Name Type Description Default
user_stats DataFrame

User-level tracking table, usually produced by calculate_user_tracking_stats.

required
config ProjectConfig

Project configuration containing Phase definitions and tracking thresholds.

required
min_days_by_phase Mapping[str, int] | None

Optional override for minimum tracked days by phase name.

None

Returns:

Type Description
DataFrame

A copy of user_stats with n_days_phase_*, phase*_start and phase*_end columns. If no phase is configured, the input schema is preserved.

flag_tracking_quality(user_stats, config)

Add transparent tracking-quality flags to a user table.

Parameters:

Name Type Description Default
user_stats DataFrame

User-level table with at least active_days_count.

required
config ProjectConfig

Project configuration containing total and phase-level tracking thresholds.

required

Returns:

Type Description
DataFrame

A copy of user_stats with boolean flags such as tracked_days_ok, phase*_tracked_days_ok, tracking_quality_ok and the categorical reason tracking_quality_reason.

Raises:

Type Description
KeyError

If active_days_count is missing.
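One way to keep such flags transparent is to store every rule as its own boolean column and derive the reason from the first failing rule. This sketch assumes thresholds are passed explicitly instead of being read from ProjectConfig, and that phase columns follow the n_days_phase_<n> pattern; the reason labels are hypothetical.

```python
import pandas as pd


def flag_quality_sketch(user_stats, min_total_days, min_days_by_phase):
    """Hypothetical sketch; min_days_by_phase maps a phase number to a minimum."""
    if "active_days_count" not in user_stats.columns:
        raise KeyError("active_days_count")
    out = user_stats.copy()
    out["tracked_days_ok"] = out["active_days_count"] >= min_total_days
    flag_cols = ["tracked_days_ok"]
    for phase, min_days in min_days_by_phase.items():
        col = f"phase{phase}_tracked_days_ok"
        out[col] = out[f"n_days_phase_{phase}"] >= min_days
        flag_cols.append(col)
    out["tracking_quality_ok"] = out[flag_cols].all(axis=1)
    # Name of the first failing rule, or "ok" when every rule passes
    out["tracking_quality_reason"] = out[flag_cols].apply(
        lambda row: next((c.removesuffix("_ok") for c in flag_cols if not row[c]), "ok"),
        axis=1,
    )
    return out


user_stats = pd.DataFrame({
    "user_id": ["a", "b", "c"],
    "active_days_count": [30, 30, 10],
    "n_days_phase_1": [15, 5, 5],
    "n_days_phase_2": [15, 25, 5],
})
flags = flag_quality_sketch(user_stats, min_total_days=20, min_days_by_phase={1: 7, 2: 7})
```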

build_tracking_quality_report(user_stats)

Build a compact one-row report for user tracking quality.

Parameters:

Name Type Description Default
user_stats DataFrame

User-level table with tracking-quality columns.

required

Returns:

Type Description
DataFrame

A one-row table with user counts, valid-user share and median tracking-duration indicators. The output is meant for quick notebook checks before applying filters.

Diagnostics

xyt_gps.quality_diagnostics

Diagnostic reports for leg lengths, confirmations and mode detection.

summarize_leg_lengths_by_mode(legs, *, mode_col='mode_niv2', length_col='length', quantiles=(0.95, 0.98, 0.99))

Summarize leg-distance distributions by mode.

Parameters:

Name Type Description Default
legs DataFrame

Leg table.

required
mode_col str

Mode column used for grouping.

'mode_niv2'
length_col str

Length column, expected in meters.

'length'
quantiles Iterable[float]

Quantiles to add to the summary.

(0.95, 0.98, 0.99)

Returns:

Type Description
DataFrame

One row per mode with distances expressed in kilometers.
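A sketch of the distance summary with pandas group-by quantiles, assuming lengths in meters as documented. The statistic columns (count, mean, median, max, q95 and so on) are illustrative, not the package's schema.

```python
import pandas as pd


def leg_length_summary_sketch(legs, mode_col="mode_niv2", length_col="length",
                              quantiles=(0.95, 0.98, 0.99)):
    """Hypothetical sketch of per-mode distance distributions in kilometers."""
    km = legs[length_col] / 1000.0  # meters -> kilometers
    grouped = km.groupby(legs[mode_col])
    summary = grouped.agg(["count", "mean", "median", "max"])
    for q in quantiles:
        # Upper quantiles help spot implausibly long legs per mode
        summary[f"q{round(q * 100)}"] = grouped.quantile(q)
    return summary.reset_index()


legs = pd.DataFrame({
    "mode_niv2": ["Walk", "Walk", "Car", "Car"],
    "length": [500, 1500, 10000, 30000],
})
summary = leg_length_summary_sketch(legs)
```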

get_extreme_legs_by_mode(legs, *, mode_col='mode_niv1', length_col='length', duration_col='duration', top_n=5)

Return the longest legs within each mode.

Parameters:

Name Type Description Default
legs DataFrame

Leg table.

required
mode_col str

Mode column used for grouping.

'mode_niv1'
length_col str

Length column, expected in meters.

'length'
duration_col str

Optional duration column in seconds.

'duration'
top_n int

Number of longest legs retained per mode.

5

Returns:

Type Description
DataFrame

A table sorted by mode and descending distance, with distance in kilometers and speed when duration is available.
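Sorting by mode and descending distance, then keeping the first rows of each group, gives the extreme legs. A minimal sketch, assuming lengths in meters and durations in seconds as documented; distance_km and speed_kmh are hypothetical column names.

```python
import pandas as pd


def extreme_legs_sketch(legs, mode_col="mode_niv1", length_col="length",
                        duration_col="duration", top_n=5):
    """Hypothetical sketch: the top_n longest legs within each mode."""
    out = legs.copy()
    out["distance_km"] = out[length_col] / 1000.0
    if duration_col in out.columns:
        # km/h from a duration in seconds
        out["speed_kmh"] = out["distance_km"] / (out[duration_col] / 3600.0)
    out = out.sort_values([mode_col, "distance_km"], ascending=[True, False])
    # head() keeps the sorted order inside each group
    return out.groupby(mode_col, sort=False).head(top_n).reset_index(drop=True)


legs = pd.DataFrame({
    "mode_niv1": ["Car", "Car", "Car", "Bike"],
    "length": [12000, 50000, 3000, 8000],
    "duration": [900, 1800, 300, 1800],
})
top = extreme_legs_sketch(legs, top_n=2)
```

A 50 km car leg covered in 30 minutes yields 100 km/h, which is the kind of value this table is meant to surface for review.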

build_user_confirmation_rates(storyline, *, user_id_col='user_id', type_col='type', confirmed_at_col='confirmed_at')

Calculate user-level confirmation rates for stays and tracks.

Parameters:

Name Type Description Default
storyline DataFrame

Parsed storyline table.

required
user_id_col str

User identifier column.

'user_id'
type_col str

Column distinguishing Stay and Track.

'type'
confirmed_at_col str

Confirmation timestamp column.

'confirmed_at'

Returns:

Type Description
DataFrame

One row per user with stay and track confirmation counts and rates.
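A sketch of the confirmation-rate computation, assuming a row counts as confirmed when confirmed_at is not null. The output column names are hypothetical; users without any row of a given type get a missing rate instead of a division by zero.

```python
import pandas as pd


def confirmation_rates_sketch(storyline, user_id_col="user_id", type_col="type",
                              confirmed_at_col="confirmed_at"):
    """Hypothetical sketch of per-user Stay and Track confirmation rates."""
    users = pd.Index(storyline[user_id_col].unique(), name=user_id_col)
    out = pd.DataFrame(index=users)
    for kind in ("Stay", "Track"):
        rows = storyline[storyline[type_col] == kind]
        grouped = rows[confirmed_at_col].notna().groupby(rows[user_id_col])
        n_total = grouped.size().reindex(users, fill_value=0)
        n_confirmed = grouped.sum().reindex(users, fill_value=0)
        prefix = kind.lower()
        out[f"{prefix}_count"] = n_total
        out[f"{prefix}_confirmed_count"] = n_confirmed
        # NaN rather than a division by zero when the user has no row of this type
        out[f"{prefix}_confirmation_rate"] = n_confirmed / n_total.where(n_total > 0)
    return out.reset_index()


storyline = pd.DataFrame({
    "user_id": ["a", "a", "a", "b"],
    "type": ["Stay", "Track", "Track", "Stay"],
    "confirmed_at": ["2024-03-01 20:00", None, "2024-03-01 21:00", None],
})
rates = confirmation_rates_sketch(storyline)
```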

build_mode_detection_precision(storyline, *, mode_col='mode', detected_mode_col='detected_mode', type_col='type', confirmed_at_col='confirmed_at', confirmed_only=True, group_cols=('mode', 'mode_niv1', 'mode_mrmt'))

Compare detected and confirmed transport modes.

Parameters:

Name Type Description Default
storyline DataFrame

Parsed storyline table.

required
mode_col str

Confirmed or corrected mode column.

'mode'
detected_mode_col str

Detected mode column.

'detected_mode'
type_col str

Column used to keep only track rows.

'type'
confirmed_at_col str

Confirmation timestamp column.

'confirmed_at'
confirmed_only bool

When true and the column exists, keep only rows with a confirmation timestamp.

True
group_cols Iterable[str]

Mode columns to summarize when present.

('mode', 'mode_niv1', 'mode_mrmt')

Returns:

Type Description
DataFrame

A long table with one precision row per available grouping column and label.
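The comparison can be sketched for a single grouping column using the standard definition of precision: among tracks detected as a given mode, the share that the user confirmed as that same mode. This is an assumption about how the package defines precision, and the sketch does not map modes to the mode_niv1 or mode_mrmt levels.

```python
import pandas as pd


def detection_precision_sketch(storyline, mode_col="mode", detected_mode_col="detected_mode",
                               type_col="type", confirmed_at_col="confirmed_at",
                               confirmed_only=True):
    """Hypothetical sketch: precision of the detected mode, per detected label."""
    tracks = storyline[storyline[type_col] == "Track"]
    if confirmed_only and confirmed_at_col in tracks.columns:
        tracks = tracks[tracks[confirmed_at_col].notna()]
    correct = tracks[detected_mode_col] == tracks[mode_col]
    table = (
        correct.groupby(tracks[detected_mode_col])
        .agg(["size", "sum"])
        .rename(columns={"size": "n_detected", "sum": "n_correct"})
    )
    table["precision"] = table["n_correct"] / table["n_detected"]
    return table.rename_axis("label").reset_index().assign(grouping=mode_col)


storyline = pd.DataFrame({
    "type": ["Track", "Track", "Track", "Track", "Stay"],
    "detected_mode": ["Car", "Car", "Walk", "Car", None],
    "mode": ["Car", "Bus", "Walk", "Car", None],
    "confirmed_at": ["t", "t", "t", None, None],
})
precision = detection_precision_sketch(storyline)
```

The unconfirmed fourth track is ignored, so Car is detected twice among confirmed tracks and is right once.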

Temporal resampling

xyt_gps.quality_resampling

Temporal resampling helpers for missing tracking days.

resample_missing_stays(storyline, config)

Create transparent placeholder stays for missing tracking dates.

The generated rows are labelled Resampled_stay. They are useful when the analysis requires a continuous user calendar, but they remain identifiable through their type and comment_feedback values.

Parameters:

Name Type Description Default
storyline GeoDataFrame

Parsed storyline table with user_id, started_at and geometry columns.

required
config ProjectConfig

Project configuration used to assign experimental phases to inserted days.

required

Returns:

Type Description
tuple[GeoDataFrame, DataFrame]

A tuple (storyline, missing_days) where storyline includes the optional placeholder rows and missing_days lists one row per inserted user-day.
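A simplified sketch of the placeholder logic. It fills only the days missing inside each user's observed window, uses plain pandas without geometry, and does not assign phases; the comment_feedback value it writes is hypothetical. It still returns the documented (storyline, missing_days) pair.

```python
import pandas as pd


def missing_days_sketch(storyline):
    """Hypothetical sketch; started_at must already be a datetime column."""
    days = storyline.assign(day=storyline["started_at"].dt.normalize())
    rows = []
    for user_id, group in days.groupby("user_id"):
        observed = set(group["day"])
        calendar = pd.date_range(group["day"].min(), group["day"].max(), freq="D")
        rows += [{"user_id": user_id, "started_at": d} for d in calendar if d not in observed]
    missing_days = pd.DataFrame(rows, columns=["user_id", "started_at"])
    # Placeholders stay identifiable through their type (comment value is illustrative)
    placeholders = missing_days.assign(type="Resampled_stay", comment_feedback="Resampled_stay")
    resampled = (
        pd.concat([storyline, placeholders], ignore_index=True)
        .sort_values(["user_id", "started_at"])
        .reset_index(drop=True)
    )
    return resampled, missing_days


storyline = pd.DataFrame({
    "user_id": ["a", "a"],
    "type": ["Stay", "Stay"],
    "started_at": pd.to_datetime(["2024-03-01 08:00", "2024-03-04 09:00"]),
})
resampled, missing_days = missing_days_sketch(storyline)
```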

User selection

xyt_gps.quality_selection

User selection and consistent filtering of mobility datasets.

select_valid_tracking_users(user_stats, *, quality_column='tracking_quality_ok')

Return user ids that pass one quality column.

This helper is intentionally narrow and remains useful for quick checks. For analysis filters that combine tracking quality and GPS signal flags, prefer build_user_selection_table followed by select_analysis_users.

Parameters:

Name Type Description Default
user_stats DataFrame

User-level table containing user_id and the requested quality column.

required
quality_column str

Boolean column used as the selection criterion.

'tracking_quality_ok'

Returns:

Type Description
Index

User ids for which quality_column is true.

Raises:

Type Description
KeyError

If the requested quality column is missing.

filter_table_by_users(df, user_ids, *, user_id_column='user_id')

Filter any mobility table by selected users while keeping its schema.

Parameters:

Name Type Description Default
df DataFrame

Table to filter. If user_id_column is absent, the table is returned unchanged.

required
user_ids Iterable

Iterable of user identifiers to keep.

required
user_id_column str

Name of the user identifier column in df.

'user_id'

Returns:

Type Description
DataFrame

A copy of df restricted to selected users, with the index reset.

build_user_selection_table(user_stats, *, require_tracking_quality=False, exclude_bad_signal_users=True, max_low_quality_legs_share=None)

Build an explicit user-level table for selecting analysis users.

The function does not remove rows. It adds one boolean column per rule, a final analysis_user_ok flag and an analysis_user_reason column. This keeps exclusions inspectable before any table is filtered. The name deliberately uses selection_table rather than filter_matrix because the object is meant to be read by humans before being applied.

Parameters:

Name Type Description Default
user_stats DataFrame

User-level table produced by prepare_mobility_dataset. Expected columns depend on enabled rules: tracking_quality_ok, bad_signal_user and low_quality_legs_share.

required
require_tracking_quality bool

When true, keep only users with tracking_quality_ok == True. Leave false when tracking quality is used as a diagnostic rather than an exclusion rule, for example when the current dump does not cover all configured phases.

False
exclude_bad_signal_users bool

When true, exclude users flagged by the signal-quality step.

True
max_low_quality_legs_share float | None

Optional maximum share of level-1 low-quality legs tolerated per user.

None

Returns:

Type Description
DataFrame

A copy of user_stats with *_filter_ok, analysis_user_ok and analysis_user_reason columns.

Raises:

Type Description
KeyError

If a column required by an enabled rule is missing.

ValueError

If max_low_quality_legs_share is outside [0, 1].
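The rule-per-column design can be sketched in pandas, followed by the selection step that select_analysis_users performs. This is a minimal illustration, not the package's code; the individual *_filter_ok column names and reason labels are hypothetical.

```python
import pandas as pd


def selection_table_sketch(user_stats, require_tracking_quality=False,
                           exclude_bad_signal_users=True, max_low_quality_legs_share=None):
    """Hypothetical sketch: flag users per rule without removing any row."""
    if max_low_quality_legs_share is not None and not 0 <= max_low_quality_legs_share <= 1:
        raise ValueError("max_low_quality_legs_share must be in [0, 1]")
    out = user_stats.copy()
    rules = []
    if require_tracking_quality:
        out["tracking_quality_filter_ok"] = out["tracking_quality_ok"].astype(bool)
        rules.append("tracking_quality_filter_ok")
    if exclude_bad_signal_users:
        out["bad_signal_filter_ok"] = ~out["bad_signal_user"].astype(bool)
        rules.append("bad_signal_filter_ok")
    if max_low_quality_legs_share is not None:
        out["low_quality_legs_filter_ok"] = out["low_quality_legs_share"] <= max_low_quality_legs_share
        rules.append("low_quality_legs_filter_ok")
    out["analysis_user_ok"] = out[rules].all(axis=1) if rules else True
    # List every failing rule so exclusions can be read before filtering
    out["analysis_user_reason"] = [
        ", ".join(r.removesuffix("_filter_ok") for r in rules if not row[r]) or "ok"
        for _, row in out.iterrows()
    ]
    return out


user_stats = pd.DataFrame({
    "user_id": ["a", "b", "c"],
    "bad_signal_user": [False, True, False],
    "low_quality_legs_share": [0.05, 0.10, 0.40],
})
selection = selection_table_sketch(user_stats, max_low_quality_legs_share=0.2)
selected = pd.Index(selection.loc[selection["analysis_user_ok"], "user_id"])
```

A missing column for an enabled rule raises KeyError on access, consistent with the documented behaviour.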

select_analysis_users(selection_table, *, quality_column='analysis_user_ok')

Return user ids selected by a user selection table.

Parameters:

Name Type Description Default
selection_table DataFrame

Table produced by build_user_selection_table.

required
quality_column str

Boolean column used as the final selection flag.

'analysis_user_ok'

Returns:

Type Description
Index

User ids for which quality_column is true.

Raises:

Type Description
KeyError

If user_id or quality_column is missing.

filter_mobility_dataset_by_users(dataset, user_ids)

Filter every user-indexed table of a MobilityDataset.

Mapping tables are filtered after the core tables so they only reference retained legs, trips, journeys and staypoints. The function does not recompute indicators; it only preserves relational consistency after a user selection.

Parameters:

Name Type Description Default
dataset MobilityDataset

Transformed mobility dataset.

required
user_ids Iterable

Iterable of user identifiers to keep.

required

Returns:

Type Description
MobilityDataset

A new MobilityDataset with filtered tables and unchanged validation reports.
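The two-step order (core tables first, mapping second) can be sketched with a plain dict of DataFrames standing in for MobilityDataset. The table names legs and trips, the mapping columns leg_id and trip_id, and the dict container itself are assumptions for illustration.

```python
import pandas as pd


def filter_tables_sketch(tables, user_ids):
    """Hypothetical sketch: filter user-indexed tables, then the leg/trip mapping."""
    keep = set(user_ids)
    out = {}
    for name, df in tables.items():
        if "user_id" in df.columns:
            out[name] = df[df["user_id"].isin(keep)].reset_index(drop=True)
        else:
            out[name] = df  # tables without user_id are handled by the mapping step
    mapping = out["map_track_trip_journey"]
    # Keep only mapping rows whose leg and trip both survived the user filter
    retained = (mapping["leg_id"].isin(out["legs"]["leg_id"])
                & mapping["trip_id"].isin(out["trips"]["trip_id"]))
    out["map_track_trip_journey"] = mapping[retained].reset_index(drop=True)
    return out


tables = {
    "legs": pd.DataFrame({"leg_id": [1, 2, 3], "user_id": ["a", "a", "b"]}),
    "trips": pd.DataFrame({"trip_id": [10, 11], "user_id": ["a", "b"]}),
    "map_track_trip_journey": pd.DataFrame({"leg_id": [1, 2, 3], "trip_id": [10, 10, 11]}),
}
filtered = filter_tables_sketch(tables, ["a"])
```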

Spatial

xyt_gps.spatial

Spatial façade for geometry, GPS signal quality and zone helpers.

clean_leg_geometries(legs, *, drop_discontinuous=True)

Convert continuous MultiLineString legs to LineString.

max_consecutive_point_distance(geometry)

Return the maximum distance between consecutive points.

add_signal_loss_metrics(legs, config, *, geometry_col='geometry')

Add absolute and relative GPS signal-loss metrics to legs.

add_signal_quality_flags(legs, config, *, mode_col=None)

Add signal-loss metrics, leg flags and user signal-quality flags.

add_signal_quality_to_user_stats(user_stats, legs, *, signal_quality_computed=None)

Merge user-level signal-quality metrics into user_stats.

build_user_signal_quality_stats(legs)

Aggregate signal-quality flags at user level.

flag_low_quality_legs_by_mode(legs, config, *, mode_col=None)

Flag legs with mode-specific signal-loss thresholds.

identify_bad_signal_users(legs, *, quantile_threshold=0.995, user_id_col='user_id')

Identify users with unusually high average signal loss.

add_excursion_flags(table, *, area=None, area_path=None, center=None, radius_m=None, geometry_col='geometry', flag_col='excursion', mode='outside', target_crs='EPSG:4326', operations_crs='EPSG:2056')

Flag geographic excursions on a geometry table.

add_journey_origin_destination_from_trips(journeys, trips, map_track_trip_journey, *, trip_origin_col='trip_origin_zone', trip_destination_col='trip_destination_zone', output_origin_col='journey_origin_zone', output_destination_col='journey_destination_zone', fill_value='Unknown')

Propagate first-origin and last-destination labels from trips to journeys.

add_leg_origin_destination_zones(legs, zones, *, zone_label_col, origin_col='origin_zone', destination_col='destination_zone', geometry_col='geometry', fill_value='Outside', target_crs='EPSG:4326')

Label leg origins and destinations from a zone layer.

add_spatial_zone_labels(table, zones, *, zone_label_col, output_col, geometry_col='geometry', predicate='within', fill_value='Unknown', target_crs='EPSG:4326')

Add a zone label to a geometry table by spatial join.

add_trip_origin_destination_from_legs(trips, legs, map_track_trip_journey, *, leg_origin_col='origin_zone', leg_destination_col='destination_zone', output_origin_col='trip_origin_zone', output_destination_col='trip_destination_zone', fill_value='Unknown')

Propagate first-origin and last-destination labels from legs to trips.

classify_leg_relation_to_area(legs, *, area=None, area_path=None, center=None, radius_m=None, geometry_col='geometry', relation_col='area_relation', code_col=None, target_crs='EPSG:4326', operations_crs='EPSG:2056')

Classify each leg as intra, extra or exchange relative to an area.

Spatial geometries

xyt_gps.spatial_geometry

Geometry helpers for mobility traces.

clean_leg_geometries(legs, *, drop_discontinuous=True)

Convert continuous MultiLineString legs to LineString.

max_consecutive_point_distance(geometry)

Return the maximum distance between consecutive points.
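The metric reduces to the largest step between successive vertices. This plain-Python sketch takes a list of (x, y) pairs in a projected CRS in meters, such as EPSG:2056, instead of a geometry object; that input format is an assumption. A large jump between consecutive fixes is one plausible symptom of lost GPS signal.

```python
from math import hypot


def max_consecutive_point_distance_sketch(coords):
    """Hypothetical sketch: coords is a sequence of (x, y) in meters."""
    if len(coords) < 2:
        return 0.0
    # Euclidean distance between each vertex and the next one
    return max(hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(coords, coords[1:]))


longest_step = max_consecutive_point_distance_sketch([(0, 0), (3, 4), (3, 14)])
```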

Spatial GPS quality

xyt_gps.spatial_quality

GPS signal-quality metrics and filters.

add_signal_loss_metrics(legs, config, *, geometry_col='geometry')

Add absolute and relative GPS signal-loss metrics to legs.

flag_low_quality_legs_by_mode(legs, config, *, mode_col=None)

Flag legs with mode-specific signal-loss thresholds.

identify_bad_signal_users(legs, *, quantile_threshold=0.995, user_id_col='user_id')

Identify users with unusually high average signal loss.
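A sketch of the quantile rule: average a leg-level signal-loss metric per user, then flag users above the chosen quantile of those averages. The metric column name signal_loss_ratio and the strict "greater than" comparison are assumptions.

```python
import pandas as pd


def bad_signal_users_sketch(legs, loss_col="signal_loss_ratio", quantile_threshold=0.995,
                            user_id_col="user_id"):
    """Hypothetical sketch: users whose mean loss exceeds a high quantile."""
    mean_loss = legs.groupby(user_id_col)[loss_col].mean()
    cutoff = mean_loss.quantile(quantile_threshold)  # linear interpolation by default
    return mean_loss[mean_loss > cutoff].index


legs = pd.DataFrame({
    "user_id": [f"u{i}" for i in range(10)],
    "signal_loss_ratio": [0.05] * 9 + [0.6],
})
bad = bad_signal_users_sketch(legs)
```

With linear interpolation and a quantile below 1, the cutoff lies below the maximum whenever one user is strictly worse than all others, so that user is always flagged. With small samples, the threshold therefore behaves more like "flag the worst user" than like a fixed tolerance.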

add_signal_quality_flags(legs, config, *, mode_col=None)

Add signal-loss metrics, leg flags and user signal-quality flags.

build_user_signal_quality_stats(legs)

Aggregate signal-quality flags at user level.

add_signal_quality_to_user_stats(user_stats, legs, *, signal_quality_computed=None)

Merge user-level signal-quality metrics into user_stats.

Zones and spatial relations

xyt_gps.spatial_zones

Spatial labels, area relations and excursion flags.

add_excursion_flags(table, *, area=None, area_path=None, center=None, radius_m=None, geometry_col='geometry', flag_col='excursion', mode='outside', target_crs='EPSG:4326', operations_crs='EPSG:2056')

Flag geographic excursions on a geometry table.

add_spatial_zone_labels(table, zones, *, zone_label_col, output_col, geometry_col='geometry', predicate='within', fill_value='Unknown', target_crs='EPSG:4326')

Add a zone label to a geometry table by spatial join.

add_leg_origin_destination_zones(legs, zones, *, zone_label_col, origin_col='origin_zone', destination_col='destination_zone', geometry_col='geometry', fill_value='Outside', target_crs='EPSG:4326')

Label leg origins and destinations from a zone layer.

classify_leg_relation_to_area(legs, *, area=None, area_path=None, center=None, radius_m=None, geometry_col='geometry', relation_col='area_relation', code_col=None, target_crs='EPSG:4326', operations_crs='EPSG:2056')

Classify each leg as intra, extra or exchange relative to an area.
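Here is a sketch of the intra/extra/exchange rule for a circular area defined by a center and radius_m. It assumes the label depends only on whether the leg's origin and destination fall inside the area. It also assumes coordinates are already in a metric CRS such as EPSG:2056. The package may instead use the full leg geometry or a polygon area; classify_relation is hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def classify_relation(origin: Point, destination: Point, center: Point, radius_m: float) -> str:
    """Endpoint-based label relative to a circular study area (projected metres)."""
    inside_origin = math.dist(origin, center) <= radius_m
    inside_destination = math.dist(destination, center) <= radius_m
    if inside_origin and inside_destination:
        return "intra"  # starts and ends inside the area
    if not inside_origin and not inside_destination:
        return "extra"  # starts and ends outside the area
    return "exchange"  # crosses the area boundary once between endpoints
```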

add_trip_origin_destination_from_legs(trips, legs, map_track_trip_journey, *, leg_origin_col='origin_zone', leg_destination_col='destination_zone', output_origin_col='trip_origin_zone', output_destination_col='trip_destination_zone', fill_value='Unknown')

Propagate first-origin and last-destination labels from legs to trips.
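This sketch shows the propagation rule. It assumes legs are dicts and that the mapping table can be reduced to a dict from leg_id to trip_id. Within each trip, legs are ordered by started_at. The trip takes the first leg's origin and the last leg's destination, using fill_value for missing labels. The helper name and data shapes are hypothetical.

```python
from typing import Dict, List, Tuple


def trip_origin_destination(
    legs: List[dict],
    leg_to_trip: Dict[str, str],
    fill_value: str = "Unknown",
) -> Dict[str, Tuple[str, str]]:
    """Map each trip to (origin of its first leg, destination of its last leg)."""
    by_trip: Dict[str, List[dict]] = {}
    for leg in legs:
        trip_id = leg_to_trip.get(leg["leg_id"])
        if trip_id is not None:
            by_trip.setdefault(trip_id, []).append(leg)
    result: Dict[str, Tuple[str, str]] = {}
    for trip_id, trip_legs in by_trip.items():
        trip_legs.sort(key=lambda leg: leg["started_at"])  # any sortable timestamp
        origin = trip_legs[0].get("origin_zone") or fill_value
        destination = trip_legs[-1].get("destination_zone") or fill_value
        result[trip_id] = (origin, destination)
    return result
```

The same pattern applies one level up, from trips to journeys.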

add_journey_origin_destination_from_trips(journeys, trips, map_track_trip_journey, *, trip_origin_col='trip_origin_zone', trip_destination_col='trip_destination_zone', output_origin_col='journey_origin_zone', output_destination_col='journey_destination_zone', fill_value='Unknown')

Propagate first-origin and last-destination labels from trips to journeys.

Time slices

xyt_gps.spatial_time

Reusable time-slice helpers for mobility tables.

add_time_slices(table, *, time_slices=None, datetime_col='started_at', output_col='time_slice', timezone='Europe/Zurich', fallback_label='HC')

Add a reusable daily time-slice label to a mobility table.

Parameters:

Name Type Description Default
table DataFrame

Input table with a datetime column.

required
time_slices Iterable[TimeSlice] | None

Named intervals. Defaults to HPM (morning peak) 07:10-09:00 and HPS (evening peak) 17:30-20:00. Observations outside these intervals receive fallback_label, HC (off-peak) by default.

None
datetime_col str

Datetime column used to classify observations.

'started_at'
output_col str

Name of the created column.

'time_slice'
timezone str | None

Local timezone used before extracting the hour. Set to None to keep UTC timestamps.

'Europe/Zurich'
fallback_label str

Label assigned outside configured intervals.

'HC'

Returns:

Type Description
DataFrame

Copy of table with output_col.

Raises:

Type Description
KeyError

If datetime_col is missing.
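The sketch below applies the labelling step to local clock times. It rests on three assumptions. First, timestamps have already been converted to the local timezone (the function converts to timezone, Europe/Zurich by default, before extracting the hour). Second, intervals are half-open [start, end). Third, a slice is a (label, start, end) tuple; the package's TimeSlice type may differ.

```python
from datetime import time
from typing import Iterable, Tuple

TimeSlice = Tuple[str, time, time]  # (label, start inclusive, end exclusive): hypothetical shape

DEFAULT_SLICES: Tuple[TimeSlice, ...] = (
    ("HPM", time(7, 10), time(9, 0)),    # morning peak, as documented above
    ("HPS", time(17, 30), time(20, 0)),  # evening peak
)


def time_slice_label(
    local_time: time,
    slices: Iterable[TimeSlice] = DEFAULT_SLICES,
    fallback_label: str = "HC",
) -> str:
    """Return the first slice containing local_time, else the fallback label."""
    for label, start, end in slices:
        if start <= local_time < end:
            return label
    return fallback_label
```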

H3

xyt_gps.spatial_h3

H3 point extraction and aggregation helpers.

legs_to_h3_points(legs, *, h3_resolution=9, geometry_col='geometry', metadata_cols=None, sample_distance_m=None, max_points_per_leg=None)

Convert leg geometries into H3-indexed point observations.

The function creates one row per point extracted from each leg geometry. By default, points are the vertices already present in the leg LineString. When sample_distance_m is provided, the line is sampled at a regular interval before H3 indexing. This is useful for footfall (frequency) maps, but the output should then be documented as a sampled representation rather than as raw GPS observations.
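This sketch samples a planar polyline at a regular interval. It assumes metric coordinates (for example EPSG:2056) before any conversion to lon/lat for H3 indexing; H3 itself is not used here. The cap keeps evenly spaced samples, which is an assumption about how max_points_per_leg thins points. sample_line is hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def sample_line(coords: Sequence[Point], step_m: float, max_points: Optional[int] = None) -> List[Point]:
    """Points every step_m along a planar polyline, starting at its first vertex."""
    if step_m <= 0:
        raise ValueError("step_m must be positive")
    if max_points is not None and max_points < 1:
        raise ValueError("max_points must be at least 1")
    if not coords:
        return []
    points: List[Point] = [(float(coords[0][0]), float(coords[0][1]))]
    carried = 0.0  # distance walked since the last emitted point
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        pos = step_m - carried  # offset of the next sample along this segment
        while pos <= seg:
            t = pos / seg
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            pos += step_m
        carried = seg - (pos - step_m)
    if max_points is not None and len(points) > max_points:
        # keep evenly spaced samples so the cap does not favour the start of the leg
        stride = len(points) / max_points
        points = [points[int(i * stride)] for i in range(max_points)]
    return points
```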

Parameters:

Name Type Description Default
legs DataFrame

Leg table, usually dataset.legs, with line geometries.

required
h3_resolution int | Iterable[int]

H3 resolution or list of resolutions, from 0 to 15. Higher values produce smaller hexagons and larger output tables.

9
geometry_col str

Geometry column containing LineString or MultiLineString objects.

'geometry'
metadata_cols Iterable[str] | None

Columns to copy from legs to each point row. When omitted, common mobility identifiers, dates, modes and phases are copied when present.

None
sample_distance_m float | None

Optional regular sampling distance in metres. If omitted, existing line vertices are used.

None
max_points_per_leg int | None

Optional cap after vertex extraction or regular sampling. This is a safeguard for very dense geometries.

None

Returns:

Type Description
GeoDataFrame

GeoDataFrame in EPSG:4326 with lon, lat, h3_cell, h3_resolution, point_sequence and copied metadata columns.

Raises:

Type Description
ImportError

If the optional h3 dependency is missing.

KeyError

If the geometry column is missing.

ValueError

If the H3 resolution or sampling distance is invalid.

aggregate_h3_frequencies(h3_points, *, h3_col='h3_cell', group_cols=None, user_col='user_id', leg_col='leg_id', trip_col='trip_id')

Aggregate H3-indexed points into footfall (frequency) counts.
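This sketch performs the aggregation. It assumes point_count counts rows, while user_count, leg_count and trip_count count distinct identifiers per cell. Each distinct count is produced only when its identifier column is present. Points are plain dicts, and the cell ids are placeholders rather than real H3 indexes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def h3_frequencies(points: Iterable[dict], h3_col: str = "h3_cell") -> List[dict]:
    """Per-cell point_count plus distinct user, leg and trip counts."""
    point_count: Dict[str, int] = defaultdict(int)
    distinct: Dict[str, Dict[str, set]] = defaultdict(lambda: defaultdict(set))
    id_columns = (("user_id", "user_count"), ("leg_id", "leg_count"), ("trip_id", "trip_count"))
    for row in points:
        cell = row[h3_col]
        point_count[cell] += 1
        for source, output in id_columns:
            if source in row:
                distinct[cell][output].add(row[source])
    return [
        {h3_col: cell, "point_count": n, **{out: len(ids) for out, ids in distinct[cell].items()}}
        for cell, n in sorted(point_count.items())
    ]
```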

Parameters:

Name Type Description Default
h3_points DataFrame

Output of legs_to_h3_points() or another table containing an H3 cell column.

required
h3_col str

Column containing H3 cell identifiers.

'h3_cell'
group_cols Iterable[str] | None

Additional grouping columns. The H3 cell is always kept. Add columns such as mode_niv1, Phase or experiment_name to build dashboard-ready slices.

None
user_col str

User identifier column used for user_count when present.

'user_id'
leg_col str

Leg identifier column used for leg_count when present.

'leg_id'
trip_col str

Trip identifier column used for trip_count when present.

'trip_id'

Returns:

Type Description
DataFrame

DataFrame with counts by H3 cell and optional grouping columns.

build_h3_count_matrix(h3_points, *, h3_col='h3_cell', dimension_sets=DEFAULT_H3_COUNT_DIMENSION_SETS, metrics=DEFAULT_H3_COUNT_METRICS, user_col='user_id', leg_col='leg_id', trip_col='trip_id', fill_value=0)

Build a wide H3 count table for dashboards.

The output keeps one row per H3 cell and creates explicit metric columns for requested dimensions, for example point_count__mode_niv1__marche or trip_count__time_slice__hpm.
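This sketch builds the wide naming scheme for the point_count metric with single-column dimension sets. It assumes values are lower-cased, following the point_count__mode_niv1__marche example. Multi-column dimension sets and the other metrics are left out; wide_point_counts is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def wide_point_counts(
    points: Iterable[dict],
    dimensions: List[str],
    h3_col: str = "h3_cell",
    fill_value: int = 0,
) -> Dict[str, Dict[str, int]]:
    """One dict per H3 cell with point_count__<dimension>__<value> columns."""
    rows = list(points)
    counts: Counter = Counter()
    columns = set()
    for row in rows:
        for dim in dimensions:
            if dim not in row:
                continue  # missing dimension columns are skipped
            column = f"point_count__{dim}__{str(row[dim]).lower()}"
            columns.add(column)
            counts[(row[h3_col], column)] += 1
    cells = sorted({row[h3_col] for row in rows})
    # absent cell/column combinations receive fill_value
    return {cell: {col: counts.get((cell, col), fill_value) for col in sorted(columns)} for cell in cells}
```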

Parameters:

Name Type Description Default
h3_points DataFrame

H3-indexed points produced by legs_to_h3_points().

required
h3_col str

H3 cell column.

'h3_cell'
dimension_sets Iterable[Iterable[str]]

Dimension sets used to create count columns. Each set is aggregated independently. Missing dimension columns are skipped.

DEFAULT_H3_COUNT_DIMENSION_SETS
metrics Iterable[str]

Count metrics to compute: point_count, user_count, leg_count, trip_count.

DEFAULT_H3_COUNT_METRICS
user_col str

User identifier column.

'user_id'
leg_col str

Leg identifier column.

'leg_id'
trip_col str

Trip identifier column.

'trip_id'
fill_value int

Value used for absent combinations.

0

Returns:

Type Description
DataFrame

Wide table keyed by h3_cell, plus h3_resolution when that column is present.

Dashboard spatial exports

xyt_gps.spatial_exports

Dashboard-oriented spatial table and DuckDB exports.

build_spatial_analytics_tables(dataset_or_legs, *, h3_resolution=9, config=None, frequency_group_cols=None, count_dimension_sets=DEFAULT_H3_COUNT_DIMENSION_SETS, count_metrics=DEFAULT_H3_COUNT_METRICS, include_count_matrix=True, time_slices=None, datetime_col='started_at', time_slice_col='time_slice', timezone=None, sample_distance_m=None)

Build H3 spatial analytics tables without writing files.

Parameters:

Name Type Description Default
dataset_or_legs MobilityDataset | DataFrame

A MobilityDataset or a leg table.

required
h3_resolution int | Iterable[int]

H3 resolution or list of resolutions used for point indexing.

9
config ProjectConfig | None

Optional project configuration. When provided, its timezone and time slices are used unless explicit values are passed.

None
frequency_group_cols Iterable[str] | None

Grouping columns for aggregate_h3_frequencies. Defaults to H3 cell only.

None
count_dimension_sets Iterable[Iterable[str]]

Dimension sets used for the wide count matrix.

DEFAULT_H3_COUNT_DIMENSION_SETS
count_metrics Iterable[str]

Count metrics included in the wide count matrix.

DEFAULT_H3_COUNT_METRICS
include_count_matrix bool

Whether to include h3_count_matrix in the returned dictionary.

True
time_slices Iterable[TimeSlice] | None

Optional reusable daily intervals. Defaults to config.time_slices or package defaults.

None
datetime_col str

Datetime column used to assign time slices.

'started_at'
time_slice_col str

Name of the time-slice column.

'time_slice'
timezone str | None

Local timezone used for time slices. Defaults to config.timezone or Europe/Zurich.

None
sample_distance_m float | None

Optional regular sampling distance in metres before H3 indexing.

None

Returns:

Type Description
dict[str, DataFrame]

Dictionary with leg_points_h3, h3_frequency and, when requested, h3_count_matrix.

write_spatial_analytics_tables(tables, output_dir, *, formats=('parquet', 'csv'), overwrite=True, manifest_name='spatial_analytics_manifest.json')

Write precomputed spatial analytics tables to disk.

Use this after building tables with build_spatial_analytics_tables(), for example when they need to be inspected before export.

Parameters:

Name Type Description Default
tables Mapping[str, DataFrame]

Mapping of table names to DataFrames.

required
output_dir str | Path

Destination directory.

required
formats Iterable[str]

Output formats: parquet, csv, pickle/pkl. Defaults to parquet and csv.

('parquet', 'csv')
overwrite bool

If false, fail when a target file already exists.

True
manifest_name str

JSON manifest file name.

'spatial_analytics_manifest.json'

Returns:

Type Description
DataFrame

Manifest with one row per written file.

write_spatial_analytics_exports(dataset_or_legs, output_dir, *, h3_resolution=9, config=None, frequency_group_cols=None, count_dimension_sets=DEFAULT_H3_COUNT_DIMENSION_SETS, count_metrics=DEFAULT_H3_COUNT_METRICS, include_count_matrix=True, time_slices=None, datetime_col='started_at', time_slice_col='time_slice', timezone=None, sample_distance_m=None, formats=('parquet', 'csv'), overwrite=True)

Write H3 point, frequency and count-matrix tables for dashboards.

Parameters:

Name Type Description Default
dataset_or_legs MobilityDataset | DataFrame

A MobilityDataset or a leg table.

required
output_dir str | Path

Destination directory, for example Data/Output/2-transformed-data/spatial-analytics.

required
h3_resolution int | Iterable[int]

H3 resolution or list of resolutions used for point indexing.

9
config ProjectConfig | None

Optional project configuration. When provided, its timezone and time slices are used unless explicit values are passed.

None
frequency_group_cols Iterable[str] | None

Grouping columns for aggregate_h3_frequencies. Defaults to H3 cell only.

None
count_dimension_sets Iterable[Iterable[str]]

Dimension sets used for the wide count matrix.

DEFAULT_H3_COUNT_DIMENSION_SETS
count_metrics Iterable[str]

Count metrics included in the wide count matrix.

DEFAULT_H3_COUNT_METRICS
include_count_matrix bool

Whether to export h3_count_matrix.

True
time_slices Iterable[TimeSlice] | None

Optional reusable daily intervals. Defaults to config.time_slices or package defaults.

None
datetime_col str

Datetime column used to assign time slices.

'started_at'
time_slice_col str

Name of the time-slice column.

'time_slice'
timezone str | None

Local timezone used for time slices. Defaults to config.timezone or Europe/Zurich.

None
sample_distance_m float | None

Optional regular sampling distance in metres before H3 indexing.

None
formats Iterable[str]

Output formats: parquet, csv, pickle/pkl. Defaults to parquet and csv.

('parquet', 'csv')
overwrite bool

If false, fail when a target file already exists.

True

Returns:

Type Description
DataFrame

Manifest with one row per written file.

write_duckdb_spatial_database(tables, database_path, *, overwrite=True, load_spatial=True, require_spatial_extension=False)

Write mobility tables to a local DuckDB database.

GeoDataFrame geometries are stored as WKB columns so that the database can be queried even without the spatial extension. When the extension is available, companion views ending with _spatial are created with a DuckDB geometry column.
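For readers unfamiliar with WKB, this standard-library sketch shows the byte layout of a little-endian 2D Point. The layout is a byte-order flag, a uint32 geometry type (1 for Point), then X and Y as doubles, for 21 bytes in total. In practice a geometry library does this encoding; the helper names here are hypothetical. With the spatial extension loaded, DuckDB can convert such blobs back into geometries with ST_GeomFromWKB.

```python
import struct
from typing import Tuple


def point_to_wkb(x: float, y: float) -> bytes:
    """Little-endian WKB Point: byte order 1, geometry type 1, then X and Y doubles."""
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(blob: bytes) -> Tuple[float, float]:
    byte_order, geom_type = struct.unpack_from("<BI", blob)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("only little-endian 2D points are handled in this sketch")
    return struct.unpack_from("<dd", blob, 5)  # coordinates start after the 5-byte header
```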

Parameters:

Name Type Description Default
tables Mapping[str, DataFrame] | MobilityDataset

Mapping of table names to DataFrames, or a MobilityDataset.

required
database_path str | Path

Destination .duckdb file.

required
overwrite bool

If true, replace an existing database file.

True
load_spatial bool

Try to install and load DuckDB's spatial extension.

True
require_spatial_extension bool

If true, fail when the spatial extension cannot be loaded. Keep false for offline or lightweight use.

False

Returns:

Type Description
DataFrame

Manifest with table names, row counts and spatial-extension status.

Raises:

Type Description
ImportError

If the optional duckdb dependency is missing.

FileExistsError

If the database exists and overwrite is false.

Indicators

xyt_gps.indicators

Mobility indicator helpers computed from structured mobility tables.

build_person_day_indicators(dataset, *, mode_col='mode_niv1', trips_mode_col=None, distance_col=None, include_zero_days=True, include_excursions=True, include_airplane=False, config=None, default_phase_name='All')

Build person-day mobility indicators by mode.

The MVP indicators are distance, travel time and number of trips. Values are aggregated per user, date, analytical phase/period and mode. Distances are expressed in kilometres and durations in minutes. If the dataset has no Phase column, all rows are assigned to default_phase_name.
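This sketch reproduces the person-day logic for a single phase. It assumes legs carry length in metres and duration in seconds, under the hypothetical field names length_m and duration_s. It also assumes each user's tracked calendar is given as an inclusive date range. Pre-filling every user-day-mode with zeros mimics include_zero_days=True. trip_count and the Phase column are omitted for brevity.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, date, str]  # (user_id, date, mode)


def _zero_row() -> dict:
    return {"distance_km": 0.0, "travel_time_min": 0.0, "leg_count": 0}


def person_day(legs: Iterable[dict], calendar: Dict[str, Tuple[date, date]], modes: List[str]) -> Dict[Key, dict]:
    """Per user, day and mode: distance (km), travel time (min) and leg count.

    calendar maps user_id to (first tracked day, last tracked day), both inclusive.
    """
    out: Dict[Key, dict] = {}
    # Zero-fill every tracked day first (the include_zero_days=True behaviour).
    for user, (first, last) in calendar.items():
        day = first
        while day <= last:
            for mode in modes:
                out[(user, day, mode)] = _zero_row()
            day += timedelta(days=1)
    # Then add observed legs, converting metres to km and seconds to minutes.
    for leg in legs:
        row = out.setdefault((leg["user_id"], leg["date"], leg["mode"]), _zero_row())
        row["distance_km"] += leg["length_m"] / 1000
        row["travel_time_min"] += leg["duration_s"] / 60
        row["leg_count"] += 1
    return out
```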

Parameters:

Name Type Description Default
dataset MobilityDataset

Transformed mobility dataset.

required
mode_col str

Leg mode column, for example mode_niv1, mode_niv2 or mode_mrmt.

'mode_niv1'
trips_mode_col str | None

Trip mode column. If absent, it is inferred as main_{mode_col}.

None
distance_col str | None

Optional leg distance column override. By default the function tries length, distance, then length_leg.

None
include_zero_days bool

When true, build a continuous user-day calendar from user_stats phase ranges and fill days without movement with zero.

True
include_excursions bool

When false, exclude rows flagged with excursion == 1 in legs and trips before computing indicators.

True
include_airplane bool

When false, exclude legs and trips whose source or mapped mode columns identify airplane travel. Airplane is excluded by default because it can dominate distances and CO2 indicators.

False
config ProjectConfig | None

Optional project configuration. If omitted, phase metadata stored in MobilityDataset is used when available.

None
default_phase_name str

Analytical period name used when no phase split is present in the dataset.

'All'

Returns:

Type Description
DataFrame

Long table with columns user_id, date, Phase, mode, distance_km, travel_time_min, trip_count and leg_count.

Raises:

Type Description
KeyError

If required columns are missing.

build_person_phase_indicators(person_day, *, user_stats=None, weight_col='weight')

Average person-day indicators by user, phase and mode.

Parameters:

Name Type Description Default
person_day DataFrame

Table produced by build_person_day_indicators.

required
user_stats DataFrame | None

Optional user-level table used to attach weights.

None
weight_col str

User weight column. Missing weights are set to 1.

'weight'

Returns:

Type Description
DataFrame

Long table with mean daily distance, travel time and trip count by user, phase and mode. If user_stats is provided, the table also contains weight_col.

build_population_indicators(person_phase, *, use_weights=True, weight_col='weight')

Average person-phase indicators at population level.
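This sketch computes the population mean for one metric. It assumes person_phase rows are dicts with user_id, Phase, mode and an optional weight. A missing weight counts as 1, as documented for build_person_phase_indicators. n_users counts distinct users per phase-mode pair. population_means is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def population_means(
    person_phase: Iterable[dict], metric: str, use_weights: bool = True
) -> Dict[Tuple[str, str], dict]:
    """Weighted mean of one metric per (Phase, mode); a missing weight counts as 1."""
    acc: Dict[Tuple[str, str], list] = defaultdict(lambda: [0.0, 0.0, set()])
    for row in person_phase:
        weight = row.get("weight") if use_weights else None
        weight = 1.0 if weight is None else float(weight)
        total = acc[(row["Phase"], row["mode"])]
        total[0] += weight * row[metric]  # weighted sum of the metric
        total[1] += weight                # sum of weights
        total[2].add(row["user_id"])      # distinct contributing users
    return {
        key: {metric: wsum_metric / wsum, "n_users": len(users)}
        for key, (wsum_metric, wsum, users) in acc.items()
        if wsum > 0
    }
```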

Parameters:

Name Type Description Default
person_phase DataFrame

Table produced by build_person_phase_indicators.

required
use_weights bool

When true and weight_col is available, compute weighted means.

True
weight_col str

Weight column attached to person_phase.

'weight'

Returns:

Type Description
DataFrame

Long table with one row per phase and mode. n_users counts the number of users contributing to each phase-mode mean.

compute_mobility_indicators(dataset, *, mode_col='mode_niv1', trips_mode_col=None, distance_col=None, include_zero_days=True, include_excursions=True, include_airplane=False, use_weights=True, weight_col='weight', config=None, default_phase_name='All')

Compute the core mobility indicator tables (person_day, person_phase and population) from a mobility dataset.

Parameters:

Name Type Description Default
dataset MobilityDataset

Transformed mobility dataset.

required
mode_col str

Leg mode column used as indicator granularity.

'mode_niv1'
trips_mode_col str | None

Optional trip mode column override.

None
distance_col str | None

Optional leg distance column override.

None
include_zero_days bool

Whether to include tracked days without movement as zero rows.

True
include_excursions bool

Whether to include rows flagged as excursions in the indicator base.

True
include_airplane bool

Whether to include airplane legs and trips in the indicator base. Defaults to false because airplane rows can dominate distance, CO2 and demand profiles.

False
use_weights bool

Whether to use weight_col from dataset.user_stats for population-level means.

True
weight_col str

User-level weight column. Missing weights are set to 1.

'weight'
config ProjectConfig | None

Optional project configuration. If omitted, phase metadata stored in MobilityDataset is used when available.

None
default_phase_name str

Analytical period name used when no phase split is present in the dataset.

'All'

Returns:

Type Description
IndicatorResult

IndicatorResult with person_day, person_phase and population tables.

population_indicator_summary(indicators)

Return a compact population-level indicator summary.

Parameters:

Name Type Description Default
indicators IndicatorResult | DataFrame

IndicatorResult or a population table.

required

Returns:

Type Description
DataFrame

Population indicator table sorted by phase and mode.

Enrichments

xyt_gps.enrichment

Optional enrichment helpers for mobility analysis tables.

CO2OccupancyConfig dataclass

Parameters used to derive CO2 and occupancy metrics from legs.

Factors are expressed in grams per kilometer. co2_g and co2_direct_g are then computed as leg-level totals. Occupancy is recomputed by default; set prefer_observed_occupancy=True to use a positive provider value from occupancy_col when available.

HealthConfig dataclass

Parameters used to derive simple physical-activity metrics.

add_co2_occupancy_metrics(legs, *, journeys=None, map_track_trip_journey=None, config=None, mode_col='mode', distance_col=None, journey_purpose_col='main_purpose_mrmt', prefer_observed_occupancy=None, occupancy_col=None)

Add occupancy and CO2 metrics to a leg table.

The function is intentionally row-level: it does not aggregate. This keeps assumptions visible before mobility indicators are computed.

By default, occupancy is recomputed from distance and purpose because provider columns are often sparse. Pass prefer_observed_occupancy=True to use a positive value from occupancy_col when available and fall back to the computed value otherwise.
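This sketch shows the occupancy choice controlled by prefer_observed_occupancy. The observed value is used only when requested and strictly positive; otherwise the computed value is kept. How the package derives the computed occupancy from distance and purpose is not shown. choose_occupancy is hypothetical.

```python
from typing import Optional


def choose_occupancy(computed: float, observed: Optional[float], prefer_observed: bool = False) -> float:
    """Pick the occupancy used for one leg.

    The observed provider value wins only when prefer_observed is True and the value is
    strictly positive; None, zero, negative or NaN values fall back to the computed value
    (NaN > 0 evaluates to False).
    """
    if prefer_observed and observed is not None and observed > 0:
        return float(observed)
    return computed
```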

add_health_metrics(legs, *, config=None, mode_col='mode_niv1', distance_col=None, duration_col='duration')

Add simple activity, intensity, MET and calorie metrics to legs.

build_leg_enrichment_tables(legs)

Return compact CO2 and health side tables keyed by leg_id.

Mobility motifs

xyt_gps.motifs

Daily mobility motif helpers.

The functions in this module migrate the useful core of the historical GPStoGraph workflow without exposing notebook-era graph objects as the main API. A motif is represented as a daily directed transition structure between visited places. Places can come from an existing location_id column, or be derived from staypoint purpose and rounded coordinates.

assign_mobility_motif_ids(motifs, *, signature_col='motif_signature', id_col='motif_id', top_n=9, other_motif_id=99)

Assign stable numeric ids to the most frequent motif signatures.

Parameters:

Name Type Description Default
motifs DataFrame

Motif table returned by build_mobility_motifs().

required
signature_col str

Column containing the canonical motif signature.

'motif_signature'
id_col str

Name of the generated id column.

'motif_id'
top_n int

Number of frequent motifs receiving ids from 1 to top_n.

9
other_motif_id int

Id used for less frequent motifs.

99

Returns:

Type Description
DataFrame

Copy of motifs with a numeric id_col.

Raises:

Type Description
KeyError

If signature_col is absent.
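This sketch assigns ids using collections.Counter. It assumes ties in frequency are broken by signature so that ids stay stable between runs; the package's tie-breaking rule is not documented here. motif_ids is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def motif_ids(signatures: Iterable[str], top_n: int = 9, other_motif_id: int = 99) -> Dict[str, int]:
    """Ids 1..top_n for the most frequent signatures, other_motif_id for the rest."""
    counts = Counter(signatures)
    # decreasing frequency first, then signature text for a deterministic order
    ranked = sorted(counts, key=lambda sig: (-counts[sig], sig))
    return {sig: (rank + 1 if rank < top_n else other_motif_id) for rank, sig in enumerate(ranked)}
```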

build_mobility_motifs(staypoints_or_dataset, *, user_col='user_id', started_at_col='started_at', finished_at_col='finished_at', date_col=None, location_col=None, purpose_col='purpose_niv1', lon_col='lon', lat_col='lat', coordinate_precision=4, top_n_motifs=9, other_motif_id=99)

Build daily mobility motifs from staypoints.

The function works on dataset.staypoints or on a staypoint DataFrame. A motif is the daily sequence of places visited by one user after removing consecutive duplicate places. The sequence is relabelled in order of first appearance, then encoded as a flattened directed adjacency matrix. This keeps the old motif_flat idea while making the result easy to export and compare.
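This sketch implements the canonical encoding described above. It assumes the adjacency matrix is binary (1 when a transition occurs at least once) and is flattened row by row into a string. Place keys are taken as given; motif_signature is hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def motif_signature(places: Sequence[Hashable]) -> Tuple[List[int], str]:
    """Collapse repeats, relabel places by first appearance, flatten the adjacency matrix."""
    collapsed = [p for i, p in enumerate(places) if i == 0 or p != places[i - 1]]
    labels: Dict[Hashable, int] = {}
    # setdefault gives each new place the next free index (0, 1, 2, ...)
    sequence = [labels.setdefault(p, len(labels)) for p in collapsed]
    n = len(labels)
    matrix = [[0] * n for _ in range(n)]
    for a, b in zip(sequence, sequence[1:]):
        matrix[a][b] = 1  # directed transition a -> b
    signature = "".join(str(v) for row in matrix for v in row)
    return sequence, signature
```

Two days that visit different places but share the same structure, for example home, work, shop, home, receive the same signature. This is what makes motifs comparable across users.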

Parameters:

Name Type Description Default
staypoints_or_dataset MobilityDataset | DataFrame

MobilityDataset or staypoint table.

required
user_col str

User identifier column.

'user_id'
started_at_col str

Start timestamp column used for sorting.

'started_at'
finished_at_col str

Optional end timestamp column kept in motif nodes.

'finished_at'
date_col str | None

Optional explicit date column. If omitted, the date is derived from started_at_col.

None
location_col str | None

Optional stable place identifier. If omitted, a place key is derived from purpose and rounded coordinates.

None
purpose_col str | None

Optional activity/purpose label used in derived place keys.

'purpose_niv1'
lon_col str

Longitude column used when deriving place keys.

'lon'
lat_col str

Latitude column used when deriving place keys.

'lat'
coordinate_precision int

Decimal precision for coordinate-derived keys.

4
top_n_motifs int

Number of frequent motif signatures assigned ids 1..N.

9
other_motif_id int

Id assigned to less frequent motifs.

99

Returns:

Type Description
DataFrame

One row per user-day with the canonical sequence, adjacency signature, motif id and simple counts.

Raises:

Type Description
KeyError

If required columns are missing.

ValueError

If no usable staypoint row remains.

summarize_mobility_motifs(motifs, *, motif_id_col='motif_id', signature_col='motif_signature')

Summarize motif frequencies and simple structural properties.

build_mobility_motif_sequences(motifs, *, user_col='user_id', date_col='date', motif_id_col='motif_id', n_days=60, align_to_week=True, fill_value=0)

Build a fixed-width daily motif sequence for each user.

This is the package equivalent of the historical motif_sequence() helper. Each row is a user and each column is a relative day. Missing days are filled with fill_value. When align_to_week=True, the first observed motif is shifted so the first column corresponds to Monday.
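This sketch builds the fixed-width sequence for one user. It assumes motif ids are keyed by date. With align_to_week=True, the window starts on the Monday of the first observed day, so column 0 is always a Monday. motif_sequence is hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List


def motif_sequence(
    daily_ids: Dict[date, int], n_days: int = 60, align_to_week: bool = True, fill_value: int = 0
) -> List[int]:
    """Fixed-width list of motif ids for one user; index 0 is the first day (or its Monday)."""
    if not daily_ids:
        return [fill_value] * n_days
    start = min(daily_ids)
    if align_to_week:
        start -= timedelta(days=start.weekday())  # Monday has weekday() == 0
    return [daily_ids.get(start + timedelta(days=i), fill_value) for i in range(n_days)]
```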

Export

xyt_gps.export

Export helpers for structured mobility tables.

The export layer is kept separate from transformation code so formats such as CSV, Parquet or Excel remain optional package concerns.

mobility_dataset_tables(dataset)

Return the named tables contained in a MobilityDataset.

Parameters:

Name Type Description Default
dataset MobilityDataset

Transformed mobility dataset.

required

Returns:

Type Description
dict[str, DataFrame]

Dictionary keyed by stable table names. The order follows the transformation workflow and is reused by export helpers.

write_mobility_dataset(dataset, output_dir, *, formats=('csv', 'geojson'), extra_tables=None, include_validation=True, include_quality_reports=True, selection_table=None, overwrite=True)

Write intermediate MobilityDataset tables to disk.

The function writes the inspectable states of the preparation workflow: storyline, legs, staypoints, trips, journeys, user stats, mapping tables and optional validation and quality reports. It returns a manifest so the caller can see exactly what was exported.
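This sketch shows the write-and-manifest pattern, restricted to CSV and assuming tables are lists of dicts. It mirrors the documented manifest columns (table, format, path and number of rows) and the overwrite=False behaviour. write_csv_tables is hypothetical and does not cover the other formats.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List


def write_csv_tables(tables: Dict[str, List[dict]], output_dir: Path, overwrite: bool = True) -> List[dict]:
    """Write each table to <output_dir>/<name>.csv and return one manifest row per file."""
    output_dir.mkdir(parents=True, exist_ok=True)
    manifest: List[dict] = []
    for name, rows in tables.items():
        path = output_dir / f"{name}.csv"
        if path.exists() and not overwrite:
            raise FileExistsError(path)
        fieldnames = list(rows[0]) if rows else []
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        manifest.append({"table": name, "format": "csv", "path": str(path), "rows": len(rows)})
    return manifest


if __name__ == "__main__":
    demo_dir = Path(tempfile.mkdtemp())
    print(write_csv_tables({"legs": [{"leg_id": 1, "mode": "walk"}]}, demo_dir))
```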

Parameters:

Name Type Description Default
dataset MobilityDataset

Transformed mobility dataset.

required
output_dir str | Path

Destination directory.

required
formats Iterable[str]

Export formats. Supported values are csv, geojson, parquet, pickle/pkl and xlsx. geojson is written only for geospatial tables.

('csv', 'geojson')
extra_tables Mapping[str, DataFrame] | None

Optional additional tables to export with the same formats, for example user_presence or participation_grid. Table names must be stable file-safe identifiers.

None
include_validation bool

Whether to export raw-schema validation issues.

True
include_quality_reports bool

Whether to export tracking-quality summary and optional user selection table.

True
selection_table DataFrame | None

Optional table produced by build_user_selection_table.

None
overwrite bool

If false, fail when a target file already exists.

True

Returns:

Type Description
DataFrame

A manifest with one row per written file: table, format, path and number of rows.

Raises:

Type Description
FileExistsError

If overwrite is false and a file already exists.

ValueError

If an unsupported format is requested.

ImportError

If optional dependencies for Parquet or Excel are missing.

export_mobility_tables(*args, **kwargs)

Alias for write_mobility_dataset.

write_mobility_dataset is the preferred name because it says that the full structured dataset is written, including reports and mapping tables.

write_indicator_result(indicators, output_dir, *, formats=('parquet', 'csv'), overwrite=True)

Write mobility indicator tables to disk.

Parameters:

Name Type Description Default
indicators IndicatorResult

Result returned by compute_mobility_indicators.

required
output_dir str | Path

Destination directory.

required
formats Iterable[str]

Export formats. Supported values are csv, parquet and pickle/pkl.

('parquet', 'csv')
overwrite bool

If false, fail when a target file already exists.

True

Returns:

Type Description
DataFrame

Manifest with one row per written file.

Map visualization

xyt_gps.viz_maps

Interactive map visualizations for GPS traces and H3 cells.

plot_h3_frequency_map(h3_frequency, *, h3_col='h3_cell', value_col='point_count', tooltip_cols=None, aggregate_cells=True, max_cells=2500, palette=('#f4f1f8', '#d9cce9', '#b79bd5', '#8f67bd', '#6b4c9a', '#3d275f'), fill_opacity=0.62, line_opacity=0.25, tiles='cartodbpositron', zoom_start=11, map_center=None, save_path=None)

Plot H3 footfall (frequency) cells on an interactive Folium map.

Parameters:

Name Type Description Default
h3_frequency DataFrame

Table produced by aggregate_h3_frequencies() or any table containing an H3 cell column and a numeric count.

required
h3_col str

Column containing H3 cell identifiers.

'h3_cell'
value_col str

Numeric column used to color cells.

'point_count'
tooltip_cols Iterable[str] | None

Columns shown in the popup. When omitted, common count columns are used when present.

None
aggregate_cells bool

If true, aggregate rows by H3 cell before plotting. This is useful when the input is split by mode, phase or project.

True
max_cells int | None

Optional maximum number of cells to draw. The most frequent cells are kept. Set to None to draw all cells, which can be slow.

2500
palette tuple[str, ...]

Sequential color palette from low to high values.

('#f4f1f8', '#d9cce9', '#b79bd5', '#8f67bd', '#6b4c9a', '#3d275f')
fill_opacity float

Polygon fill opacity.

0.62
line_opacity float

Polygon border opacity.

0.25
tiles str

Folium base map.

'cartodbpositron'
zoom_start int

Initial zoom level.

11
map_center tuple[float, float] | None

Optional (lat, lon) map center. If omitted, it is estimated from the H3 cells.

None
save_path str | Path | None

Optional HTML output path.

None

Returns:

Type Description
object

A folium.Map object.

Raises:

Type Description
ImportError

If folium or h3 is not installed.

KeyError

If required columns are missing.

ValueError

If no valid H3 cell can be plotted.

plot_gps_traces(dataset_or_legs, *, staypoints=None, user_ids=None, sample_n=None, random_state=42, color_by='mode_niv1', geometry_col='geometry', show_staypoints=True, use_antpath=True, tiles='cartodbpositron', zoom_start=12, map_center=None, save_path=None)

Plot GPS legs and optional staypoints on an interactive Folium map.

The function is designed for notebook checks. It keeps the API small: pass a MobilityDataset after transformation, or pass a leg GeoDataFrame directly.

Parameters:

Name Type Description Default
dataset_or_legs MobilityDataset | GeoDataFrame

MobilityDataset or leg GeoDataFrame.

required
staypoints GeoDataFrame | None

Optional staypoint GeoDataFrame when passing legs directly.

None
user_ids Iterable | None

Optional users to display.

None
sample_n int | None

Optional number of legs to sample before plotting.

None
random_state int | None

Random seed used for leg sampling.

42
color_by str | None

Column used to color legs. Set to None for one color.

'mode_niv1'
geometry_col str

Geometry column name.

'geometry'
show_staypoints bool

Whether to draw staypoints as circles.

True
use_antpath bool

Whether to animate legs with Folium AntPath.

True
tiles str

Folium base map.

'cartodbpositron'
zoom_start int

Initial zoom level.

12
map_center tuple[float, float] | None

Optional (lat, lon) center. If absent, the center is computed from plotted geometries.

None
save_path str | Path | None

Optional HTML output path.

None

Returns:

Type Description
object

A folium.Map object.

Raises:

Type Description
ImportError

If folium is not installed.

KeyError

If the geometry column is missing.

ValueError

If no leg geometry is available for plotting.

Indicator visualization

xyt_gps.viz_indicators

Notebook-friendly indicator charts.

plot_indicator_bars(indicators, *, table='population', metrics=None, group_col='mode', facet_col='Phase', title='Indicateurs de mobilité', max_bars=None, group_order=None, sort_bars_by_value=False, value_format='{:.1f}', bar_color='#6b4c9a', include_all_modes=True, all_modes_label='Tous modes', all_modes_color=ALL_MODES_COLOR, show_demand_profile=True, demand_profile_max_modes=6, metadata=None, show_identity_card=True, save_path=None)

Render mobility indicators as simple notebook bar charts.

The default input is an IndicatorResult returned by compute_mobility_indicators(). The function reads its population table and plots the main per-day metrics by mode. It can also receive a DataFrame directly, for example indicators.person_phase.

Parameters:

Name Type Description Default
indicators IndicatorResult | DataFrame

IndicatorResult or an indicator DataFrame.

required
table str

Indicator table to use when indicators is an IndicatorResult: person_day, person_phase or population.

'population'
metrics Iterable[str] | None

Numeric columns to plot. When omitted, common indicator columns are inferred.

None
group_col str

Categorical column used for bars, usually mode.

'mode'
facet_col str | None

Optional column used to create one panel per period/phase.

'Phase'
title str

Displayed title.

'Indicateurs de mobilité'
max_bars int | None

Optional maximum number of bars per panel.

None
group_order Iterable[str] | None

Optional explicit order for the categorical bars. When omitted, common mobility modes use a stable default order so that phases remain visually comparable.

None
sort_bars_by_value bool

When true, sort bars by decreasing value inside each panel. This reproduces the older compact ranking behavior, but can make modes change position between phases.

False
value_format str

Python format string used for numeric labels.

'{:.1f}'
bar_color str

CSS color for bars.

'#6b4c9a'
include_all_modes bool

Whether to add a total row across all displayed modes in each phase/panel.

True
all_modes_label str

Label used for the total row.

'Tous modes'
all_modes_color str

CSS color used for the total row.

ALL_MODES_COLOR
show_demand_profile bool

Whether to display 5-minute daily demand curves when IndicatorResult.metadata contains them.

True
demand_profile_max_modes int

Maximum number of demand curves per phase, including the all-modes curve.

6
metadata Mapping[str, object] | None

Optional metadata displayed in the identity card. Values override IndicatorResult.metadata when the input is an IndicatorResult.

None
show_identity_card bool

Whether to display calculation metadata before the bars.

True
save_path str | Path | None

Optional HTML file path.

None
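The following minimal sketch shows how group_order, sort_bars_by_value and max_bars could interact inside one panel. DEFAULT_MODE_ORDER and order_bars are hypothetical names, and the package's real default mode order is not reproduced here. With a stable order, a mode keeps the same position in every phase. Sorting by value gives the ranking view, where positions may shift between phases.

```python
# Hypothetical stable default order (the package's own list is not documented here).
DEFAULT_MODE_ORDER = ("Walk", "Bike", "Bus", "Car")


def order_bars(values, *, group_order=None, sort_bars_by_value=False, max_bars=None,
               default_order=DEFAULT_MODE_ORDER):
    """Return (label, value) pairs in display order for one panel."""
    if sort_bars_by_value:
        # Ranking view: largest value first.
        ordered = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    else:
        order = list(group_order) if group_order is not None else list(default_order)
        rank = {label: i for i, label in enumerate(order)}
        # Known labels keep their fixed rank; unknown labels go last, alphabetically.
        ordered = sorted(values.items(), key=lambda kv: (rank.get(kv[0], len(rank)), kv[0]))
    if max_bars is not None:
        ordered = ordered[:max_bars]
    return ordered


phase_1 = {"Car": 2.0, "Walk": 1.2, "Bus": 0.4}
phase_2 = {"Car": 1.1, "Walk": 1.5, "Bus": 0.9}
print(order_bars(phase_1))  # Walk, Bus, Car
print(order_bars(phase_2))  # Walk, Bus, Car (same positions as phase_1)
print(order_bars(phase_2, sort_bars_by_value=True))  # Walk, Car, Bus
```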

Returns:

Type Description
object

An IPython.display.HTML object when IPython is available, otherwise the raw HTML string.

Raises:

Type Description
KeyError

If required columns are missing.

ValueError

If no metric can be plotted.
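The return and error contract above follows a common notebook pattern. Inputs are validated, an HTML string is built, and the string is wrapped in IPython.display.HTML only if IPython can be imported. The toy render_bars_html function below is a hypothetical illustration of that pattern, not the package's renderer. Its input format (a list of dicts with mode and value keys) is an assumption.

```python
import html


def render_bars_html(rows, required=("mode", "value"), title="Indicateurs de mobilité"):
    """Toy renderer: validate columns, build HTML, wrap it for notebooks when possible."""
    if not rows:
        raise ValueError("no metric can be plotted")
    missing = [col for col in required if col not in rows[0]]
    if missing:
        raise KeyError(f"missing required columns: {missing}")
    items = "".join(
        f"<li>{html.escape(str(r['mode']))}: {r['value']:.1f}</li>" for r in rows
    )
    markup = f"<h3>{html.escape(title)}</h3><ul>{items}</ul>"
    try:
        from IPython.display import HTML  # optional notebook dependency
    except ImportError:
        return markup  # plain-string fallback outside notebooks
    return HTML(markup)
```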

Participation visualization

xyt_gps.viz_participation

Notebook-friendly participation heatmaps.

plot_participation_heatmap(participation_grid, *, user_col='user_id', week_col='protocol_week_number', score_col='active_days_count', phase_col='Phase', title='Participation hebdomadaire', max_score=7, max_users=None, cell_size=13, cell_gap=3, show_phase_separators=True, phase_separator_color='#e83b46', phase_separator_width=3, save_path=None)

Render a GitHub-style participation heatmap in a notebook.

The input is the long table produced by build_weekly_participation_grid. Rows are users, columns are protocol weeks, and cell color intensity is based on score_col, usually the number of active tracked days from 0 to 7.
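As a rough illustration of the layout, the toy long table below is pivoted into a user-by-week grid, and each score is divided by max_score to get a color intensity between 0 and 1. The table values are invented. Filling absent weeks with 0 and scaling linearly are assumptions made for this sketch, not documented behavior of plot_participation_heatmap.

```python
import pandas as pd

# Hypothetical long table: one row per (user, protocol week).
grid = pd.DataFrame(
    {
        "user_id": ["u1", "u1", "u1", "u2", "u2"],
        "protocol_week_number": [1, 2, 3, 1, 3],
        "active_days_count": [7, 4, 0, 2, 7],
        "Phase": ["A", "A", "B", "A", "B"],
    }
)

max_score = 7
# Rows are users, columns are protocol weeks; missing weeks become 0 active days.
wide = grid.pivot(
    index="user_id", columns="protocol_week_number", values="active_days_count"
).fillna(0)
# Color intensity in [0, 1]: score divided by max_score, clipped at the top.
intensity = (wide / max_score).clip(upper=1.0)
print(intensity.round(2))
```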

Parameters:

Name Type Description Default
participation_grid DataFrame

Weekly participation table.

required
user_col str

User identifier column.

'user_id'
week_col str

Week number column.

'protocol_week_number'
score_col str

Participation score column.

'active_days_count'
phase_col str

Optional phase/period column used in cell tooltips.

'Phase'
title str

Displayed title.

'Participation hebdomadaire'
max_score int

Maximum score used for the color scale.

7
max_users int | None

Optional maximum number of users displayed.

None
cell_size int

Square size in pixels.

13
cell_gap int

Gap between squares in pixels.

3
show_phase_separators bool

Whether to add a separator column (red by default, see phase_separator_color) when the phase changes between two consecutive weeks.

True
phase_separator_color str

CSS color used for phase separators.

'#e83b46'
phase_separator_width int

Separator column width in pixels.

3
save_path str | Path | None

Optional HTML file path.

None
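A minimal sketch of the separator rule, assuming each protocol week maps to a single phase. A separator is placed before every week whose phase differs from the previous week's phase. The helper phase_separator_positions and the phase labels are hypothetical; the documented function reads phases from phase_col in the participation table.

```python
def phase_separator_positions(week_phase):
    """Return the weeks before which a separator column is drawn (phase changed)."""
    weeks = sorted(week_phase)
    return [
        current
        for previous, current in zip(weeks, weeks[1:])
        if week_phase[current] != week_phase[previous]
    ]


week_phase = {1: "Baseline", 2: "Baseline", 3: "Intervention", 4: "Intervention", 5: "Follow-up"}
print(phase_separator_positions(week_phase))  # [3, 5]
```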

Returns:

Type Description
object

An IPython.display.HTML object when IPython is available, otherwise the raw HTML string.

Raises:

Type Description
KeyError

If required columns are missing.

ValueError

If the participation grid is empty.

Mappings

xyt_gps.mappings

Mode and purpose mapping configuration.

MobilityMappings dataclass

Project-level mode and purpose mappings used by transformations.

mode_purpose_mapping(**kwargs)

Build mode and purpose mappings for a project.

This is a readable factory around MobilityMappings. Pass the same keyword arguments as the dataclass when a project needs custom groupings.

Example

mode_purpose_mapping(storyline_mode_niv1={"Marche": ("Mode::Walk",)})
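The factory pattern can be sketched with a frozen dataclass. ToyMobilityMappings and toy_mode_purpose_mapping are hypothetical stand-ins that keep only the storyline_mode_niv1 field from the example above. The real MobilityMappings has its own fields and defaults, which are not listed on this page.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Assumed mapping shape: category name -> tuple of provider codes.
CategoryMap = Dict[str, Tuple[str, ...]]


@dataclass(frozen=True)
class ToyMobilityMappings:
    """Hypothetical stand-in for MobilityMappings with a single field."""
    storyline_mode_niv1: CategoryMap = field(default_factory=dict)


def toy_mode_purpose_mapping(**kwargs) -> ToyMobilityMappings:
    # Forwards the same keyword arguments as the dataclass; unknown names raise TypeError.
    return ToyMobilityMappings(**kwargs)


mappings = toy_mode_purpose_mapping(storyline_mode_niv1={"Marche": ("Mode::Walk",)})
```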

map_value(value, mapping, *, default='Autres')

Map one provider value to a project category.
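Judging from the example mapping shape (category name to a tuple of provider codes), map_value can be read as a reverse lookup with a fallback category. toy_map_value below is a hypothetical sketch under that assumption, not the package's code, and the MODE_MAP categories are invented.

```python
def toy_map_value(value, mapping, *, default="Autres"):
    """Sketch: return the first category whose provider codes contain value."""
    for category, provider_codes in mapping.items():
        if value in provider_codes:
            return category
    return default


MODE_MAP = {"Marche": ("Mode::Walk",), "TC": ("Mode::Bus", "Mode::Tram")}
print(toy_map_value("Mode::Tram", MODE_MAP))  # TC
print(toy_map_value("Mode::Boat", MODE_MAP))  # Autres
```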

map_sequence(value, mapping, *, default='Autres')

Map a provider sequence such as Mode::Walk + Mode::Bus element by element, converting each element to its project category.
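A sketch of sequence mapping under the same assumed mapping shape. The provider string is split on "+", each element is mapped with a reverse lookup, and the categories are rejoined with " + ". The helpers, MODE_MAP and the separator handling are assumptions. Whether the package collapses repeated categories (for example two consecutive walking segments) is not documented here, so this sketch keeps them.

```python
def toy_map_value(value, mapping, *, default="Autres"):
    """Reverse lookup: category whose provider codes contain value, else default."""
    for category, provider_codes in mapping.items():
        if value in provider_codes:
            return category
    return default


def toy_map_sequence(value, mapping, *, default="Autres", sep=" + "):
    """Sketch: map each element of an 'A + B' provider sequence, then rejoin."""
    parts = [part.strip() for part in str(value).split("+") if part.strip()]
    return sep.join(toy_map_value(part, mapping, default=default) for part in parts)


MODE_MAP = {"Marche": ("Mode::Walk",), "TC": ("Mode::Bus", "Mode::Tram")}
print(toy_map_sequence("Mode::Walk + Mode::Bus", MODE_MAP))  # Marche + TC
```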