Complete API reference¶

This page lists the modules documented through docstrings. To learn the package or choose the main functions, start with the Recommended API page.

The finer-grained functions are mainly useful for production notebooks, table-by-table checks and debugging.

Recommended pipeline¶

`xyt_gps.pipeline` ¶

Recommended end-to-end entry point for the mobility pipeline.

`MobilityPipelineResult` `dataclass` ¶

Named result returned by run_mobility_pipeline.

Attributes:

Name	Type	Description
`raw`	`RawGpsData`	Raw GPS tables after loading or validation.
`dataset`	`MobilityDataset`	Structured mobility tables.
`indicators`	`IndicatorResult \| None`	Optional mobility indicators. `None` when `compute_indicators=False`.

`__iter__()` ¶

Allow tuple unpacking: `raw, dataset, indicators = run_mobility_pipeline(...)`.
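Tuple unpacking on a dataclass usually relies on an `__iter__` method that yields the fields in a fixed order. The following is a minimal sketch of that pattern only; `PipelineResultSketch` is a hypothetical stand-in, not the package's `MobilityPipelineResult`, and the string values are placeholders for real tables.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional


@dataclass
class PipelineResultSketch:
    """Hypothetical stand-in for a named pipeline result (not the package class)."""

    raw: Any
    dataset: Any
    indicators: Optional[Any] = None

    def __iter__(self) -> Iterator[Any]:
        # Yield the fields in a fixed order so tuple unpacking works.
        yield self.raw
        yield self.dataset
        yield self.indicators


result = PipelineResultSketch(raw="raw tables", dataset="mobility tables")
raw, dataset, indicators = result  # unpacking goes through __iter__
```

Named access (`result.dataset`) and unpacking then coexist, which keeps older tuple-style notebooks working while newer code uses attribute names.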

`run_mobility_pipeline(config, *, raw=None, sample=None, sociodemo=None, weights=None, weight_col='weight', default_weight=1.0, validate=True, must_exist=True, resample_missing_days=False, clean_leg_geometries=True, add_length_outlier_flags=True, add_signal_quality_flags=True, compute_indicators=True, mode_col='mode_niv1', trips_mode_col=None, distance_col=None, include_zero_days=True, include_excursions=True, include_airplane=False, use_weights=True, default_phase_name='All')` ¶

Run the recommended single-source GPS-to-indicators workflow.

This is the simplest entry point for analysts discovering the package: it loads raw GPS tables, prepares mobility tables and optionally computes the generic indicators. Pass `raw` when tables have already been loaded in a notebook; otherwise the function calls `load_gps_export(config)`.

Parameters:

Name	Type	Description	Default
`config`	`ProjectConfig`	Project configuration.	required
`raw`	`RawGpsData \| None`	Optional preloaded raw GPS tables. When omitted, files are loaded from `config` with `load_gps_export`.	`None`
`sample`	`RawSampleConfig \| None`	Optional sampling strategy used only when `raw` is omitted.	`None`
`sociodemo`	`DataFrame \| None`	Optional user-level sociodemographic table.	`None`
`weights`	`DataFrame \| None`	Optional user-level weighting table.	`None`
`weight_col`	`str`	Name of the user weight column.	`'weight'`
`default_weight`	`float`	Default user weight when no weight is available.	`1.0`
`validate`	`bool`	Whether to validate raw tables during loading.	`True`
`must_exist`	`bool`	Whether expected files must exist when loading from disk.	`True`
`resample_missing_days`	`bool`	Add explicit `Resampled_stay` rows for missing tracked days.	`False`
`clean_leg_geometries`	`bool`	Normalize continuous leg geometries.	`True`
`add_length_outlier_flags`	`bool`	Add per-mode length outlier flags.	`True`
`add_signal_quality_flags`	`bool`	Compute GPS signal-loss quality columns.	`True`
`compute_indicators`	`bool`	Whether to compute `IndicatorResult`.	`True`
`mode_col`	`str`	Leg mode column used for indicators.	`'mode_niv1'`
`trips_mode_col`	`str \| None`	Optional trip mode column override.	`None`
`distance_col`	`str \| None`	Optional leg distance column override.	`None`
`include_zero_days`	`bool`	Include tracked days without movement in daily means.	`True`
`include_excursions`	`bool`	Include legs/trips flagged as excursions.	`True`
`include_airplane`	`bool`	Include airplane legs/trips in indicators. Defaults to false because airplane rows can dominate distances and CO2.	`False`
`use_weights`	`bool`	Use user weights in population indicators.	`True`
`default_phase_name`	`str`	Period label when no phase split exists.	`'All'`

Returns:

Type	Description
`MobilityPipelineResult`	`MobilityPipelineResult` containing raw tables, mobility tables and optional indicators.

Configuration¶

`xyt_gps.config` ¶

Configuration objects for project-specific GPS transformations.

`Phase` `dataclass` ¶

Named experimental phase used to tag and filter mobility records.

Parameters:

Name	Type	Description	Default
`name`	`str`	Stable phase name used in output columns and reports.	required
`start`	`str \| Timestamp`	Inclusive phase start date.	required
`end`	`str \| Timestamp`	Inclusive phase end date.	required

`TimeSlice` `dataclass` ¶

Named daily time interval used for reusable temporal aggregation.

Parameters:

Name	Type	Description	Default
`name`	`str`	Stable output label, for example `HPM`, `HC` or `HPS`.	required
`start`	`str`	Inclusive start time formatted as `HH:MM`.	required
`end`	`str`	Exclusive end time formatted as `HH:MM`.	required

`TrackingThresholds` `dataclass` ¶

Minimum tracking duration expected for analysis.

Parameters:

Name	Type	Description	Default
`min_days_by_phase`	`Mapping[str, int]`	Minimum number of tracked days required by phase name. Missing phases default to one day in the filtering helpers.	`dict()`
`min_total_tracked_days`	`int`	Minimum active tracking days across the full observation period.	`7`
`round_to_full_weeks`	`bool`	If true, phase durations are rounded down to complete weeks after applying the minimum-day threshold.	`True`

`SignalLossThreshold` `dataclass` ¶

Mode-specific signal-loss thresholds.

`max_gap_m` is the largest absolute distance allowed between two consecutive points of a leg geometry, in meters. `max_relative_gap` is the same gap divided by the total leg length.
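To make the two quantities concrete, here is a minimal sketch that measures the largest gap and the relative gap on a leg given as planar points. It assumes coordinates already expressed in a metric CRS; `gap_metrics` is a hypothetical helper, and how the package compares these values to the thresholds is not shown here.

```python
import math


def gap_metrics(points):
    """Return (max_gap_m, max_relative_gap) for a leg given as (x, y) points in meters."""
    # Distance between each pair of consecutive points.
    gaps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    total = sum(gaps)
    if not gaps or total == 0:
        return 0.0, 0.0
    max_gap = max(gaps)
    return max_gap, max_gap / total


# Illustrative leg: three segments of 50 m, 100 m and 50 m.
leg = [(0.0, 0.0), (30.0, 40.0), (30.0, 140.0), (30.0, 190.0)]
max_gap_m, max_relative_gap = gap_metrics(leg)
```

A relative gap close to 1 means that a single jump accounts for almost the whole leg, which is why a relative criterion can stay informative when absolute thresholds are set very low.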

`SpatialQualityThresholds` `dataclass` ¶

Spatial quality thresholds that may vary by project.

Parameters:

Name	Type	Description	Default
`max_consecutive_point_distance_m`	`float \| None`	Reserved threshold for future point-jump filters.	`None`
`max_relative_signal_loss`	`float \| None`	Reserved global threshold for future simple signal-loss filters.	`None`
`outlier_quantiles_by_mode`	`tuple[float, ...]`	Quantiles used to flag unusually long legs within each mode.	`(0.98, 0.99)`
`signal_loss_thresholds_by_level`	`Mapping[int, Mapping[str, SignalLossThreshold]]`	Mode-specific thresholds used to create `low_quality_legs_*` flags.	`default_signal_loss_thresholds()`
`bad_signal_user_quantile`	`float`	Quantile used to identify users with very high average signal loss.	`0.995`
`signal_loss_mode_column`	`str`	Column used to match mode-specific signal thresholds.	`'mode'`

`MatchingThresholds` `dataclass` ¶

Temporal matching and future map-matching parameters.

Parameters:

Name	Type	Description	Default
`leg_trip_journey_tolerance`	`str`	Maximum temporal tolerance used when matching legs, trips and journeys.	`'5s'`
`osrm_max_points_per_chunk`	`int`	Reserved chunk size for future OSRM requests.	`99`
`google_directions_fallback`	`bool`	Reserved flag for a future Google Directions fallback.	`False`

`ProjectConfig` `dataclass` ¶

Project parameters that should remain explicit in the workflow.

ProjectConfig centralizes import paths, project names, periods, coordinate systems, phases, thresholds and mappings. The goal is to avoid hiding methodological choices inside large transformation functions.

Parameters:

Name	Type	Description	Default
`experiment_name`	`str \| None`	Optional analytical experiment name, for example `declic-prefig` or `declic-ziplo`. Leave empty for generic GPS processing without experiment grouping.	`None`
`motiontag_project_name`	`str \| None`	Optional provider project name used in export file names. Required only when loading files by inferred provider names.	`None`
`period`	`str \| None`	Optional export period string used in provider file names. Required only when loading files by inferred provider names.	`None`
`raw_data_dir`	`str \| Path`	Directory containing raw GPS CSV exports.	`'.'`
`export_dir`	`str \| Path \| None`	Optional output directory.	`None`
`target_crs`	`str`	CRS attached to parsed GPS geometries.	`'EPSG:4326'`
`operations_crs`	`str`	Metric CRS used for distance calculations.	`'EPSG:2056'`
`csv_sep`	`str`	CSV separator used by GPS exports.	`';'`
`timezone`	`str`	Local timezone label.	`'Europe/Zurich'`
`phases`	`tuple[Phase, ...]`	Optional analytical phases used for phase assignment and tracking filters. Leave empty for analyses without phase split.	`()`
`tracking_thresholds`	`TrackingThresholds`	User-level tracking-quality thresholds.	`TrackingThresholds()`
`spatial_quality_thresholds`	`SpatialQualityThresholds`	Leg and user GPS-quality thresholds.	`SpatialQualityThresholds()`
`matching_thresholds`	`MatchingThresholds`	Temporal matching thresholds.	`MatchingThresholds()`
`mappings`	`MobilityMappings`	Mode and purpose mappings.	`MobilityMappings()`
`time_slices`	`tuple[TimeSlice, ...]`	Daily time slices used for reusable temporal aggregations. Values outside the configured intervals are labelled `HC` by default.	`default_time_slices()`
`reference_year`	`int \| None`	Optional year used to derive age from sociodemographic data.	`None`
`storyline_prefix_candidates`	`tuple[str, ...]`	Accepted storyline file prefixes.	`('StorylineWithTripId', 'StorylineWithUserAnnotations')`

`default_time_slices()` ¶

Return default mobility-dashboard time slices.

HC is intentionally not defined as an interval: it is the fallback label for observations outside the configured peak periods.
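The fallback behaviour can be sketched as follows: a time is labelled with the first slice whose inclusive start and exclusive end contain it, and with `HC` otherwise. `TimeSliceSketch` and `label_time` are hypothetical names, and the two peak periods below are illustrative values only; the package defaults may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeSliceSketch:
    """Hypothetical stand-in mirroring the documented name/start/end fields."""

    name: str
    start: str  # inclusive, "HH:MM"
    end: str    # exclusive, "HH:MM"


def to_minutes(hhmm: str) -> int:
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def label_time(hhmm: str, slices, fallback: str = "HC") -> str:
    """Return the first slice containing the time, else the fallback label."""
    t = to_minutes(hhmm)
    for s in slices:
        if to_minutes(s.start) <= t < to_minutes(s.end):
            return s.name
    return fallback


# Illustrative peak periods only (assumption, not the package defaults).
slices = (
    TimeSliceSketch("HPM", "07:00", "09:00"),
    TimeSliceSketch("HPS", "16:00", "19:00"),
)
```

With these values, `label_time("09:00", slices)` falls back to `HC` because the end bound is exclusive.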

`default_signal_loss_thresholds()` ¶

Return the default signal-loss thresholds.

These values reproduce the current spatial-quality convention. Some absolute thresholds are intentionally very low in that convention, which makes the relative signal-loss threshold the effective criterion. They should remain configurable by project.

Import¶

`xyt_gps.io` ¶

Input helpers for structured GPS exports and project side tables.

`GpsExportPaths` `dataclass` ¶

Resolved file paths for one structured GPS export period.

Attributes:

Name	Type	Description
`storyline`	`Path`	Path to the storyline CSV.
`trips`	`Path`	Path to the trips CSV.
`journeys`	`Path`	Path to the journeys CSV.
`user_statistics`	`Path \| None`	Optional path to the user statistics CSV.

`RawSampleConfig` `dataclass` ¶

Sampling options for loading large raw GPS exports.

Use `RawSampleConfig.by_users(n)` when checking a large export: it keeps all rows for a small number of selected users and preserves the relation between storyline, trips, journeys and user statistics. Use `RawSampleConfig.random_rows(n)` only for quick schema inspection.
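The reason user-based sampling preserves relations between tables can be shown with a small sketch: users are drawn once, then every table is filtered on the same set. `sample_by_users` is a hypothetical helper working on lists of dicts; it assumes the same user id column in every table and ignores chunked CSV reading.

```python
import random


def sample_by_users(tables, n, *, random_state=42, user_id_column="user_id"):
    """Keep every row of every table for n randomly chosen users."""
    # Draw users once, from the storyline, with a seeded generator.
    users = sorted({row[user_id_column] for row in tables["storyline"]})
    rng = random.Random(random_state)
    kept = set(rng.sample(users, min(n, len(users))))
    # Filter each table on the same user set so cross-table links survive.
    return {
        name: [row for row in rows if row[user_id_column] in kept]
        for name, rows in tables.items()
    }


tables = {
    "storyline": [{"user_id": u, "row": i} for u in ("a", "b", "c") for i in range(3)],
    "trips": [{"user_id": u} for u in ("a", "b", "c")],
}
sampled = sample_by_users(tables, 2)
```

Row-based sampling, by contrast, draws each table independently, so a sampled trip may have no matching storyline rows.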

Attributes:

Name	Type	Description
`mode`	`str`	Sampling strategy, either `users` or `rows`.
`n`	`int`	Number of users or rows to keep.
`random_state`	`int \| None`	Optional random seed.
`chunksize`	`int`	CSV chunk size for user-based sampling.
`user_id_column`	`str`	User identifier column in storyline, trips and journeys.
`user_statistics_id_column`	`str`	User identifier column in user statistics.
`storyline_type_column`	`str`	Column used to identify track rows.
`track_type_value`	`str`	Value used for track rows in storyline.

`by_users(n, *, random_state=42, chunksize=100000, user_id_column='user_id')` `classmethod` ¶

Keep all raw rows for n randomly selected users.

`random_rows(n, *, random_state=42, chunksize=100000, user_id_column='user_id')` `classmethod` ¶

Keep n random rows per table; storyline rows prefer Track rows.

`infer_gps_export_paths(config, *, must_exist=True)` ¶

Infer expected raw file paths from project parameters.

Parameters:

Name	Type	Description	Default
`config`	`ProjectConfig`	Project configuration containing `raw_data_dir`, `motiontag_project_name`, `period` and accepted storyline prefixes.	required
`must_exist`	`bool`	When true, raise an error if any expected file is absent.	`True`

Returns:

Type	Description
`GpsExportPaths`	A `GpsExportPaths` object with resolved CSV paths.

Raises:

Type	Description
`FileNotFoundError`	If `must_exist` is true and at least one expected CSV file is missing.

`source_id_for_config(config)` ¶

Build a stable source id for multi-project imports.

The source id is used to namespace identifiers when several projects or periods are loaded in one pass.

`sample_raw_gps_data(raw, sample)` ¶

Apply a raw-data sample before validation or transformation.

Parameters:

Name	Type	Description	Default
`raw`	`RawGpsData`	Raw GPS tables already loaded in memory.	required
`sample`	`RawSampleConfig \| None`	Sampling configuration. If `None`, the raw object is returned unchanged.	required

Returns:

Type	Description
`RawGpsData`	A new `RawGpsData` object containing sampled tables.

Raises:

Type	Description
`KeyError`	If user-based sampling is requested and the user id column is missing.
`ValueError`	If the sampling mode is unsupported.

`load_gps_export(config, *, sample=None, validate=True, must_exist=True)` ¶

Load raw GPS CSV exports without transforming their schema.

Parameters:

Name	Type	Description	Default
`config`	`ProjectConfig`	Project configuration used to infer file paths and CSV separator.	required
`sample`	`RawSampleConfig \| None`	Optional sampling strategy. `RawSampleConfig.by_users` reads CSV files by chunks and keeps all records for selected users.	`None`
`validate`	`bool`	When true, attach schema-validation reports to the returned object.	`True`
`must_exist`	`bool`	When true, fail if expected CSV files are missing.	`True`

Returns:

Type	Description
`RawGpsData`	Raw storyline, trips, journeys and optional user-statistics tables.

Raises:

Type	Description
`FileNotFoundError`	If required files are missing.
`KeyError`	If user-based sampling references a missing user column.

`load_gps_source(config, *, source_id=None, sample=None, validate=True, must_exist=True, namespace_ids=True)` ¶

Load one GPS source and add source metadata columns.

Parameters:

Name	Type	Description	Default
`config`	`ProjectConfig`	Project configuration for one project or period.	required
`source_id`	`str \| None`	Optional source identifier. If absent, it is built from the configuration.	`None`
`sample`	`RawSampleConfig \| None`	Optional raw sampling strategy.	`None`
`validate`	`bool`	Whether to attach validation reports.	`True`
`must_exist`	`bool`	Whether expected files must exist.	`True`
`namespace_ids`	`bool`	When true, prefix ids with `source_id` to avoid collisions in multi-source imports.	`True`

Returns:

Type	Description
`RawGpsData`	Raw GPS tables with `xyt_source_id`, project metadata and optional `raw_*` identifier columns.
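Identifier namespacing can be pictured with the sketch below, which prefixes ids with the source id and keeps the original value in a `raw_*` column. The `:` separator, the `id_keys` tuple and the `namespace_ids` helper itself are assumptions for illustration; the actual format used by the package is not documented on this page.

```python
def namespace_ids(rows, source_id, id_keys=("user_id", "trip_id")):
    """Return copies of rows with ids prefixed by source_id (hypothetical format)."""
    out = []
    for row in rows:
        new = dict(row)
        for key in id_keys:
            if key in new and new[key] is not None:
                new[f"raw_{key}"] = new[key]          # keep the original id
                new[key] = f"{source_id}:{new[key]}"  # assumed separator
        new["xyt_source_id"] = source_id
        out.append(new)
    return out


# The same provider id appears in two projects.
a = namespace_ids([{"user_id": 1, "trip_id": 10}], "projA")
b = namespace_ids([{"user_id": 1, "trip_id": 10}], "projB")
```

After namespacing, the two users no longer collide when the sources are concatenated.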

`concat_raw_gps_data(raws, *, validate=True)` ¶

Concatenate several already-loaded raw GPS datasets.

Parameters:

Name	Type	Description	Default
`raws`	`Iterable[RawGpsData]`	Raw datasets to concatenate. They should already contain source metadata if they come from different projects or periods.	required
`validate`	`bool`	Whether to validate the concatenated raw tables.	`True`

Returns:

Type	Description
`RawGpsData`	One `RawGpsData` object containing all rows.

Raises:

Type	Description
`ValueError`	If `raws` is empty.

`load_gps_sources(configs, *, sample=None, validate=True, must_exist=True, namespace_ids=True)` ¶

Load and concatenate raw GPS exports from several sources.

Parameters:

Name	Type	Description	Default
`configs`	`Iterable[ProjectConfig]`	Project configurations, one per project or period.	required
`sample`	`RawSampleConfig \| None`	Optional sampling strategy applied to each source.	`None`
`validate`	`bool`	Whether to attach validation reports.	`True`
`must_exist`	`bool`	Whether expected files must exist.	`True`
`namespace_ids`	`bool`	Whether to prefix identifiers by source id.	`True`

Returns:

Type	Description
`RawGpsData`	A concatenated `RawGpsData` object ready for transformation.

`load_sociodemo(path, *, user_id_column='Id')` ¶

Load a sociodemographic side table and standardize the user id column.

Parameters:

Name	Type	Description	Default
`path`	`str \| Path`	CSV or Excel file path.	required
`user_id_column`	`str`	Column to rename to `user_id`.	`'Id'`

Returns:

Type	Description
`DataFrame`	A pandas table that can be passed to `prepare_mobility_dataset`.

Sample test data¶

`xyt_gps.sample_data` ¶

Demo sample-data helpers.

The default sample is a personal GPS storyline explicitly authorized for this package demo. It is loaded only when requested.

`find_sample_gps_path(start=None)` ¶

Find the authorized demo pickle from the current repo layout.

`load_sample_gps(path=None, *, user_id='sample_user', max_rows=None, validate=True)` ¶

Load the authorized personal demo storyline as a structured GPS raw dataset.

The source pickle contains a storyline table only. The function pseudonymizes user_id by default and derives minimal Trips, Journeys and UserStatistics tables so that tutorial transformations can run without a full raw GPS export.

Synthetic data¶

`xyt_gps.synthetic` ¶

Synthetic GPS testset generation from the authorized sample.

The generator is intentionally explicit and parameter-driven. With one authorized source user, the package cannot learn a defensible population model; it can, however, bootstrap realistic structured GPS days and inject controlled tracking anomalies for testing the transformation pipeline.

`SyntheticExperiment` `dataclass` ¶

Experiment window used by the synthetic Declic generator.

Parameters:

Name	Type	Description	Default
`experiment_name`	`str`	Stable experiment identifier added to all generated tables, for example `declic-mobility-vague1`.	required
`phases`	`tuple[Phase, ...]`	Optional experiment phases. Empty phases are allowed for a generic GPS testset without analytical phase split.	`()`
`motiontag_project_name`	`str \| None`	Optional provider project name. Defaults to `experiment_name`.	`None`

`SyntheticAnomalyRates` `dataclass` ¶

Controlled anomaly rates injected into synthetic raw tables.

Parameters:

Name	Type	Description	Default
`missing_geometry_rate`	`float`	Share of storyline rows with a missing geometry. Keep this below the raw schema tolerance when the generated dataset must pass `prepare_mobility_dataset()` without manual cleaning.	`0.002`
`unconfirmed_rate`	`float`	Share of rows where confirmation timestamps are cleared.	`0.2`
`mode_mismatch_rate`	`float`	Share of track rows where `detected_mode` differs from the final `mode`.	`0.06`
`extreme_length_rate`	`float`	Share of track rows with inflated length values.	`0.01`

`SyntheticGpsDataset` `dataclass` ¶

Generated structured GPS testset and companion construction tables.

Attributes:

Name	Type	Description
`raw`	`RawGpsData`	Raw GPS-like tables ready for validation or transformation.
`user_presence`	`DataFrame`	User-level construction table with experiment and phase windows.
`generation_manifest`	`DataFrame`	Summary of generated rows, users and active days.

`tables()` ¶

Return all generated tables keyed by export name.

`default_declic_synthetic_experiments()` ¶

Return the five Declic experiment windows used for synthetic tests.

`generate_synthetic_declic_gps(*, sample_path=None, experiments=None, users_per_experiment=50, random_state=42, anomaly_rates=None, validate=True)` ¶

Generate a Declic-like synthetic GPS dataset from the sample.

The method uses bootstrap resampling of observed sample days, shifts them to the requested experiment phases, perturbs geometries and injects controlled anomalies. It is suited for tests, tutorials and pipeline validation. It is not a trained behavioral model.

Parameters:

Name	Type	Description	Default
`sample_path`	`str \| Path \| None`	Optional path to the authorized sample pickle. If omitted, the package searches the repository layout.	`None`
`experiments`	`Iterable[SyntheticExperiment] \| None`	Experiment definitions. Defaults to prefiguration, waves 1-3 and ZIPLO.	`None`
`users_per_experiment`	`int`	Number of synthetic users generated for each experiment.	`50`
`random_state`	`int \| None`	Seed used for reproducible generation.	`42`
`anomaly_rates`	`SyntheticAnomalyRates \| None`	Rates for missing geometry, unconfirmed rows, mode mismatch and extreme lengths.	`None`
`validate`	`bool`	If true, attach raw schema validation reports to the output.	`True`

Returns:

Type	Description
`SyntheticGpsDataset`	A `SyntheticGpsDataset` containing raw GPS-like tables, `user_presence` and a generation manifest.

`write_synthetic_gps_dataset(dataset, output_dir, *, formats=('parquet',), overwrite=True)` ¶

Write a generated synthetic GPS dataset as landing-ready files.

Parameters:

Name	Type	Description	Default
`dataset`	`SyntheticGpsDataset`	Synthetic dataset produced by `generate_synthetic_declic_gps()`.	required
`output_dir`	`str \| Path`	Destination directory.	required
`formats`	`Iterable[str]`	Formats for large event tables. Supported values are `csv` and `parquet`. Defaults to Parquet to avoid very large CSV files. Construction tables are always written as CSV.	`('parquet',)`
`overwrite`	`bool`	If false, fail when a target file already exists.	`True`

Returns:

Type	Description
`DataFrame`	A manifest with one row per written file.

Schemas¶

`xyt_gps.schema` ¶

Schema validation for raw GPS mobility exports.

`SchemaSpec` `dataclass` ¶

Column contract for one raw GPS input table.

`expected_gps_schema()` ¶

Return the expected GPS import schema as an inspectable table.

The current schema describes the structured GPS multi-table contract used by the package: storyline, trips, journeys and user statistics. It is intentionally generic at the public API level because the landing step may adapt source-specific files and column names before data loading.

`check_raw_import_columns(storyline, user_statistics=None, *, trips=None, journeys=None, include_recommended=True, raise_on_error=False)` ¶

Check the raw column structure before building RawGpsData.

This is a lightweight, notebook-friendly check. It focuses on the column names expected at the raw import stage. It does not parse dates, geometries or modes.

Parameters:

Name	Type	Description	Default
`storyline`	`DataFrame`	Raw storyline table to check.	required
`user_statistics`	`DataFrame \| None`	Optional raw user statistics table to check.	`None`
`trips`	`DataFrame \| None`	Optional raw trips table to check.	`None`
`journeys`	`DataFrame \| None`	Optional raw journeys table to check.	`None`
`include_recommended`	`bool`	Include non-blocking recommended columns in the report.	`True`
`raise_on_error`	`bool`	Raise a `ValueError` when a required column is missing.	`False`

Returns:

Type	Description
`DataFrame`	A DataFrame with one row per expected column. Important columns are: `table`, `expected_column`, `present`, `status` and `message`.
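The report shape described above (one row per expected column) can be sketched in plain Python. `check_columns` is a hypothetical helper; the status labels `ok`, `error` and `warning`, and the example column names, are assumptions rather than the package's actual values.

```python
def check_columns(tables, required, recommended=None):
    """Return one report row per expected column.

    tables maps a table name to its list of column names.
    """
    recommended = recommended or {}
    report = []
    for level, spec in (("required", required), ("recommended", recommended)):
        for table, expected in spec.items():
            if table not in tables:
                continue  # optional table not provided: nothing to check
            present_cols = set(tables[table])
            for col in expected:
                present = col in present_cols
                if present:
                    status, message = "ok", ""
                elif level == "required":
                    status, message = "error", f"missing required column {col!r}"
                else:
                    status, message = "warning", f"missing recommended column {col!r}"
                report.append({
                    "table": table,
                    "expected_column": col,
                    "present": present,
                    "status": status,
                    "message": message,
                })
    return report


report = check_columns(
    {"storyline": ["user_id", "started_at"]},
    required={"storyline": ["user_id", "started_at", "geometry"]},
    recommended={"storyline": ["mode"]},
)
errors = [row for row in report if row["status"] == "error"]
```

Only names are compared here, which matches the lightweight intent: no dates, geometries or modes are parsed at this stage.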

`validate_schema(df, spec, *, allow_extra_columns=True)` ¶

Validate column presence and basic null rates without changing data.

`validate_gps_raw(raw)` ¶

Validate all raw GPS tables that are present.

Transformations¶

`xyt_gps.transform` ¶

Orchestration from structured GPS exports to mobility tables.

`prepare_mobility_dataset(raw, config, *, sociodemo=None, weights=None, weight_col='weight', default_weight=1.0, resample_missing_days=False, clean_leg_geometries=True, add_length_outlier_flags=True, add_signal_quality_flags=True, validation=None)` ¶

Run the transparent preparation workflow.

The function orchestrates validation, storyline parsing, trip and journey preparation, mappings, split into staypoints and legs, GPS quality flags, user tracking stats and relation tables. Each step is also exposed as a smaller public function so notebooks can inspect intermediate states.

Parameters:

Name	Type	Description	Default
`raw`	`RawGpsData`	Raw GPS tables loaded with `load_gps_export` or `load_gps_sources`.	required
`config`	`ProjectConfig`	Project configuration controlling mappings, phases, thresholds and CRS.	required
`sociodemo`	`DataFrame \| None`	Optional user-level side table with a `user_id` column.	`None`
`weights`	`DataFrame \| None`	Optional user-level weighting table with `user_id` and `weight_col`.	`None`
`weight_col`	`str`	Name of the weight column to merge/create.	`'weight'`
`default_weight`	`float`	Weight used when no weighting table is provided, or when a user has no matching weight.	`1.0`
`resample_missing_days`	`bool`	When true, insert transparent `Resampled_stay` rows for missing user-days.	`False`
`clean_leg_geometries`	`bool`	When true, convert continuous `MultiLineString` legs to `LineString` and drop discontinuous geometries.	`True`
`add_length_outlier_flags`	`bool`	When true, add per-mode length quantile flags to legs.	`True`
`add_signal_quality_flags`	`bool`	When true, compute GPS signal-loss metrics on legs and user-level signal-quality flags.	`True`
`validation`	`dict[str, SchemaValidationResult] \| None`	Optional validation reports. If absent, reports attached to `raw` are reused or recomputed.	`None`

Returns:

Type	Description
`MobilityDataset`	A `MobilityDataset` containing parsed storyline, staypoints, legs, trips, journeys, user stats and mapping tables.

Raises:

Type	Description
`ValueError`	If raw schema validation contains blocking errors.

`concat_mobility_datasets(datasets)` ¶

Concatenate transformed mobility datasets from several sources.

Parameters:

Name	Type	Description	Default
`datasets`	`Iterable[MobilityDataset]`	Already transformed datasets, usually one per project or period.	required

Returns:

Type	Description
`MobilityDataset`	One `MobilityDataset` with concatenated tables and namespaced validation reports.

Raises:

Type	Description
`ValueError`	If no dataset is provided.

`prepare_mobility_datasets(configs, *, sociodemo_by_source=None, weights_by_source=None, weight_col='weight', default_weight=1.0, sample=None, resample_missing_days=False, clean_leg_geometries=True, add_length_outlier_flags=True, add_signal_quality_flags=True, validate=True, must_exist=True, namespace_ids=True)` ¶

Load, transform and concatenate several GPS sources or periods.

Parameters:

Name	Type	Description	Default
`configs`	`Iterable[ProjectConfig]`	Project configurations, one per project or period.	required
`sociodemo_by_source`	`DataFrame \| Mapping[str, DataFrame] \| None`	Optional sociodemographic table or mapping from source id/project name to a sociodemographic table.	`None`
`weights_by_source`	`DataFrame \| Mapping[str, DataFrame] \| None`	Optional weighting table or mapping from source id/project name to a user-level weighting table.	`None`
`weight_col`	`str`	Name of the weight column to merge/create.	`'weight'`
`default_weight`	`float`	Weight used when no weighting table is provided, or when a user has no matching weight.	`1.0`
`sample`	`RawSampleConfig \| None`	Optional raw sampling strategy applied to each source.	`None`
`resample_missing_days`	`bool`	Whether to add transparent missing-day stays.	`False`
`clean_leg_geometries`	`bool`	Whether to normalize/drop problematic leg geometries before length and quality calculations.	`True`
`add_length_outlier_flags`	`bool`	Whether to add per-mode length outlier flags.	`True`
`add_signal_quality_flags`	`bool`	Whether to compute GPS signal-loss flags.	`True`
`validate`	`bool`	Whether to validate raw exports before transformation.	`True`
`must_exist`	`bool`	Whether expected raw CSV files must exist.	`True`
`namespace_ids`	`bool`	Whether to prefix ids by source id before concatenation.	`True`

Returns:

Type	Description
`MobilityDataset`	A concatenated `MobilityDataset`.

Parsing¶

`xyt_gps.parsing` ¶

Parsing helpers for structured GPS exports.

`drop_nans_if_low_rate(df, column, *, threshold=0.01)` ¶

Drop nulls only when the null rate is explicitly below the threshold.
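A minimal sketch of this guard, on a list of dicts rather than a DataFrame: nulls are removed only when their share is strictly below the threshold. `drop_nulls_if_low_rate` is a hypothetical helper, and returning the rows unchanged when the rate is too high is an assumption; the package may report or raise instead.

```python
def drop_nulls_if_low_rate(rows, column, *, threshold=0.01):
    """Drop rows with a null value in column only if the null rate is below threshold."""
    if not rows:
        return rows
    null_count = sum(1 for row in rows if row.get(column) is None)
    rate = null_count / len(rows)
    if 0 < rate < threshold:
        return [row for row in rows if row.get(column) is not None]
    # Rate is zero or too high: leave the data untouched (assumed behaviour).
    return rows


few_nulls = [{"geometry": None}] + [{"geometry": "x"}] * 199    # 0.5 % nulls
many_nulls = [{"geometry": None}] * 50 + [{"geometry": "x"}] * 150  # 25 % nulls
```

The point of the guard is that a high null rate signals a data problem that should be inspected rather than silently dropped.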

`parse_ewkb(value)` ¶

Parse an EWKB hex string into a shapely geometry.

`parse_date_columns(df, columns, *, utc=True)` ¶

Parse existing date columns and leave absent columns untouched.

`assign_phase(value, phases, *, default='Other')` ¶

Assign a date or timestamp to the configured experimental phase.
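Phase assignment with inclusive bounds can be sketched as below. `PhaseSketch` and `assign_phase_sketch` are hypothetical stand-ins that work on `datetime.date` values only, whereas the documented `Phase` accepts strings or timestamps; the phase names and dates are illustrative.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PhaseSketch:
    """Hypothetical stand-in for a phase with inclusive start and end dates."""

    name: str
    start: date
    end: date


def assign_phase_sketch(value, phases, *, default="Other"):
    """Return the name of the first phase containing value, else the default."""
    for phase in phases:
        if phase.start <= value <= phase.end:  # both bounds inclusive
            return phase.name
    return default


# Illustrative phases (assumption).
phases = (
    PhaseSketch("baseline", date(2024, 1, 1), date(2024, 1, 31)),
    PhaseSketch("intervention", date(2024, 2, 1), date(2024, 2, 29)),
)
```

Dates outside every phase receive the default label, so records are tagged rather than dropped.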

Table preparation¶

`xyt_gps.prepare_tables` ¶

Prepare raw GPS tables before mobility object construction.

`apply_storyline_mappings(storyline, mappings=None)` ¶

Add purpose and mode aggregation columns to storyline rows.

`apply_trip_journey_mappings(trips, journeys, mappings=None)` ¶

Add purpose and mode aggregation columns to trips and journeys.

`prepare_storyline(storyline, config, *, drop_nan_threshold=0.01)` ¶

Validate, parse geometry/dates, assign phases and map modes/purposes.

`prepare_trips(trips, config)` ¶

Validate and parse raw trips.

`prepare_journeys(journeys, config)` ¶

Validate and parse raw journeys.

Mobility tables¶

`xyt_gps.mobility_tables` ¶

Small mobility-table transformations used by the preparation workflow.

`split_storyline(storyline)` ¶

Split a parsed storyline into staypoints and legs.

`add_user_id_day(legs)` ¶

Add the person-day id used by downstream indicators and notebooks.

`add_length_quantile_flags(legs, *, group_col='mode', length_col='length', quantiles=(0.98, 0.99))` ¶

Flag unusually long legs within each mode group.
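The idea of per-mode quantile flags can be sketched without pandas: compute each quantile inside each mode group, then flag legs above it. The helper names, the linear-interpolation quantile and the `length_above_q98`-style column names are assumptions; the package's actual column names are not given on this page.

```python
from collections import defaultdict


def quantile(values, q):
    """Quantile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def add_length_flags(legs, *, group_col="mode", length_col="length", quantiles=(0.98, 0.99)):
    """Return copies of legs with one boolean flag per quantile (hypothetical names)."""
    lengths = defaultdict(list)
    for leg in legs:
        lengths[leg[group_col]].append(leg[length_col])
    # One threshold per (group, quantile) pair.
    thresholds = {(g, q): quantile(v, q) for g, v in lengths.items() for q in quantiles}
    flagged = []
    for leg in legs:
        new = dict(leg)
        for q in quantiles:
            name = f"length_above_q{round(q * 100)}"
            new[name] = leg[length_col] > thresholds[(leg[group_col], q)]
        flagged.append(new)
    return flagged


legs = [{"mode": "walk", "length": float(i)} for i in range(1, 101)]
legs.append({"mode": "car", "length": 5000.0})
flagged = add_length_flags(legs)
```

Grouping by mode matters: a 5 km leg is unremarkable by car but would be an outlier among walking legs.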

Relations between tables¶

`xyt_gps.relations` ¶

Relation tables linking legs, staypoints, trips and journeys.

`build_track_trip_journey_map(legs, trips, journeys, *, tolerance='5s')` ¶

Map legs to trips and journeys.

If legs already contain a valid trip_id, the function uses it directly. Otherwise it matches each leg midpoint to trips from the same user, then matches each trip midpoint to journeys from the same user. Temporal joins are vectorized by user to avoid per-leg nested loops.
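The midpoint rule can be illustrated with a deliberately simple loop-based sketch, unlike the vectorized joins described above. `match_legs_to_trips` and the `leg_id` and `finished_at` field names are assumptions for illustration; only the matching logic (same user, leg midpoint inside the trip interval widened by the tolerance) follows the description.

```python
from datetime import datetime, timedelta


def match_legs_to_trips(legs, trips, *, tolerance=timedelta(seconds=5)):
    """Return {leg_id: trip_id or None} using the temporal midpoint of each leg."""
    mapping = {}
    for leg in legs:
        midpoint = leg["started_at"] + (leg["finished_at"] - leg["started_at"]) / 2
        mapping[leg["leg_id"]] = None
        for trip in trips:
            if trip["user_id"] != leg["user_id"]:
                continue  # only match within the same user
            if trip["started_at"] - tolerance <= midpoint <= trip["finished_at"] + tolerance:
                mapping[leg["leg_id"]] = trip["trip_id"]
                break
    return mapping


legs = [
    {"leg_id": "l1", "user_id": "u1",
     "started_at": datetime(2024, 1, 1, 8, 0), "finished_at": datetime(2024, 1, 1, 8, 20)},
    {"leg_id": "l2", "user_id": "u2",
     "started_at": datetime(2024, 1, 1, 8, 0), "finished_at": datetime(2024, 1, 1, 8, 20)},
]
trips = [
    {"trip_id": "t1", "user_id": "u1",
     "started_at": datetime(2024, 1, 1, 8, 0), "finished_at": datetime(2024, 1, 1, 8, 30)},
]
mapping = match_legs_to_trips(legs, trips)
```

Using the midpoint rather than the start or end makes the match robust to small boundary mismatches between leg and trip timestamps.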

`build_legs_staypoints_map(legs, staypoints, *, tolerance='5s')` ¶

Map each staypoint to previous and next legs using near-identical timestamps.

`add_journey_to_trips(trips, journeys, mapping)` ¶

Add journey_id and journey purpose to trips.

`add_trip_destination_activity(trips, map_track_trip_journey, map_legs_staypoints)` ¶

Add leading activity id to trips using the last leg of each trip.

`add_excursion_flags_to_trips_journeys(trips, journeys, legs, map_track_trip_journey, *, excursion_col='excursion')` ¶

Propagate leg-level excursion flags to trips and journeys.

User statistics¶

`xyt_gps.user_stats` ¶

User-level statistics derived from prepared GPS tables.

`add_excursion_stats_to_user_stats(user_stats, legs, *, excursion_col='excursion')` ¶

Merge user-level excursion counts from leg-level flags.

Parameters:

Name	Type	Description	Default
`user_stats`	`DataFrame`	User-level table containing `user_id`.	required
`legs`	`DataFrame`	Leg table containing `user_id` and an excursion flag column.	required
`excursion_col`	`str`	Name of the leg-level excursion flag.	`'excursion'`

Returns:

Type	Description
`DataFrame`	A copy of `user_stats` with `excursion_legs_count`, `total_legs_count` and `excursion_leg_ratio`.
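The three output columns can be derived as in the sketch below, written on plain dicts. `excursion_stats` is a hypothetical helper, and returning `None` as the ratio for users without legs is an assumption; the package may use `NaN` or zero.

```python
from collections import Counter


def excursion_stats(user_ids, legs, *, excursion_col="excursion"):
    """Return one row per user with excursion and total leg counts and their ratio."""
    total = Counter(leg["user_id"] for leg in legs)
    excursions = Counter(leg["user_id"] for leg in legs if leg.get(excursion_col))
    stats = []
    for uid in user_ids:
        n, e = total.get(uid, 0), excursions.get(uid, 0)
        stats.append({
            "user_id": uid,
            "excursion_legs_count": e,
            "total_legs_count": n,
            # Assumed convention: no ratio when the user has no legs.
            "excursion_leg_ratio": e / n if n else None,
        })
    return stats


legs = [
    {"user_id": "u1", "excursion": True},
    {"user_id": "u1", "excursion": False},
    {"user_id": "u1", "excursion": False},
    {"user_id": "u1", "excursion": False},
]
stats = excursion_stats(["u1", "u2"], legs)
```

Iterating over the user list rather than the legs keeps users without any leg in the output.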

`build_user_stats(storyline, config, *, user_statistics=None, sociodemo=None, weights=None, weight_col='weight', default_weight=1.0)` ¶

Build user-level tracking stats and merge optional side tables.

The output keeps the GPS database readable at user level. When the storyline contains artificial Resampled_stay rows, the table reports both the continuous record (active_days_count) and the observed tracking days before resampling (observed_active_days_count). If no weights are provided, weight_col is set to default_weight.

Tracking quality¶

`xyt_gps.quality` ¶

Public quality facade for the data preparation workflow.

`build_daily_tracking_presence(storyline, config=None, *, user_id_column='user_id', date_column='started_at', exclude_resampled=True)` ¶

Build one observed tracking row per user and date.

This table is intentionally based on observed rows before temporal resampling. Otherwise, artificial Resampled_stay rows would make missing days look like tracked days in participation analyses.

Parameters:

Name	Type	Description	Default
`storyline`	`DataFrame`	Parsed storyline table.	required
`config`	`ProjectConfig \| None`	Optional project configuration used to assign phases.	`None`
`user_id_column`	`str`	User identifier column.	`'user_id'`
`date_column`	`str`	Datetime column used to define active tracking days.	`'started_at'`
`exclude_resampled`	`bool`	When true, rows where `type == "Resampled_stay"` are ignored.	`True`

Returns:

Type	Description
`DataFrame`	A table with `user_id`, `tracking_date`, `active_day` and, when a config with phases is provided, `Phase`.

Raises:

Type	Description
`KeyError`	If the requested user or date column is missing.
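
As a rough illustration of the "one row per user and date" idea, the sketch below deduplicates storyline-like rows by user and calendar day and skips `Resampled_stay` placeholders. It uses plain dicts rather than a DataFrame, the helper name `daily_presence` is hypothetical, and storing `True` in `active_day` is an assumption about the column content.

```python
from datetime import datetime


def daily_presence(rows, exclude_resampled=True):
    """Return one row per observed (user_id, date) pair, sorted."""
    pairs = set()
    for row in rows:
        if exclude_resampled and row.get("type") == "Resampled_stay":
            continue  # placeholder rows are not observed tracking
        pairs.add((row["user_id"], row["started_at"].date()))
    return [
        {"user_id": uid, "tracking_date": day, "active_day": True}
        for uid, day in sorted(pairs)
    ]


storyline = [
    {"user_id": "u1", "type": "Stay", "started_at": datetime(2024, 3, 4, 8, 0)},
    {"user_id": "u1", "type": "Track", "started_at": datetime(2024, 3, 4, 9, 30)},
    {"user_id": "u1", "type": "Resampled_stay", "started_at": datetime(2024, 3, 5, 0, 0)},
    {"user_id": "u1", "type": "Stay", "started_at": datetime(2024, 3, 6, 7, 45)},
]
presence = daily_presence(storyline)
```

With the placeholder excluded, 5 March does not appear as a tracked day. Passing `exclude_resampled=False` would count it.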

`build_weekly_participation_grid(storyline, config, *, user_ids=None, user_id_column='user_id', date_column='started_at', exclude_resampled=True, default_period_name='All')` ¶

Count observed active tracking days by user and protocol week.

When phases are configured, weeks are relative to these phases rather than ISO calendar weeks. Without configured phases, the grid uses one analytical period named `default_period_name` and spans the observed tracking dates.

Parameters:

Name	Type	Description	Default
`storyline`	`DataFrame`	Parsed storyline table, preferably before resampling.	required
`config`	`ProjectConfig`	Project configuration containing phase dates.	required
`user_ids`	`Iterable \| None`	Optional full list of expected users. If omitted, users are taken from `storyline`.	`None`
`user_id_column`	`str`	User identifier column in `storyline`.	`'user_id'`
`date_column`	`str`	Datetime column used to define active tracking days.	`'started_at'`
`exclude_resampled`	`bool`	When true, artificial `Resampled_stay` rows are ignored.	`True`
`default_period_name`	`str`	Name used in the output when no phase is configured.	`'All'`

Returns:

Type	Description
`DataFrame`	A complete user x phase-week table. `active_days_count` and `participation_score` are integers from 0 to 7.

Raises:

Type	Description
`ValueError`	If no phase is configured and no tracking date is available to infer a default analysis period.
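
The difference between protocol weeks and ISO calendar weeks can be sketched as follows. This is not the package implementation. It assumes that week 1 starts on the first day of the phase and that each week covers seven consecutive days. The helper `weekly_grid` and the phase mapping are hypothetical.

```python
from datetime import date


def weekly_grid(active_days, phases, user_ids):
    """Count active days per (user, phase, week), keeping empty weeks at 0."""
    grid = {}
    for name, (start, end) in phases.items():
        n_weeks = (end - start).days // 7 + 1
        for uid in user_ids:
            for week in range(1, n_weeks + 1):
                grid[(uid, name, week)] = 0  # complete grid, even without data
    for uid, day in set(active_days):  # a day counts once per user
        for name, (start, end) in phases.items():
            key = (uid, name, (day - start).days // 7 + 1)
            if start <= day <= end and key in grid:
                grid[key] += 1
    return grid


# Hypothetical phase starting on a Wednesday and lasting 14 days.
phases = {"Phase 1": (date(2024, 3, 6), date(2024, 3, 19))}
active_days = [
    ("u1", date(2024, 3, 6)),
    ("u1", date(2024, 3, 12)),
    ("u1", date(2024, 3, 13)),
    ("u1", date(2024, 3, 13)),  # duplicate observation of the same day
]
grid = weekly_grid(active_days, phases, user_ids=["u1", "u2"])
```

6 and 12 March fall in different ISO weeks but in the same protocol week, because the phase starts on a Wednesday. User `u2` has no observation and still gets one row per week with a count of 0.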

`calculate_user_tracking_stats(storyline)` ¶

Calculate observed tracking windows and missing days per user.

Parameters:

Name	Type	Description	Default
`storyline`	`DataFrame`	Parsed storyline table. It must contain `user_id` and `started_at`. The function expects one row per stay or track segment.	required

Returns:

Type	Description
`DataFrame`	A user-level table with first and latest tracked dates, the number of active days, inactive days, maximum gap between tracked days and tracking completeness.

Raises:

Type	Description
`KeyError`	If `user_id` or `started_at` is missing.
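
The metrics listed above can be sketched for a single user with plain `datetime.date` values. The key names, the definition of a gap (missing days strictly between two tracked days) and the definition of completeness (active days divided by the inclusive observation window) are assumptions made for illustration.

```python
from datetime import date


def tracking_stats(days):
    """Summarise one user's tracked dates (non-empty iterable of dates)."""
    days = sorted(set(days))
    window = (days[-1] - days[0]).days + 1  # first to latest date, inclusive
    gaps = [(b - a).days - 1 for a, b in zip(days, days[1:])]
    return {
        "first_date": days[0],
        "latest_date": days[-1],
        "active_days": len(days),
        "inactive_days": window - len(days),
        "max_gap_days": max(gaps, default=0),
        "completeness": len(days) / window,
    }


stats = tracking_stats(
    [date(2024, 3, 1), date(2024, 3, 2), date(2024, 3, 5), date(2024, 3, 10)]
)
```

For this user the window spans 10 days, 4 of which are tracked, and the longest gap is the 4 missing days between 5 and 10 March.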

`summarize_participation_grid(participation_grid, *, good_week_min_days=5)` ¶

Summarize weekly participation coverage.

Parameters:

Name	Type	Description	Default
`participation_grid`	`DataFrame`	Table produced by `build_weekly_participation_grid`.	required
`good_week_min_days`	`int`	Minimum active days for a week to be considered good or complete.	`5`

Returns:

Type	Description
`DataFrame`	One summary row for all phases and one row per phase.

`build_tracking_gap_report(storyline, config=None, *, user_ids=None, user_id_column='user_id', date_column='started_at', exclude_resampled=True, default_period_name='All')` ¶

Summarize observed, missing and consecutive tracking days.

This is the package version of the useful generic part of the historical tracking-quality notebook: one row per user and analytical period, with active days, missing days, maximum gap and maximum consecutive observed days.

Parameters:

Name	Type	Description	Default
`storyline`	`DataFrame`	Parsed storyline table.	required
`config`	`ProjectConfig \| None`	Optional project configuration. When phases are configured, the report is computed by phase; otherwise it uses the observed period named `default_period_name`.	`None`
`user_ids`	`Iterable \| None`	Optional full list of expected users.	`None`
`user_id_column`	`str`	User identifier column in `storyline`.	`'user_id'`
`date_column`	`str`	Datetime column used to define active tracking days.	`'started_at'`
`exclude_resampled`	`bool`	Whether to ignore artificial `Resampled_stay` rows.	`True`
`default_period_name`	`str`	Period name used when no phase is configured.	`'All'`

Returns:

Type	Description
`DataFrame`	A user-period table with tracking coverage and gap metrics.
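
Among the gap metrics, the maximum number of consecutive observed days is the least obvious to compute. A minimal sketch, assuming a streak is a run of calendar days with no missing day in between (the helper name `longest_run` is hypothetical):

```python
from datetime import date, timedelta


def longest_run(days):
    """Length of the longest streak of consecutive calendar days."""
    best = current = 0
    previous = None
    for day in sorted(set(days)):
        if previous is not None and day - previous == timedelta(days=1):
            current += 1  # the streak continues
        else:
            current = 1  # a new streak starts
        best = max(best, current)
        previous = day
    return best


observed = [date(2024, 3, d) for d in (1, 2, 3, 6, 7, 9)]
max_consecutive_days = longest_run(observed)
```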

`build_tracking_quality_report(user_stats)` ¶

Build a compact one-row report for user tracking quality.

Parameters:

Name	Type	Description	Default
`user_stats`	`DataFrame`	User-level table with tracking-quality columns.	required

Returns:

Type	Description
`DataFrame`	A one-row table with user counts, valid-user share and median tracking-duration indicators. The output is meant for quick notebook checks before applying filters.

`calculate_tracking_periods(user_stats, config, *, min_days_by_phase=None)` ¶

Calculate effective tracked days per configured phase.

Parameters:

Name	Type	Description	Default
`user_stats`	`DataFrame`	User-level tracking table, usually produced by `calculate_user_tracking_stats`.	required
`config`	`ProjectConfig`	Project configuration containing `Phase` definitions and tracking thresholds.	required
`min_days_by_phase`	`Mapping[str, int] \| None`	Optional override for minimum tracked days by phase name.	`None`

Returns:

Type	Description
`DataFrame`	A copy of `user_stats` with `n_days_phase_*`, `phase*_start` and `phase*_end` columns. If no phase is configured, the input schema is preserved.

`flag_tracking_quality(user_stats, config)` ¶

Add transparent tracking-quality flags to a user table.

Parameters:

Name	Type	Description	Default
`user_stats`	`DataFrame`	User-level table with at least `active_days_count`.	required
`config`	`ProjectConfig`	Project configuration containing total and phase-level tracking thresholds.	required

Returns:

Type	Description
`DataFrame`	A copy of `user_stats` with boolean flags such as `tracked_days_ok`, `phase*_tracked_days_ok`, `tracking_quality_ok` and the categorical reason `tracking_quality_reason`.

Raises:

Type	Description
`KeyError`	If `active_days_count` is missing.

`summarize_phase_tracking(user_stats, config, *, day_thresholds=(7, 14, 21))` ¶

Summarize user tracking coverage by configured phase.

Parameters:

Name	Type	Description	Default
`user_stats`	`DataFrame`	User-level table containing `n_days_phase_1`, `n_days_phase_2`, etc.	required
`config`	`ProjectConfig`	Project configuration containing phase dates.	required
`day_thresholds`	`Iterable[int]`	Day-count thresholds to report, for example 7, 14 and 21 days.	`(7, 14, 21)`

Returns:

Type	Description
`DataFrame`	One row per phase with user counts above each threshold and mean coverage over the theoretical phase duration.

`build_mode_detection_precision(storyline, *, mode_col='mode', detected_mode_col='detected_mode', type_col='type', confirmed_at_col='confirmed_at', confirmed_only=True, group_cols=('mode', 'mode_niv1', 'mode_mrmt'))` ¶

Compare detected and confirmed transport modes.

Parameters:

Name	Type	Description	Default
`storyline`	`DataFrame`	Parsed storyline table.	required
`mode_col`	`str`	Confirmed or corrected mode column.	`'mode'`
`detected_mode_col`	`str`	Detected mode column.	`'detected_mode'`
`type_col`	`str`	Column used to keep only track rows.	`'type'`
`confirmed_at_col`	`str`	Confirmation timestamp column.	`'confirmed_at'`
`confirmed_only`	`bool`	When true, keep only rows with a confirmation timestamp when the column exists.	`True`
`group_cols`	`Iterable[str]`	Mode columns to summarize when present.	`('mode', 'mode_niv1', 'mode_mrmt')`

Returns:

Type	Description
`DataFrame`	A long table with one precision row per available grouping column and label.
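
The comparison can be sketched as below. The exact definition of precision used by the package is not stated here, so this example assumes it is the share of confirmed track rows whose detected mode equals the confirmed mode, computed per confirmed label. The helper name and the sample rows are hypothetical.

```python
from collections import defaultdict


def detection_precision(rows, mode_col="mode", detected_mode_col="detected_mode"):
    """Share of confirmed track rows, per label, where detection agrees."""
    totals = defaultdict(int)
    matches = defaultdict(int)
    for row in rows:
        if row.get("type") != "Track" or row.get("confirmed_at") is None:
            continue  # keep confirmed track rows only
        label = row[mode_col]
        totals[label] += 1
        matches[label] += row[detected_mode_col] == label
    return {label: matches[label] / totals[label] for label in totals}


rows = [
    {"type": "Track", "confirmed_at": "2024-03-04T10:00", "mode": "Car", "detected_mode": "Car"},
    {"type": "Track", "confirmed_at": "2024-03-04T18:00", "mode": "Car", "detected_mode": "Bus"},
    {"type": "Track", "confirmed_at": "2024-03-05T09:00", "mode": "Walk", "detected_mode": "Walk"},
    {"type": "Track", "confirmed_at": None, "mode": "Walk", "detected_mode": "Bike"},
    {"type": "Stay", "confirmed_at": "2024-03-05T12:00", "mode": None, "detected_mode": None},
]
precision = detection_precision(rows)
```

The unconfirmed walk and the stay row are ignored, which mirrors `confirmed_only=True` and the role of `type_col`.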

`build_user_confirmation_rates(storyline, *, user_id_col='user_id', type_col='type', confirmed_at_col='confirmed_at')` ¶

Calculate user-level confirmation rates for stays and tracks.

Parameters:

Name	Type	Description	Default
`storyline`	`DataFrame`	Parsed storyline table.	required
`user_id_col`	`str`	User identifier column.	`'user_id'`
`type_col`	`str`	Column distinguishing `Stay` and `Track`.	`'type'`
`confirmed_at_col`	`str`	Confirmation timestamp column.	`'confirmed_at'`

Returns:

Type	Description
`DataFrame`	One row per user with stay and track confirmation counts and rates.

`get_extreme_legs_by_mode(legs, *, mode_col='mode_niv1', length_col='length', duration_col='duration', top_n=5)` ¶

Return the longest legs within each mode.

Parameters:

Name	Type	Description	Default
`legs`	`DataFrame`	Leg table.	required
`mode_col`	`str`	Mode column used for grouping.	`'mode_niv1'`
`length_col`	`str`	Length column, expected in meters.	`'length'`
`duration_col`	`str`	Optional duration column in seconds.	`'duration'`
`top_n`	`int`	Number of longest legs retained per mode.	`5`

Returns:

Type	Description
`DataFrame`	A table sorted by mode and descending distance, with distance in kilometers and speed when duration is available.
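
A minimal sketch of the ranking and unit conversion, using plain dicts. The output keys `distance_km` and `speed_kmh` are hypothetical names, and speed in km/h is an assumption about the unit.

```python
def longest_legs_by_mode(legs, mode_col="mode_niv1", top_n=5):
    """Keep the top_n longest legs per mode, converted to kilometers."""
    by_mode = {}
    for leg in legs:
        by_mode.setdefault(leg[mode_col], []).append(leg)
    result = []
    for mode in sorted(by_mode):
        ranked = sorted(by_mode[mode], key=lambda leg: leg["length"], reverse=True)
        for leg in ranked[:top_n]:
            distance_km = leg["length"] / 1000  # meters to kilometers
            row = {mode_col: mode, "distance_km": distance_km}
            duration = leg.get("duration")
            if duration:  # skip missing or zero durations
                row["speed_kmh"] = distance_km / (duration / 3600)
            result.append(row)
    return result


legs = [
    {"mode_niv1": "Car", "length": 120000, "duration": 3600},
    {"mode_niv1": "Car", "length": 5000, "duration": 600},
    {"mode_niv1": "Car", "length": 80000},  # no duration recorded
    {"mode_niv1": "Walk", "length": 2000, "duration": 1800},
]
extremes = longest_legs_by_mode(legs, top_n=2)
```

An implausible speed in such a table is usually a quicker way to spot a mislabelled mode than the distance alone.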

`summarize_leg_lengths_by_mode(legs, *, mode_col='mode_niv2', length_col='length', quantiles=(0.95, 0.98, 0.99))` ¶

Summarize leg-distance distributions by mode.

Parameters:

Name	Type	Description	Default
`legs`	`DataFrame`	Leg table.	required
`mode_col`	`str`	Mode column used for grouping.	`'mode_niv2'`
`length_col`	`str`	Length column, expected in meters.	`'length'`
`quantiles`	`Iterable[float]`	Quantiles to add to the summary.	`(0.95, 0.98, 0.99)`

Returns:

Type	Description
`DataFrame`	One row per mode with distances expressed in kilometers.

`resample_missing_stays(storyline, config)` ¶

Create transparent placeholder stays for missing tracking dates.

The generated rows are labelled `Resampled_stay`. They are useful when the analysis requires a continuous user calendar, but they remain identifiable through their `type` and `comment_feedback` values.

Parameters:

Name	Type	Description	Default
`storyline`	`GeoDataFrame`	Parsed storyline table with `user_id`, `started_at` and geometry columns.	required
`config`	`ProjectConfig`	Project configuration used to assign experimental phases to inserted days.	required

Returns:

Type	Description
`tuple[GeoDataFrame, DataFrame]`	A tuple `(storyline, missing_days)` where `storyline` includes the optional placeholder rows and `missing_days` lists one row per inserted user-day.
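
The placeholder logic can be sketched for one user as follows. This sketch only fills gaps between the first and the last observed day. Whether the package also extends the calendar to phase boundaries is not stated here. The helper name is hypothetical, and geometry and `comment_feedback` values are left out.

```python
from datetime import date, timedelta


def fill_missing_days(user_id, observed_days):
    """Return (placeholder_rows, missing_days) for one user's calendar."""
    days = sorted(set(observed_days))
    if not days:
        return [], []
    observed = set(days)
    missing = []
    day = days[0]
    while day <= days[-1]:
        if day not in observed:
            missing.append(day)
        day += timedelta(days=1)
    placeholders = [
        {"user_id": user_id, "started_at": day, "type": "Resampled_stay"}
        for day in missing
    ]
    return placeholders, missing


placeholders, missing = fill_missing_days(
    "u1", [date(2024, 3, 1), date(2024, 3, 2), date(2024, 3, 5)]
)
```

Because every inserted row carries `type == "Resampled_stay"`, participation functions can still exclude them later.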

`build_user_selection_table(user_stats, *, require_tracking_quality=False, exclude_bad_signal_users=True, max_low_quality_legs_share=None)` ¶

Build an explicit user-level table for selecting analysis users.

The function does not remove rows. It adds one boolean column per rule, a final `analysis_user_ok` flag and an `analysis_user_reason` column. This keeps exclusions inspectable before any table is filtered. The name deliberately uses `selection_table` rather than `filter_matrix` because the object is meant to be read by humans before being applied.

Parameters:

Name	Type	Description	Default
`user_stats`	`DataFrame`	User-level table produced by `prepare_mobility_dataset`. Expected columns depend on enabled rules: `tracking_quality_ok`, `bad_signal_user` and `low_quality_legs_share`.	required
`require_tracking_quality`	`bool`	When true, keep only users with `tracking_quality_ok == True`. Leave false when tracking quality is used as a diagnostic rather than an exclusion rule, for example when the current dump does not cover all configured phases.	`False`
`exclude_bad_signal_users`	`bool`	When true, exclude users flagged by the signal-quality step.	`True`
`max_low_quality_legs_share`	`float \| None`	Optional maximum share of level-1 low-quality legs tolerated per user.	`None`

Returns:

Type	Description
`DataFrame`	A copy of `user_stats` with `*_filter_ok`, `analysis_user_ok` and `analysis_user_reason` columns.

Raises:

Type	Description
`KeyError`	If a column required by an enabled rule is missing.
`ValueError`	If `max_low_quality_legs_share` is outside `[0, 1]`.
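
The "flag, do not drop" design can be sketched like this. The rule column names (`bad_signal_filter_ok`, etc.) and the reason strings are hypothetical. Only the `*_filter_ok` pattern, `analysis_user_ok` and `analysis_user_reason` come from the documentation above.

```python
def build_selection(users, require_tracking_quality=False,
                    exclude_bad_signal_users=True,
                    max_low_quality_legs_share=None):
    """Add one boolean column per enabled rule plus a final flag and reason."""
    share = max_low_quality_legs_share
    if share is not None and not 0 <= share <= 1:
        raise ValueError("max_low_quality_legs_share must be within [0, 1]")
    rules = {}
    if require_tracking_quality:
        rules["tracking_quality_filter_ok"] = lambda u: bool(u["tracking_quality_ok"])
    if exclude_bad_signal_users:
        rules["bad_signal_filter_ok"] = lambda u: not u["bad_signal_user"]
    if share is not None:
        rules["low_quality_legs_filter_ok"] = lambda u: u["low_quality_legs_share"] <= share
    table = []
    for user in users:
        row = dict(user)  # no row is removed, only annotated
        for name, rule in rules.items():
            row[name] = rule(user)  # raises KeyError if a needed column is missing
        failed = [name for name in rules if not row[name]]
        row["analysis_user_ok"] = not failed
        row["analysis_user_reason"] = ", ".join(failed) if failed else "ok"
        table.append(row)
    return table


users = [
    {"user_id": "u1", "tracking_quality_ok": True, "bad_signal_user": False, "low_quality_legs_share": 0.1},
    {"user_id": "u2", "tracking_quality_ok": False, "bad_signal_user": True, "low_quality_legs_share": 0.5},
]
selection = build_selection(users, max_low_quality_legs_share=0.3)
selected_ids = [row["user_id"] for row in selection if row["analysis_user_ok"]]
```

Both users remain in `selection`. The excluded one carries the list of rules it failed, so the exclusion can be reviewed before any table is filtered.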

`filter_mobility_dataset_by_users(dataset, user_ids)` ¶

Filter every user-indexed table of a MobilityDataset.

Mapping tables are filtered after the core tables so they only reference retained legs, trips, journeys and staypoints. The function does not recompute indicators; it only preserves relational consistency after a user selection.

Parameters:

Name	Type	Description	Default
`dataset`	`MobilityDataset`	Transformed mobility dataset.	required
`user_ids`	`Iterable`	Iterable of user identifiers to keep.	required

Returns:

Type	Description
`MobilityDataset`	A new `MobilityDataset` with filtered tables and unchanged validation reports.

`filter_table_by_users(df, user_ids, *, user_id_column='user_id')` ¶

Filter any mobility table by selected users while keeping its schema.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Table to filter. If `user_id_column` is absent, the table is returned unchanged.	required
`user_ids`	`Iterable`	Iterable of user identifiers to keep.	required
`user_id_column`	`str`	Name of the user identifier column in `df`.	`'user_id'`

Returns:

Type	Description
`DataFrame`	A copy of `df` restricted to selected users, with the index reset.

`select_analysis_users(selection_table, *, quality_column='analysis_user_ok')` ¶

Return user ids selected by a user selection table.

Parameters:

Name	Type	Description	Default
`selection_table`	`DataFrame`	Table produced by `build_user_selection_table`.	required
`quality_column`	`str`	Boolean column used as the final selection flag.	`'analysis_user_ok'`

Returns:

Type	Description
`Index`	User ids for which `quality_column` is true.

Raises:

Type	Description
`KeyError`	If `user_id` or `quality_column` is missing.

`select_valid_tracking_users(user_stats, *, quality_column='tracking_quality_ok')` ¶

Return user ids that pass one quality column.

This helper is intentionally narrow and remains useful for quick checks. For analysis filters that combine tracking quality and GPS signal flags, prefer `build_user_selection_table` followed by `select_analysis_users`.

Parameters:

Name	Type	Description	Default
`user_stats`	`DataFrame`	User-level table containing `user_id` and the requested quality column.	required
`quality_column`	`str`	Boolean column used as the selection criterion.	`'tracking_quality_ok'`

Returns:

Type	Description
`Index`	User ids for which `quality_column` is true.

Raises:

Type	Description
`KeyError`	If the requested quality column is missing.

Presence and participation¶

`xyt_gps.quality_tracking` ¶

Daily presence and weekly participation helpers.

`calculate_user_tracking_stats(storyline)` ¶

Calculate observed tracking windows and missing days per user.

Parameters:

Name	Type	Description	Default
`storyline`	`DataFrame`	Parsed storyline table. It must contain `user_id` and `started_at`. The function expects one row per stay or track segment.	required

Returns:

Type	Description
`DataFrame`	A user-level table with first and latest tracked dates, the number of active days, inactive days, maximum gap between tracked days and tracking completeness.

Raises:

Type	Description
`KeyError`	If `user_id` or `started_at` is missing.

`build_daily_tracking_presence(storyline, config=None, *, user_id_column='user_id', date_column='started_at', exclude_resampled=True)` ¶

Build one observed tracking row per user and date.

This table is intentionally based on observed rows before temporal resampling. Otherwise, artificial `Resampled_stay` rows would make missing days look like tracked days in participation analyses.

Parameters:

Name	Type	Description	Default
`storyline`	`DataFrame`	Parsed storyline table.	required
`config`	`ProjectConfig \| None`	Optional project configuration used to assign phases.	`None`
`user_id_column`	`str`	User identifier column.	`'user_id'`
`date_column`	`str`	Datetime column used to define active tracking days.	`'started_at'`
`exclude_resampled`	`bool`	When true, rows where `type == "Resampled_stay"` are ignored.	`True`

Returns:

Type	Description
`DataFrame`	A table with `user_id`, `tracking_date`, `active_day` and, when a config with phases is provided, `Phase`.

Raises:

Type	Description
`KeyError`	If the requested user or date column is missing.

`build_weekly_participation_grid(storyline, config, *, user_ids=None, user_id_column='user_id', date_column='started_at', exclude_resampled=True, default_period_name='All')` ¶

Count observed active tracking days by user and protocol week.

When phases are configured, weeks are relative to these phases rather than ISO calendar weeks. Without configured phases, the grid uses one analytical period named `default_period_name` and spans the observed tracking dates.

Parameters:

Name	Type	Description	Default
`storyline`	`DataFrame`	Parsed storyline table, preferably before resampling.	required
`config`	`ProjectConfig`	Project configuration containing phase dates.	required
`user_ids`	`Iterable \| None`	Optional full list of expected users. If omitted, users are taken from `storyline`.	`None`
`user_id_column`	`str`	User identifier column in `storyline`.	`'user_id'`
`date_column`	`str`	Datetime column used to define active tracking days.	`'started_at'`
`exclude_resampled`	`bool`	When true, artificial `Resampled_stay` rows are ignored.	`True`
`default_period_name`	`str`	Name used in the output when no phase is configured.	`'All'`

Returns:

Type	Description
`DataFrame`	A complete user x phase-week table. `active_days_count` and `participation_score` are integers from 0 to 7.

Raises:

Type	Description
`ValueError`	If no phase is configured and no tracking date is available to infer a default analysis period.

`summarize_participation_grid(participation_grid, *, good_week_min_days=5)` ¶

Summarize weekly participation coverage.

Parameters:

Name	Type	Description	Default
`participation_grid`	`DataFrame`	Table produced by `build_weekly_participation_grid`.	required
`good_week_min_days`	`int`	Minimum active days for a week to be considered good or complete.	`5`

Returns:

Type	Description
`DataFrame`	One summary row for all phases and one row per phase.

Quality reports¶

`xyt_gps.quality_reports` ¶

Tracking quality reports and user-level tracking flags.

`build_tracking_gap_report(storyline, config=None, *, user_ids=None, user_id_column='user_id', date_column='started_at', exclude_resampled=True, default_period_name='All')` ¶

Summarize observed, missing and consecutive tracking days.

This is the package version of the useful generic part of the historical tracking-quality notebook: one row per user and analytical period, with active days, missing days, maximum gap and maximum consecutive observed days.

Parameters:

Name	Type	Description	Default
`storyline`	`DataFrame`	Parsed storyline table.	required
`config`	`ProjectConfig \| None`	Optional project configuration. When phases are configured, the report is computed by phase; otherwise it uses the observed period named `default_period_name`.	`None`
`user_ids`	`Iterable \| None`	Optional full list of expected users.	`None`
`user_id_column`	`str`	User identifier column in `storyline`.	`'user_id'`
`date_column`	`str`	Datetime column used to define active tracking days.	`'started_at'`
`exclude_resampled`	`bool`	Whether to ignore artificial `Resampled_stay` rows.	`True`
`default_period_name`	`str`	Period name used when no phase is configured.	`'All'`

Returns:

Type	Description
`DataFrame`	A user-period table with tracking coverage and gap metrics.

`summarize_phase_tracking(user_stats, config, *, day_thresholds=(7, 14, 21))` ¶

Summarize user tracking coverage by configured phase.

Parameters:

Name	Type	Description	Default
`user_stats`	`DataFrame`	User-level table containing `n_days_phase_1`, `n_days_phase_2`, etc.	required
`config`	`ProjectConfig`	Project configuration containing phase dates.	required
`day_thresholds`	`Iterable[int]`	Day-count thresholds to report, for example 7, 14 and 21 days.	`(7, 14, 21)`

Returns:

Type	Description
`DataFrame`	One row per phase with user counts above each threshold and mean coverage over the theoretical phase duration.

`calculate_tracking_periods(user_stats, config, *, min_days_by_phase=None)` ¶

Calculate effective tracked days per configured phase.

Parameters:

Name	Type	Description	Default
`user_stats`	`DataFrame`	User-level tracking table, usually produced by `calculate_user_tracking_stats`.	required
`config`	`ProjectConfig`	Project configuration containing `Phase` definitions and tracking thresholds.	required
`min_days_by_phase`	`Mapping[str, int] \| None`	Optional override for minimum tracked days by phase name.	`None`

Returns:

Type	Description
`DataFrame`	A copy of `user_stats` with `n_days_phase_*`, `phase*_start` and `phase*_end` columns. If no phase is configured, the input schema is preserved.

`flag_tracking_quality(user_stats, config)` ¶

Add transparent tracking-quality flags to a user table.

Parameters:

Name	Type	Description	Default
`user_stats`	`DataFrame`	User-level table with at least `active_days_count`.	required
`config`	`ProjectConfig`	Project configuration containing total and phase-level tracking thresholds.	required

Returns:

Type	Description
`DataFrame`	A copy of `user_stats` with boolean flags such as `tracked_days_ok`, `phase*_tracked_days_ok`, `tracking_quality_ok` and the categorical reason `tracking_quality_reason`.

Raises:

Type	Description
`KeyError`	If `active_days_count` is missing.

`build_tracking_quality_report(user_stats)` ¶

Build a compact one-row report for user tracking quality.

Parameters:

Name	Type	Description	Default
`user_stats`	`DataFrame`	User-level table with tracking-quality columns.	required

Returns:

Type	Description
`DataFrame`	A one-row table with user counts, valid-user share and median tracking-duration indicators. The output is meant for quick notebook checks before applying filters.

Diagnostics¶

`xyt_gps.quality_diagnostics` ¶

Diagnostic reports for leg lengths, confirmations and mode detection.

`summarize_leg_lengths_by_mode(legs, *, mode_col='mode_niv2', length_col='length', quantiles=(0.95, 0.98, 0.99))` ¶

Summarize leg-distance distributions by mode.

Parameters:

Name	Type	Description	Default
`legs`	`DataFrame`	Leg table.	required
`mode_col`	`str`	Mode column used for grouping.	`'mode_niv2'`
`length_col`	`str`	Length column, expected in meters.	`'length'`
`quantiles`	`Iterable[float]`	Quantiles to add to the summary.	`(0.95, 0.98, 0.99)`

Returns:

Type	Description
`DataFrame`	One row per mode with distances expressed in kilometers.

`get_extreme_legs_by_mode(legs, *, mode_col='mode_niv1', length_col='length', duration_col='duration', top_n=5)` ¶

Return the longest legs within each mode.

Parameters:

Name	Type	Description	Default
`legs`	`DataFrame`	Leg table.	required
`mode_col`	`str`	Mode column used for grouping.	`'mode_niv1'`
`length_col`	`str`	Length column, expected in meters.	`'length'`
`duration_col`	`str`	Optional duration column in seconds.	`'duration'`
`top_n`	`int`	Number of longest legs retained per mode.	`5`

Returns:

Type	Description
`DataFrame`	A table sorted by mode and descending distance, with distance in kilometers and speed when duration is available.

`build_user_confirmation_rates(storyline, *, user_id_col='user_id', type_col='type', confirmed_at_col='confirmed_at')` ¶

Calculate user-level confirmation rates for stays and tracks.

Parameters:

Name	Type	Description	Default
`storyline`	`DataFrame`	Parsed storyline table.	required
`user_id_col`	`str`	User identifier column.	`'user_id'`
`type_col`	`str`	Column distinguishing `Stay` and `Track`.	`'type'`
`confirmed_at_col`	`str`	Confirmation timestamp column.	`'confirmed_at'`

Returns:

Type	Description
`DataFrame`	One row per user with stay and track confirmation counts and rates.

`build_mode_detection_precision(storyline, *, mode_col='mode', detected_mode_col='detected_mode', type_col='type', confirmed_at_col='confirmed_at', confirmed_only=True, group_cols=('mode', 'mode_niv1', 'mode_mrmt'))` ¶

Compare detected and confirmed transport modes.

Parameters:

Name	Type	Description	Default
`storyline`	`DataFrame`	Parsed storyline table.	required
`mode_col`	`str`	Confirmed or corrected mode column.	`'mode'`
`detected_mode_col`	`str`	Detected mode column.	`'detected_mode'`
`type_col`	`str`	Column used to keep only track rows.	`'type'`
`confirmed_at_col`	`str`	Confirmation timestamp column.	`'confirmed_at'`
`confirmed_only`	`bool`	When true, keep only rows with a confirmation timestamp when the column exists.	`True`
`group_cols`	`Iterable[str]`	Mode columns to summarize when present.	`('mode', 'mode_niv1', 'mode_mrmt')`

Returns:

Type	Description
`DataFrame`	A long table with one precision row per available grouping column and label.

Temporal resampling¶

`xyt_gps.quality_resampling` ¶

Temporal resampling helpers for missing tracking days.

`resample_missing_stays(storyline, config)` ¶

Create transparent placeholder stays for missing tracking dates.

The generated rows are labelled `Resampled_stay`. They are useful when the analysis requires a continuous user calendar, but they remain identifiable through their `type` and `comment_feedback` values.

Parameters:

Name	Type	Description	Default
`storyline`	`GeoDataFrame`	Parsed storyline table with `user_id`, `started_at` and geometry columns.	required
`config`	`ProjectConfig`	Project configuration used to assign experimental phases to inserted days.	required

Returns:

Type	Description
`tuple[GeoDataFrame, DataFrame]`	A tuple `(storyline, missing_days)` where `storyline` includes the optional placeholder rows and `missing_days` lists one row per inserted user-day.

User selection¶

`xyt_gps.quality_selection` ¶

User selection and consistent filtering of mobility datasets.

`select_valid_tracking_users(user_stats, *, quality_column='tracking_quality_ok')` ¶

Return user ids that pass one quality column.

This helper is intentionally narrow and remains useful for quick checks. For analysis filters that combine tracking quality and GPS signal flags, prefer `build_user_selection_table` followed by `select_analysis_users`.

Parameters:

Name	Type	Description	Default
`user_stats`	`DataFrame`	User-level table containing `user_id` and the requested quality column.	required
`quality_column`	`str`	Boolean column used as the selection criterion.	`'tracking_quality_ok'`

Returns:

Type	Description
`Index`	User ids for which `quality_column` is true.

Raises:

Type	Description
`KeyError`	If the requested quality column is missing.

`filter_table_by_users(df, user_ids, *, user_id_column='user_id')` ¶

Filter any mobility table by selected users while keeping its schema.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Table to filter. If `user_id_column` is absent, the table is returned unchanged.	required
`user_ids`	`Iterable`	Iterable of user identifiers to keep.	required
`user_id_column`	`str`	Name of the user identifier column in `df`.	`'user_id'`

Returns:

Type	Description
`DataFrame`	A copy of `df` restricted to selected users, with the index reset.

`build_user_selection_table(user_stats, *, require_tracking_quality=False, exclude_bad_signal_users=True, max_low_quality_legs_share=None)` ¶

Build an explicit user-level table for selecting analysis users.

The function does not remove rows. It adds one boolean column per rule, a final `analysis_user_ok` flag and an `analysis_user_reason` column. This keeps exclusions inspectable before any table is filtered. The name deliberately uses `selection_table` rather than `filter_matrix` because the object is meant to be read by humans before being applied.

Parameters:

Name	Type	Description	Default
`user_stats`	`DataFrame`	User-level table produced by `prepare_mobility_dataset`. Expected columns depend on enabled rules: `tracking_quality_ok`, `bad_signal_user` and `low_quality_legs_share`.	required
`require_tracking_quality`	`bool`	When true, keep only users with `tracking_quality_ok == True`. Leave false when tracking quality is used as a diagnostic rather than an exclusion rule, for example when the current dump does not cover all configured phases.	`False`
`exclude_bad_signal_users`	`bool`	When true, exclude users flagged by the signal-quality step.	`True`
`max_low_quality_legs_share`	`float \| None`	Optional maximum share of level-1 low-quality legs tolerated per user.	`None`

Returns:

Type	Description
`DataFrame`	A copy of `user_stats` with `*_filter_ok`, `analysis_user_ok` and `analysis_user_reason` columns.

Raises:

Type	Description
`KeyError`	If a column required by an enabled rule is missing.
`ValueError`	If `max_low_quality_legs_share` is outside `[0, 1]`.

`select_analysis_users(selection_table, *, quality_column='analysis_user_ok')` ¶

Return user ids selected by a user selection table.

Parameters:

Name	Type	Description	Default
`selection_table`	`DataFrame`	Table produced by `build_user_selection_table`.	required
`quality_column`	`str`	Boolean column used as the final selection flag.	`'analysis_user_ok'`

Returns:

Type	Description
`Index`	User ids for which `quality_column` is true.

Raises:

Type	Description
`KeyError`	If `user_id` or `quality_column` is missing.

`filter_mobility_dataset_by_users(dataset, user_ids)` ¶

Filter every user-indexed table of a MobilityDataset.

Mapping tables are filtered after the core tables so they only reference retained legs, trips, journeys and staypoints. The function does not recompute indicators; it only preserves relational consistency after a user selection.

Parameters:

Name	Type	Description	Default
`dataset`	`MobilityDataset`	Transformed mobility dataset.	required
`user_ids`	`Iterable`	Iterable of user identifiers to keep.	required

Returns:

Type	Description
`MobilityDataset`	A new `MobilityDataset` with filtered tables and unchanged validation reports.

Spatial¶

`xyt_gps.spatial` ¶

Spatial façade for geometry, GPS signal quality and zone helpers.

`clean_leg_geometries(legs, *, drop_discontinuous=True)` ¶

Convert continuous MultiLineString legs to LineString.

`max_consecutive_point_distance(geometry)` ¶

Return the maximum distance between consecutive points.
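
The idea can be illustrated on a plain list of `(x, y)` tuples. This sketch measures planar distance in the units of the coordinates. The geometry type accepted by the package function and the distance unit it returns are not stated here, and `max_step` is a hypothetical name.

```python
import math


def max_step(coords):
    """Largest planar distance between successive (x, y) points."""
    steps = (math.dist(a, b) for a, b in zip(coords, coords[1:]))
    return max(steps, default=0.0)  # 0.0 for fewer than two points


track = [(0.0, 0.0), (3.0, 4.0), (3.0, 5.0), (9.0, 13.0)]
largest_jump = max_step(track)
```

A large value relative to the total length of a leg suggests that the GPS signal was lost along the way.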

`add_signal_loss_metrics(legs, config, *, geometry_col='geometry')` ¶

Add absolute and relative GPS signal-loss metrics to legs.

`add_signal_quality_flags(legs, config, *, mode_col=None)` ¶

Add signal-loss metrics, leg flags and user signal-quality flags.

`add_signal_quality_to_user_stats(user_stats, legs, *, signal_quality_computed=None)` ¶

Merge user-level signal-quality metrics into user_stats.

`build_user_signal_quality_stats(legs)` ¶

Aggregate signal-quality flags at user level.

`flag_low_quality_legs_by_mode(legs, config, *, mode_col=None)` ¶

Flag legs with mode-specific signal-loss thresholds.

`identify_bad_signal_users(legs, *, quantile_threshold=0.995, user_id_col='user_id')` ¶

Identify users with unusually high average signal loss.
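
A quantile-based cut-off of this kind can be sketched as follows. The sketch assumes that each user's signal loss is averaged over their legs and that users strictly above the `quantile_threshold` quantile of these averages are flagged. The helper names, the loss values and the linear-interpolation quantile are illustrative choices, not the package implementation.

```python
from statistics import mean


def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, 0 <= q <= 1."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (position - lower) * (ordered[upper] - ordered[lower])


def bad_signal_users(loss_by_user, quantile_threshold=0.995):
    """Return users whose mean signal loss exceeds the quantile cut-off."""
    means = {uid: mean(losses) for uid, losses in loss_by_user.items()}
    cutoff = quantile(list(means.values()), quantile_threshold)
    return sorted(uid for uid, value in means.items() if value > cutoff)


# Hypothetical relative signal-loss values per leg, grouped by user.
loss_by_user = {
    "u1": [0.0, 0.1],
    "u2": [0.1],
    "u3": [0.2],
    "u4": [0.1, 0.1],
    "u5": [0.8, 1.0],
}
flagged = bad_signal_users(loss_by_user, quantile_threshold=0.75)
```

With a threshold as high as the default `0.995`, only the extreme tail of users is flagged, so the rule behaves as an outlier filter rather than a quality grade.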

`add_excursion_flags(table, *, area=None, area_path=None, center=None, radius_m=None, geometry_col='geometry', flag_col='excursion', mode='outside', target_crs='EPSG:4326', operations_crs='EPSG:2056')` ¶

Flag geographic excursions on a geometry table.

`add_journey_origin_destination_from_trips(journeys, trips, map_track_trip_journey, *, trip_origin_col='trip_origin_zone', trip_destination_col='trip_destination_zone', output_origin_col='journey_origin_zone', output_destination_col='journey_destination_zone', fill_value='Unknown')` ¶

Propagate first-origin and last-destination labels from trips to journeys.

`add_leg_origin_destination_zones(legs, zones, *, zone_label_col, origin_col='origin_zone', destination_col='destination_zone', geometry_col='geometry', fill_value='Outside', target_crs='EPSG:4326')` ¶

Label leg origins and destinations from a zone layer.

`add_spatial_zone_labels(table, zones, *, zone_label_col, output_col, geometry_col='geometry', predicate='within', fill_value='Unknown', target_crs='EPSG:4326')` ¶

Add a zone label to a geometry table by spatial join.

`add_trip_origin_destination_from_legs(trips, legs, map_track_trip_journey, *, leg_origin_col='origin_zone', leg_destination_col='destination_zone', output_origin_col='trip_origin_zone', output_destination_col='trip_destination_zone', fill_value='Unknown')` ¶

Propagate first-origin and last-destination labels from legs to trips.

`classify_leg_relation_to_area(legs, *, area=None, area_path=None, center=None, radius_m=None, geometry_col='geometry', relation_col='area_relation', code_col=None, target_crs='EPSG:4326', operations_crs='EPSG:2056')` ¶

Classify each leg as intra, extra or exchange relative to an area.

Spatial geometries¶

`xyt_gps.spatial_geometry` ¶

Geometry helpers for mobility traces.

`clean_leg_geometries(legs, *, drop_discontinuous=True)` ¶

Convert continuous MultiLineString legs to LineString.

`max_consecutive_point_distance(geometry)` ¶

Return the maximum distance between consecutive points.

Spatial GPS quality¶

`xyt_gps.spatial_quality` ¶

GPS signal-quality metrics and filters.

`add_signal_loss_metrics(legs, config, *, geometry_col='geometry')` ¶

Add absolute and relative GPS signal-loss metrics to legs.

`flag_low_quality_legs_by_mode(legs, config, *, mode_col=None)` ¶

Flag legs with mode-specific signal-loss thresholds.

`identify_bad_signal_users(legs, *, quantile_threshold=0.995, user_id_col='user_id')` ¶

Identify users with unusually high average signal loss.

`add_signal_quality_flags(legs, config, *, mode_col=None)` ¶

Add signal-loss metrics, leg flags and user signal-quality flags.

`build_user_signal_quality_stats(legs)` ¶

Aggregate signal-quality flags at user level.

`add_signal_quality_to_user_stats(user_stats, legs, *, signal_quality_computed=None)` ¶

Merge user-level signal-quality metrics into user_stats.

Zones and spatial relations¶

`xyt_gps.spatial_zones` ¶

Spatial labels, area relations and excursion flags.

`add_excursion_flags(table, *, area=None, area_path=None, center=None, radius_m=None, geometry_col='geometry', flag_col='excursion', mode='outside', target_crs='EPSG:4326', operations_crs='EPSG:2056')` ¶

Flag geographic excursions on a geometry table.

`add_spatial_zone_labels(table, zones, *, zone_label_col, output_col, geometry_col='geometry', predicate='within', fill_value='Unknown', target_crs='EPSG:4326')` ¶

Add a zone label to a geometry table by spatial join.

`add_leg_origin_destination_zones(legs, zones, *, zone_label_col, origin_col='origin_zone', destination_col='destination_zone', geometry_col='geometry', fill_value='Outside', target_crs='EPSG:4326')` ¶

Label leg origins and destinations from a zone layer.

`classify_leg_relation_to_area(legs, *, area=None, area_path=None, center=None, radius_m=None, geometry_col='geometry', relation_col='area_relation', code_col=None, target_crs='EPSG:4326', operations_crs='EPSG:2056')` ¶

Classify each leg as intra, extra or exchange relative to an area.
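As an illustration of the three labels, the sketch below classifies a leg from its two endpoints relative to a circular area given by a center and a radius. It rests on assumptions: the rule (both endpoints inside is `intra`, both outside is `extra`, otherwise `exchange`) is the usual transport-planning reading and may differ from what `classify_leg_relation_to_area` actually does with full geometries, and `classify_relation` is a hypothetical helper, not part of the package. Coordinates are planar metres, as in a projected CRS such as EPSG:2056.

```python
import math


def classify_relation(origin, destination, center, radius_m):
    """Return 'intra', 'extra' or 'exchange' for one leg (hypothetical helper).

    All coordinates are planar (x, y) pairs expressed in metres.
    """
    origin_inside = math.dist(origin, center) <= radius_m
    destination_inside = math.dist(destination, center) <= radius_m
    if origin_inside and destination_inside:
        return "intra"      # the leg starts and ends inside the area
    if not origin_inside and not destination_inside:
        return "extra"      # the leg starts and ends outside the area
    return "exchange"       # exactly one endpoint lies inside the area


center = (2_600_000.0, 1_200_000.0)
legs = {
    "leg_a": ((2_600_100.0, 1_200_050.0), (2_600_400.0, 1_199_900.0)),
    "leg_b": ((2_600_100.0, 1_200_050.0), (2_605_000.0, 1_200_000.0)),
    "leg_c": ((2_610_000.0, 1_200_000.0), (2_612_000.0, 1_201_000.0)),
}
relations = {
    leg_id: classify_relation(origin, destination, center, radius_m=1_000.0)
    for leg_id, (origin, destination) in legs.items()
}
```

An endpoint-based rule ignores legs that merely cross the area; whether such legs count as `exchange` is a design choice that should be checked against the package behaviour.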

`add_trip_origin_destination_from_legs(trips, legs, map_track_trip_journey, *, leg_origin_col='origin_zone', leg_destination_col='destination_zone', output_origin_col='trip_origin_zone', output_destination_col='trip_destination_zone', fill_value='Unknown')` ¶

Propagate first-origin and last-destination labels from legs to trips.

`add_journey_origin_destination_from_trips(journeys, trips, map_track_trip_journey, *, trip_origin_col='trip_origin_zone', trip_destination_col='trip_destination_zone', output_origin_col='journey_origin_zone', output_destination_col='journey_destination_zone', fill_value='Unknown')` ¶

Propagate first-origin and last-destination labels from trips to journeys.

Time slices¶

`xyt_gps.spatial_time` ¶

Reusable time-slice helpers for mobility tables.

`add_time_slices(table, *, time_slices=None, datetime_col='started_at', output_col='time_slice', timezone='Europe/Zurich', fallback_label='HC')` ¶

Add a reusable daily time-slice label to a mobility table.

Parameters:

Name	Type	Description	Default
`table`	`DataFrame`	Input table with a datetime column.	required
`time_slices`	`Iterable[TimeSlice] \| None`	Named intervals. Defaults to `HPM` 07:10-09:00 and `HPS` 17:30-20:00. Observations outside these intervals receive `fallback_label`, usually `HC`.	`None`
`datetime_col`	`str`	Datetime column used to classify observations.	`'started_at'`
`output_col`	`str`	Name of the created column.	`'time_slice'`
`timezone`	`str \| None`	Local timezone used before extracting the hour. Set to `None` to keep UTC timestamps.	`'Europe/Zurich'`
`fallback_label`	`str`	Label assigned outside configured intervals.	`'HC'`

Returns:

Type	Description
`DataFrame`	Copy of `table` with `output_col`.

Raises:

Type	Description
`KeyError`	If `datetime_col` is missing.
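The labelling rule can be sketched in plain Python. This is a minimal illustration, not the package implementation: it assumes timestamps are already expressed in local time (the timezone conversion step is omitted) and that intervals are half-open, `[start, end)`. `DEFAULT_SLICES` and `label_time_slice` are hypothetical names; the interval bounds are the defaults quoted in the parameter table.

```python
from datetime import datetime, time

# Default intervals quoted in the parameter table above.
DEFAULT_SLICES = (
    ("HPM", time(7, 10), time(9, 0)),
    ("HPS", time(17, 30), time(20, 0)),
)


def label_time_slice(moment, slices=DEFAULT_SLICES, fallback_label="HC"):
    """Label one local datetime; intervals are treated as [start, end)."""
    clock = moment.time()
    for label, start, end in slices:
        if start <= clock < end:
            return label
    return fallback_label  # outside every configured interval


starts = [
    datetime(2024, 3, 4, 7, 45),
    datetime(2024, 3, 4, 12, 0),
    datetime(2024, 3, 4, 18, 15),
    datetime(2024, 3, 4, 20, 0),
]
labels = [label_time_slice(moment) for moment in starts]
```

Whether the upper bound is inclusive in `add_time_slices` is not stated in the docstring, so boundary timestamps such as 20:00 are worth checking on real data.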

H3¶

`xyt_gps.spatial_h3` ¶

H3 point extraction and aggregation helpers.

`legs_to_h3_points(legs, *, h3_resolution=9, geometry_col='geometry', metadata_cols=None, sample_distance_m=None, max_points_per_leg=None)` ¶

Convert leg geometries into H3-indexed point observations.

The function creates one row per point extracted from each leg geometry. By default, points are the vertices already present in the leg LineString. When sample_distance_m is provided, the line is sampled at a regular interval before H3 indexing. This is useful for visit-frequency maps, but the output should be documented as a sampled representation rather than raw GPS observations.

Parameters:

Name	Type	Description	Default
`legs`	`DataFrame`	Leg table, usually `dataset.legs`, with line geometries.	required
`h3_resolution`	`int \| Iterable[int]`	H3 resolution or list of resolutions, from 0 to 15. Higher values produce smaller hexagons and larger output tables.	`9`
`geometry_col`	`str`	Geometry column containing `LineString` or `MultiLineString` objects.	`'geometry'`
`metadata_cols`	`Iterable[str] \| None`	Columns to copy from legs to each point row. When omitted, common mobility identifiers, dates, modes and phases are copied when present.	`None`
`sample_distance_m`	`float \| None`	Optional regular sampling distance in metres. If omitted, existing line vertices are used.	`None`
`max_points_per_leg`	`int \| None`	Optional cap after vertex extraction or regular sampling. This is a safeguard for very dense geometries.	`None`

Returns:

Type	Description
`GeoDataFrame`	GeoDataFrame in EPSG:4326 with `lon`, `lat`, `h3_cell`, `h3_resolution`, `point_sequence` and copied metadata columns.

Raises:

Type	Description
`ImportError`	If the optional `h3` dependency is missing.
`KeyError`	If the geometry column is missing.
`ValueError`	If the H3 resolution or sampling distance is invalid.
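To make the difference between vertex extraction and regular sampling concrete, here is a minimal sketch of sampling a polyline at a fixed interval. It is not the package code: `sample_line` is a hypothetical helper, coordinates are assumed to be planar metres (a projected CRS), and keeping the final vertex is a choice of this sketch.

```python
import math


def sample_line(vertices, sample_distance_m):
    """Return points every `sample_distance_m` along a planar polyline.

    The first vertex is always kept. The last vertex is appended when the
    final sampled point does not coincide with it.
    """
    if sample_distance_m <= 0:
        raise ValueError("sample_distance_m must be positive")
    points = [vertices[0]]
    carried = 0.0  # distance travelled since the last emitted point
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        segment = math.dist((x0, y0), (x1, y1))
        if segment == 0:
            continue  # skip repeated vertices
        position = sample_distance_m - carried  # offset of the next sample
        while position <= segment:
            ratio = position / segment
            points.append((x0 + ratio * (x1 - x0), y0 + ratio * (y1 - y0)))
            position += sample_distance_m
        carried = segment - (position - sample_distance_m)
    if points[-1] != vertices[-1]:
        points.append(vertices[-1])
    return points


sampled = sample_line([(0.0, 0.0), (250.0, 0.0)], 100.0)
```

A two-vertex leg of 250 m yields four points here instead of two, which is why sampled outputs weigh long straight segments more heavily than raw vertices do.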

`aggregate_h3_frequencies(h3_points, *, h3_col='h3_cell', group_cols=None, user_col='user_id', leg_col='leg_id', trip_col='trip_id')` ¶

Aggregate H3-indexed points into visit-frequency counts.

Parameters:

Name	Type	Description	Default
`h3_points`	`DataFrame`	Output of `legs_to_h3_points()` or another table containing an H3 cell column.	required
`h3_col`	`str`	Column containing H3 cell identifiers.	`'h3_cell'`
`group_cols`	`Iterable[str] \| None`	Additional grouping columns. The H3 cell is always kept. Add columns such as `mode_niv1`, `Phase` or `experiment_name` to build dashboard-ready slices.	`None`
`user_col`	`str`	User identifier column used for `user_count` when present.	`'user_id'`
`leg_col`	`str`	Leg identifier column used for `leg_count` when present.	`'leg_id'`
`trip_col`	`str`	Trip identifier column used for `trip_count` when present.	`'trip_id'`

Returns:

Type	Description
`DataFrame`	DataFrame with counts by H3 cell and optional grouping columns.
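The aggregation can be pictured with a small stdlib sketch. It assumes that `user_count`, `leg_count` and `trip_count` are distinct counts of the corresponding identifiers, which the docstring suggests but does not state. `aggregate_cells` is a hypothetical helper and the cell identifiers are placeholders, not real H3 indexes.

```python
from collections import defaultdict


def aggregate_cells(points, group_cols=()):
    """Count points and distinct users, legs and trips per cell and group."""
    buckets = defaultdict(
        lambda: {"point_count": 0, "users": set(), "legs": set(), "trips": set()}
    )
    for row in points:
        # The H3 cell is always part of the key; group columns are optional.
        key = (row["h3_cell"], *(row[col] for col in group_cols))
        bucket = buckets[key]
        bucket["point_count"] += 1
        bucket["users"].add(row["user_id"])
        bucket["legs"].add(row["leg_id"])
        bucket["trips"].add(row["trip_id"])
    return {
        key: {
            "point_count": bucket["point_count"],
            "user_count": len(bucket["users"]),
            "leg_count": len(bucket["legs"]),
            "trip_count": len(bucket["trips"]),
        }
        for key, bucket in buckets.items()
    }


points = [
    {"h3_cell": "cell_a", "user_id": "u1", "leg_id": "l1", "trip_id": "t1", "mode_niv1": "walk"},
    {"h3_cell": "cell_a", "user_id": "u1", "leg_id": "l1", "trip_id": "t1", "mode_niv1": "walk"},
    {"h3_cell": "cell_a", "user_id": "u2", "leg_id": "l2", "trip_id": "t2", "mode_niv1": "bike"},
    {"h3_cell": "cell_b", "user_id": "u2", "leg_id": "l2", "trip_id": "t2", "mode_niv1": "bike"},
]
by_cell = aggregate_cells(points)
by_cell_and_mode = aggregate_cells(points, group_cols=("mode_niv1",))
```

Point counts grow with vertex density, while distinct user, leg and trip counts do not, so the latter are usually the safer metrics for comparing cells.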

`build_h3_count_matrix(h3_points, *, h3_col='h3_cell', dimension_sets=DEFAULT_H3_COUNT_DIMENSION_SETS, metrics=DEFAULT_H3_COUNT_METRICS, user_col='user_id', leg_col='leg_id', trip_col='trip_id', fill_value=0)` ¶

Build a wide H3 count table for dashboards.

The output keeps one row per H3 cell and creates explicit metric columns for requested dimensions, for example point_count__mode_niv1__marche or trip_count__time_slice__hpm.

Parameters:

Name	Type	Description	Default
`h3_points`	`DataFrame`	H3-indexed points produced by `legs_to_h3_points()`.	required
`h3_col`	`str`	H3 cell column.	`'h3_cell'`
`dimension_sets`	`Iterable[Iterable[str]]`	Dimension sets used to create count columns. Each set is aggregated independently. Missing dimension columns are skipped.	`DEFAULT_H3_COUNT_DIMENSION_SETS`
`metrics`	`Iterable[str]`	Count metrics to compute: `point_count`, `user_count`, `leg_count`, `trip_count`.	`DEFAULT_H3_COUNT_METRICS`
`user_col`	`str`	User identifier column.	`'user_id'`
`leg_col`	`str`	Leg identifier column.	`'leg_id'`
`trip_col`	`str`	Trip identifier column.	`'trip_id'`
`fill_value`	`int`	Value used for absent combinations.	`0`

Returns:

Type	Description
`DataFrame`	Wide table keyed by `h3_resolution` when present and `h3_cell`.
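The wide layout can be sketched as follows. This is an illustration under assumptions: only `point_count` is computed, each dimension set contains a single column, and dimension values are lower-cased in column names (inferred from the `trip_count__time_slice__hpm` example above). `build_point_count_matrix` is a hypothetical helper and the cell identifiers are placeholders.

```python
from collections import defaultdict


def build_point_count_matrix(points, dimensions, fill_value=0):
    """Return {cell: {column: count}} with one column per dimension value."""
    columns = set()
    rows = defaultdict(dict)
    for point in points:
        cell_row = rows[point["h3_cell"]]
        for dimension in dimensions:
            if dimension not in point:
                continue  # missing dimension columns are skipped
            value = str(point[dimension]).lower()
            column = f"point_count__{dimension}__{value}"
            columns.add(column)
            cell_row[column] = cell_row.get(column, 0) + 1
    ordered = sorted(columns)
    # Absent combinations receive fill_value so every row has every column.
    return {
        cell: {column: counts.get(column, fill_value) for column in ordered}
        for cell, counts in rows.items()
    }


points = [
    {"h3_cell": "cell_a", "mode_niv1": "marche", "time_slice": "HPM"},
    {"h3_cell": "cell_a", "mode_niv1": "marche", "time_slice": "HC"},
    {"h3_cell": "cell_b", "mode_niv1": "velo", "time_slice": "HPM"},
]
matrix = build_point_count_matrix(points, ["mode_niv1", "time_slice"])
```

Each dimension is counted independently, so the columns of one dimension sum to the cell's total point count, and summing across dimensions double-counts.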

Dashboard spatial exports¶

`xyt_gps.spatial_exports` ¶

Dashboard-oriented spatial table and DuckDB exports.

`build_spatial_analytics_tables(dataset_or_legs, *, h3_resolution=9, config=None, frequency_group_cols=None, count_dimension_sets=DEFAULT_H3_COUNT_DIMENSION_SETS, count_metrics=DEFAULT_H3_COUNT_METRICS, include_count_matrix=True, time_slices=None, datetime_col='started_at', time_slice_col='time_slice', timezone=None, sample_distance_m=None)` ¶

Build H3 spatial analytics tables without writing files.

Parameters:

Name	Type	Description	Default
`dataset_or_legs`	`MobilityDataset \| DataFrame`	A `MobilityDataset` or a leg table.	required
`h3_resolution`	`int \| Iterable[int]`	H3 resolution or list of resolutions used for point indexing.	`9`
`config`	`ProjectConfig \| None`	Optional project configuration. When provided, its timezone and time slices are used unless explicit values are passed.	`None`
`frequency_group_cols`	`Iterable[str] \| None`	Grouping columns for `aggregate_h3_frequencies`. Defaults to H3 cell only.	`None`
`count_dimension_sets`	`Iterable[Iterable[str]]`	Dimension sets used for the wide count matrix.	`DEFAULT_H3_COUNT_DIMENSION_SETS`
`count_metrics`	`Iterable[str]`	Count metrics included in the wide count matrix.	`DEFAULT_H3_COUNT_METRICS`
`include_count_matrix`	`bool`	Whether to export `h3_count_matrix`.	`True`
`time_slices`	`Iterable[TimeSlice] \| None`	Optional reusable daily intervals. Defaults to `config.time_slices` or package defaults.	`None`
`datetime_col`	`str`	Datetime column used to assign time slices.	`'started_at'`
`time_slice_col`	`str`	Name of the time-slice column.	`'time_slice'`
`timezone`	`str \| None`	Local timezone used for time slices. Defaults to `config.timezone` or `Europe/Zurich`.	`None`
`sample_distance_m`	`float \| None`	Optional regular sampling distance in metres before H3 indexing.	`None`

Returns:

Type	Description
`dict[str, DataFrame]`	Dictionary with `leg_points_h3`, `h3_frequency` and, when requested, `h3_count_matrix`.

`write_spatial_analytics_tables(tables, output_dir, *, formats=('parquet', 'csv'), overwrite=True, manifest_name='spatial_analytics_manifest.json')` ¶

Write precomputed spatial analytics tables to disk.

Use this when tables have already been built with build_spatial_analytics_tables() and need to be inspected before export.

Parameters:

Name	Type	Description	Default
`tables`	`Mapping[str, DataFrame]`	Mapping of table names to DataFrames.	required
`output_dir`	`str \| Path`	Destination directory.	required
`formats`	`Iterable[str]`	Output formats: `parquet`, `csv`, `pickle`/`pkl`. Defaults to `parquet` and `csv`.	`('parquet', 'csv')`
`overwrite`	`bool`	If false, fail when a target file already exists.	`True`
`manifest_name`	`str`	JSON manifest file name.	`'spatial_analytics_manifest.json'`

Returns:

Type	Description
`DataFrame`	Manifest with one row per written file.

`write_spatial_analytics_exports(dataset_or_legs, output_dir, *, h3_resolution=9, config=None, frequency_group_cols=None, count_dimension_sets=DEFAULT_H3_COUNT_DIMENSION_SETS, count_metrics=DEFAULT_H3_COUNT_METRICS, include_count_matrix=True, time_slices=None, datetime_col='started_at', time_slice_col='time_slice', timezone=None, sample_distance_m=None, formats=('parquet', 'csv'), overwrite=True)` ¶

Write H3 point, frequency and count-matrix tables for dashboards.

Parameters:

Name	Type	Description	Default
`dataset_or_legs`	`MobilityDataset \| DataFrame`	A `MobilityDataset` or a leg table.	required
`output_dir`	`str \| Path`	Destination directory, for example `Data/Output/2-transformed-data/spatial-analytics`.	required
`h3_resolution`	`int \| Iterable[int]`	H3 resolution or list of resolutions used for point indexing.	`9`
`config`	`ProjectConfig \| None`	Optional project configuration. When provided, its timezone and time slices are used unless explicit values are passed.	`None`
`frequency_group_cols`	`Iterable[str] \| None`	Grouping columns for `aggregate_h3_frequencies`. Defaults to H3 cell only.	`None`
`count_dimension_sets`	`Iterable[Iterable[str]]`	Dimension sets used for the wide count matrix.	`DEFAULT_H3_COUNT_DIMENSION_SETS`
`count_metrics`	`Iterable[str]`	Count metrics included in the wide count matrix.	`DEFAULT_H3_COUNT_METRICS`
`include_count_matrix`	`bool`	Whether to export `h3_count_matrix`.	`True`
`time_slices`	`Iterable[TimeSlice] \| None`	Optional reusable daily intervals. Defaults to `config.time_slices` or package defaults.	`None`
`datetime_col`	`str`	Datetime column used to assign time slices.	`'started_at'`
`time_slice_col`	`str`	Name of the time-slice column.	`'time_slice'`
`timezone`	`str \| None`	Local timezone used for time slices. Defaults to `config.timezone` or `Europe/Zurich`.	`None`
`sample_distance_m`	`float \| None`	Optional regular sampling distance in metres before H3 indexing.	`None`
`formats`	`Iterable[str]`	Output formats: `parquet`, `csv`, `pickle`/`pkl`. Defaults to `parquet` and `csv`.	`('parquet', 'csv')`
`overwrite`	`bool`	If false, fail when a target file already exists.	`True`

Returns:

Type	Description
`DataFrame`	Manifest with one row per written file.

`write_duckdb_spatial_database(tables, database_path, *, overwrite=True, load_spatial=True, require_spatial_extension=False)` ¶

Write mobility tables to a local DuckDB database.

GeoDataFrame geometries are stored as WKB columns so that the database can be queried even without the spatial extension. When the extension is available, companion views ending with _spatial are created with a DuckDB geometry column.

Parameters:

Name	Type	Description	Default
`tables`	`Mapping[str, DataFrame] \| MobilityDataset`	Mapping of table names to DataFrames, or a `MobilityDataset`.	required
`database_path`	`str \| Path`	Destination `.duckdb` file.	required
`overwrite`	`bool`	If true, replace an existing database file.	`True`
`load_spatial`	`bool`	Try to install and load DuckDB's spatial extension.	`True`
`require_spatial_extension`	`bool`	If true, fail when the spatial extension cannot be loaded. Keep false for offline or lightweight use.	`False`

Returns:

Type	Description
`DataFrame`	Manifest with table names, row counts and spatial-extension status.

Raises:

Type	Description
`ImportError`	If the optional `duckdb` dependency is missing.
`FileExistsError`	If the database exists and `overwrite` is false.

Indicators¶

`xyt_gps.indicators` ¶

Mobility indicator helpers computed from structured mobility tables.

`build_person_day_indicators(dataset, *, mode_col='mode_niv1', trips_mode_col=None, distance_col=None, include_zero_days=True, include_excursions=True, include_airplane=False, config=None, default_phase_name='All')` ¶

Build person-day mobility indicators by mode.

The MVP indicators are distance, travel time and number of trips. Values are aggregated per user, date, analytical phase/period and mode. Distances are expressed in kilometers, durations in minutes. If the dataset has no Phase column, all rows are assigned to default_phase_name.

Parameters:

Name	Type	Description	Default
`dataset`	`MobilityDataset`	Transformed mobility dataset.	required
`mode_col`	`str`	Leg mode column, for example `mode_niv1`, `mode_niv2` or `mode_mrmt`.	`'mode_niv1'`
`trips_mode_col`	`str \| None`	Trip mode column. If absent, it is inferred as `main_{mode_col}`.	`None`
`distance_col`	`str \| None`	Optional leg distance column override. By default the function tries `length`, `distance`, then `length_leg`.	`None`
`include_zero_days`	`bool`	When true, build a continuous user-day calendar from `user_stats` phase ranges and fill days without movement with zero.	`True`
`include_excursions`	`bool`	When false, exclude rows flagged with `excursion == 1` in legs and trips before computing indicators.	`True`
`include_airplane`	`bool`	When false, exclude legs and trips whose source or mapped mode columns identify airplane travel. Airplane is excluded by default because it can dominate distances and CO2 indicators.	`False`
`config`	`ProjectConfig \| None`	Optional project configuration. If omitted, phase metadata stored in `MobilityDataset` is used when available.	`None`
`default_phase_name`	`str`	Analytical period name used when no phase split is present in the dataset.	`'All'`

Returns:

Type	Description
`DataFrame`	Long table with columns `user_id`, `date`, `Phase`, `mode`, `distance_km`, `travel_time_min`, `trip_count` and `leg_count`.

Raises:

Type	Description
`KeyError`	If required columns are missing.
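The effect of `include_zero_days` on daily means can be illustrated with a short sketch. It assumes a single user, a single mode and a known tracking range; `fill_zero_days`, `first_day` and `last_day` are hypothetical names, and the way the package derives the range from `user_stats` is not reproduced here.

```python
from datetime import date, timedelta


def fill_zero_days(observed, first_day, last_day):
    """Return one value per calendar day, using 0.0 for days without movement."""
    n_days = (last_day - first_day).days + 1
    calendar = [first_day + timedelta(days=offset) for offset in range(n_days)]
    return {day: observed.get(day, 0.0) for day in calendar}


# Distance in kilometres on the two days where movement was recorded.
observed_km = {date(2024, 3, 4): 12.5, date(2024, 3, 6): 3.0}
daily_km = fill_zero_days(observed_km, date(2024, 3, 4), date(2024, 3, 7))

mean_with_zero_days = sum(daily_km.values()) / len(daily_km)
mean_without_zero_days = sum(observed_km.values()) / len(observed_km)
```

With zero days the mean is taken over four tracked days rather than two travel days, which halves the daily distance in this example. This is why the flag should be reported alongside any per-day indicator.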

`build_person_phase_indicators(person_day, *, user_stats=None, weight_col='weight')` ¶

Average person-day indicators by user, phase and mode.

Parameters:

Name	Type	Description	Default
`person_day`	`DataFrame`	Table produced by `build_person_day_indicators`.	required
`user_stats`	`DataFrame \| None`	Optional user-level table used to attach weights.	`None`
`weight_col`	`str`	User weight column. Missing weights are set to 1.	`'weight'`

Returns:

Type	Description
`DataFrame`	Long table with mean daily distance, travel time and trip count by user, phase and mode. If `user_stats` is provided, the table also contains `weight_col`.

`build_population_indicators(person_phase, *, use_weights=True, weight_col='weight')` ¶

Average person-phase indicators at population level.

Parameters:

Name	Type	Description	Default
`person_phase`	`DataFrame`	Table produced by `build_person_phase_indicators`.	required
`use_weights`	`bool`	When true and `weight_col` is available, compute weighted means.	`True`
`weight_col`	`str`	Weight column attached to `person_phase`.	`'weight'`

Returns:

Type	Description
`DataFrame`	Long table with one row per phase and mode. `n_users` counts the number of users contributing to each phase-mode mean.
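A minimal sketch of the population-level step, for one phase and one mode. It assumes a standard weighted mean, `sum(w * x) / sum(w)`, with missing weights replaced by 1 as described for `weight_col`. `population_mean` is a hypothetical helper, not the package function.

```python
def population_mean(rows, value_col, use_weights=True, weight_col="weight"):
    """Return (mean, n_users) over person-phase rows for one phase and mode."""
    total = 0.0
    weight_sum = 0.0
    users = set()
    for row in rows:
        weight = row.get(weight_col) if use_weights else None
        if weight is None:
            weight = 1.0  # missing or unused weights count as 1
        total += weight * row[value_col]
        weight_sum += weight
        users.add(row["user_id"])
    return total / weight_sum, len(users)


person_phase = [
    {"user_id": "u1", "distance_km": 10.0, "weight": 2.0},
    {"user_id": "u2", "distance_km": 20.0, "weight": 1.0},
    {"user_id": "u3", "distance_km": 30.0, "weight": None},
]
weighted = population_mean(person_phase, "distance_km")
unweighted = population_mean(person_phase, "distance_km", use_weights=False)
```

Averaging person-phase means, rather than pooling person-days, gives each user the same influence regardless of how many days they were tracked.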

`compute_mobility_indicators(dataset, *, mode_col='mode_niv1', trips_mode_col=None, distance_col=None, include_zero_days=True, include_excursions=True, include_airplane=False, use_weights=True, weight_col='weight', config=None, default_phase_name='All')` ¶

Compute the first mobility indicator tables from a mobility dataset.

Parameters:

Name	Type	Description	Default
`dataset`	`MobilityDataset`	Transformed mobility dataset.	required
`mode_col`	`str`	Leg mode column used as indicator granularity.	`'mode_niv1'`
`trips_mode_col`	`str \| None`	Optional trip mode column override.	`None`
`distance_col`	`str \| None`	Optional leg distance column override.	`None`
`include_zero_days`	`bool`	Whether to include tracked days without movement as zero rows.	`True`
`include_excursions`	`bool`	Whether to include rows flagged as excursions in the indicator base.	`True`
`include_airplane`	`bool`	Whether to include airplane legs and trips in the indicator base. Defaults to false because airplane rows can dominate distance, CO2 and demand profiles.	`False`
`use_weights`	`bool`	Whether to use `weight_col` from `dataset.user_stats` for population-level means.	`True`
`weight_col`	`str`	User-level weight column. Missing weights are set to 1.	`'weight'`
`config`	`ProjectConfig \| None`	Optional project configuration. If omitted, phase metadata stored in `MobilityDataset` is used when available.	`None`
`default_phase_name`	`str`	Analytical period name used when no phase split is present in the dataset.	`'All'`

Returns:

Type	Description
`IndicatorResult`	`IndicatorResult` with `person_day`, `person_phase` and `population` tables.

`population_indicator_summary(indicators)` ¶

Return a compact population-level indicator summary.

Parameters:

Name	Type	Description	Default
`indicators`	`IndicatorResult \| DataFrame`	`IndicatorResult` or a population table.	required

Returns:

Type	Description
`DataFrame`	Population indicator table sorted by phase and mode.

Enrichments¶

`xyt_gps.enrichment` ¶

Optional enrichment helpers for mobility analysis tables.

`CO2OccupancyConfig` `dataclass` ¶

Parameters used to derive CO2 and occupancy metrics from legs.

Factors are expressed in grams per kilometer. co2_g and co2_direct_g are then computed as leg-level totals. Occupancy is recomputed by default; set prefer_observed_occupancy=True to use a positive provider value from occupancy_col when available.

`HealthConfig` `dataclass` ¶

Parameters used to derive simple physical-activity metrics.

`add_co2_occupancy_metrics(legs, *, journeys=None, map_track_trip_journey=None, config=None, mode_col='mode', distance_col=None, journey_purpose_col='main_purpose_mrmt', prefer_observed_occupancy=None, occupancy_col=None)` ¶

Add occupancy and CO2 metrics to a leg table.

The function is intentionally row-level: it does not aggregate. This keeps assumptions visible before mobility indicators are computed.

By default, occupancy is recomputed from distance and purpose because provider columns are often sparse. Pass prefer_observed_occupancy=True to use a positive value from occupancy_col when available and fall back to the computed value otherwise.

`add_health_metrics(legs, *, config=None, mode_col='mode_niv1', distance_col=None, duration_col='duration')` ¶

Add simple activity, intensity, MET and calorie metrics to legs.

`build_leg_enrichment_tables(legs)` ¶

Return compact CO2 and health side tables keyed by leg_id.

Mobility motifs¶

`xyt_gps.motifs` ¶

Daily mobility motif helpers.

The functions in this module migrate the useful core of the historical GPStoGraph workflow without exposing notebook-era graph objects as the main API. A motif is represented as a daily directed transition structure between visited places. Places can come from an existing location_id column, or be derived from staypoint purpose and rounded coordinates.

`assign_mobility_motif_ids(motifs, *, signature_col='motif_signature', id_col='motif_id', top_n=9, other_motif_id=99)` ¶

Assign stable numeric ids to the most frequent motif signatures.

Parameters:

Name	Type	Description	Default
`motifs`	`DataFrame`	Motif table returned by `build_mobility_motifs()`.	required
`signature_col`	`str`	Column containing the canonical motif signature.	`'motif_signature'`
`id_col`	`str`	Name of the generated id column.	`'motif_id'`
`top_n`	`int`	Number of frequent motifs receiving ids from 1 to `top_n`.	`9`
`other_motif_id`	`int`	Id used for less frequent motifs.	`99`

Returns:

Type	Description
`DataFrame`	Copy of `motifs` with a numeric `id_col`.

Raises:

Type	Description
`KeyError`	If `signature_col` is absent.
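The id assignment can be sketched with `collections.Counter`. Two points are assumptions of this sketch rather than documented behaviour: id 1 goes to the most frequent signature, and ties are broken alphabetically so that ids are reproducible. `assign_ids` is a hypothetical helper.

```python
from collections import Counter


def assign_ids(signatures, top_n=9, other_motif_id=99):
    """Map each signature to a rank-based id, or to `other_motif_id`."""
    counts = Counter(signatures)
    # Most frequent first; alphabetical order breaks ties (assumed rule).
    ranked = sorted(counts, key=lambda signature: (-counts[signature], signature))
    ids = {signature: rank for rank, signature in enumerate(ranked[:top_n], start=1)}
    return [ids.get(signature, other_motif_id) for signature in signatures]


signatures = ["0110", "0110", "0110", "010001100", "010001100", "0"]
motif_ids = assign_ids(signatures, top_n=2)
```

Because ids depend on frequency ranks, they are only comparable between two tables when both were ranked on the same reference population.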

`build_mobility_motifs(staypoints_or_dataset, *, user_col='user_id', started_at_col='started_at', finished_at_col='finished_at', date_col=None, location_col=None, purpose_col='purpose_niv1', lon_col='lon', lat_col='lat', coordinate_precision=4, top_n_motifs=9, other_motif_id=99)` ¶

Build daily mobility motifs from staypoints.

The function works on dataset.staypoints or on a staypoint DataFrame. A motif is the daily sequence of places visited by one user after removing consecutive duplicate places. The sequence is relabelled in order of first appearance, then encoded as a flattened directed adjacency matrix. This keeps the old motif_flat idea while making the result easy to export and compare.

Parameters:

Name	Type	Description	Default
`staypoints_or_dataset`	`MobilityDataset \| DataFrame`	`MobilityDataset` or staypoint table.	required
`user_col`	`str`	User identifier column.	`'user_id'`
`started_at_col`	`str`	Start timestamp column used for sorting.	`'started_at'`
`finished_at_col`	`str`	Optional end timestamp column kept in motif nodes.	`'finished_at'`
`date_col`	`str \| None`	Optional explicit date column. If omitted, the date is derived from `started_at_col`.	`None`
`location_col`	`str \| None`	Optional stable place identifier. If omitted, a place key is derived from purpose and rounded coordinates.	`None`
`purpose_col`	`str \| None`	Optional activity/purpose label used in derived place keys.	`'purpose_niv1'`
`lon_col`	`str`	Longitude column used when deriving place keys.	`'lon'`
`lat_col`	`str`	Latitude column used when deriving place keys.	`'lat'`
`coordinate_precision`	`int`	Decimal precision for coordinate-derived keys.	`4`
`top_n_motifs`	`int`	Number of frequent motif signatures assigned ids 1..N.	`9`
`other_motif_id`	`int`	Id assigned to less frequent motifs.	`99`

Returns:

Type	Description
`DataFrame`	One row per user-day with the canonical sequence, adjacency signature, motif id and simple counts.

Raises:

Type	Description
`KeyError`	If required columns are missing.
`ValueError`	If no usable staypoint row remains.
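The encoding described above (drop consecutive duplicates, relabel places by first appearance, flatten a directed adjacency matrix) can be sketched for one user-day. This is an illustration only: `motif_signature` is a hypothetical helper and the exact string format of the package's `motif_signature` column is not documented here.

```python
def motif_signature(places):
    """Return (canonical sequence, flattened adjacency string) for one day."""
    # 1. Remove consecutive duplicate places.
    sequence = [
        place for index, place in enumerate(places)
        if index == 0 or place != places[index - 1]
    ]
    # 2. Relabel places as 0, 1, 2, ... in order of first appearance.
    labels = {}
    for place in sequence:
        labels.setdefault(place, len(labels))
    canonical = [labels[place] for place in sequence]
    # 3. Build the directed adjacency matrix of observed transitions.
    size = len(labels)
    matrix = [[0] * size for _ in range(size)]
    for source, target in zip(canonical, canonical[1:]):
        matrix[source][target] = 1
    # 4. Flatten the matrix row by row.
    flat = "".join(str(value) for row in matrix for value in row)
    return canonical, flat


canonical, signature = motif_signature(["home", "home", "work", "shop", "home"])
```

Relabelling by first appearance makes the signature independent of place names, so a home-work-home day and a hotel-office-hotel day map to the same motif.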

`summarize_mobility_motifs(motifs, *, motif_id_col='motif_id', signature_col='motif_signature')` ¶

Summarize motif frequencies and simple structural properties.

`build_mobility_motif_sequences(motifs, *, user_col='user_id', date_col='date', motif_id_col='motif_id', n_days=60, align_to_week=True, fill_value=0)` ¶

Build a fixed-width daily motif sequence for each user.

This is the package equivalent of the historical motif_sequence() helper. Each row is a user and each column is a relative day. Missing days are filled with fill_value. When align_to_week=True, the first observed motif is shifted so the first column corresponds to Monday.
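For a single user, the fixed-width layout can be sketched as below. The interpretation of `align_to_week` is an assumption: the sequence starts on the Monday of the week containing the first observed day. `motif_sequence` here is a hypothetical stdlib helper, not the historical function.

```python
from datetime import date, timedelta


def motif_sequence(daily_motifs, n_days=60, align_to_week=True, fill_value=0):
    """Return a list of `n_days` motif ids starting at the first observed day."""
    first_day = min(daily_motifs)
    if align_to_week:
        # date.weekday() is 0 on Monday, so this moves back to Monday.
        first_day -= timedelta(days=first_day.weekday())
    return [
        daily_motifs.get(first_day + timedelta(days=offset), fill_value)
        for offset in range(n_days)
    ]


# 2024-03-06 is a Wednesday and 2024-03-08 is a Friday.
daily_motifs = {date(2024, 3, 6): 1, date(2024, 3, 8): 2}
aligned = motif_sequence(daily_motifs, n_days=7)
unaligned = motif_sequence(daily_motifs, n_days=7, align_to_week=False)
```

Week alignment keeps weekdays in the same columns across users, which matters when sequences are compared or clustered column by column.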

Export¶

`xyt_gps.export` ¶

Export helpers for structured mobility tables.

The export layer is kept separate from transformation code so formats such as CSV, Parquet or Excel remain optional package concerns.

`mobility_dataset_tables(dataset)` ¶

Return the named tables contained in a MobilityDataset.

Parameters:

Name	Type	Description	Default
`dataset`	`MobilityDataset`	Transformed mobility dataset.	required

Returns:

Type	Description
`dict[str, DataFrame]`	Dictionary keyed by stable table names. The order follows the transformation workflow and is reused by export helpers.

`write_mobility_dataset(dataset, output_dir, *, formats=('csv', 'geojson'), extra_tables=None, include_validation=True, include_quality_reports=True, selection_table=None, overwrite=True)` ¶

Write intermediate MobilityDataset tables to disk.

The function writes the inspectable states of the preparation workflow: storyline, legs, staypoints, trips, journeys, user stats, mapping tables and optional validation and quality reports. It returns a manifest so the caller can see exactly what was exported.

Parameters:

Name	Type	Description	Default
`dataset`	`MobilityDataset`	Transformed mobility dataset.	required
`output_dir`	`str \| Path`	Destination directory.	required
`formats`	`Iterable[str]`	Export formats. Supported values are `csv`, `geojson`, `parquet`, `pickle`/`pkl` and `xlsx`. `geojson` is written only for geospatial tables.	`('csv', 'geojson')`
`extra_tables`	`Mapping[str, DataFrame] \| None`	Optional additional tables to export with the same formats, for example `user_presence` or `participation_grid`. Table names must be stable file-safe identifiers.	`None`
`include_validation`	`bool`	Whether to export raw-schema validation issues.	`True`
`include_quality_reports`	`bool`	Whether to export tracking-quality summary and optional user selection table.	`True`
`selection_table`	`DataFrame \| None`	Optional table produced by `build_user_selection_table`.	`None`
`overwrite`	`bool`	If false, fail when a target file already exists.	`True`

Returns:

Type	Description
`DataFrame`	A manifest with one row per written file: table, format, path and number of rows.

Raises:

Type	Description
`FileExistsError`	If `overwrite` is false and a file already exists.
`ValueError`	If an unsupported format is requested.
`ImportError`	If optional dependencies for Parquet or Excel are missing.
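The overwrite check and the manifest can be pictured with a stdlib-only sketch that writes CSV files. It is not the package code: `write_tables_csv` is a hypothetical helper, tables are represented as lists of dictionaries instead of DataFrames, and the manifest keys simply mirror the fields listed above.

```python
import csv
from pathlib import Path


def write_tables_csv(tables, output_dir, overwrite=True):
    """Write each table (a list of dict rows) to CSV and return a manifest."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    manifest = []
    for name, rows in tables.items():
        path = output_dir / f"{name}.csv"
        if path.exists() and not overwrite:
            raise FileExistsError(path)
        fieldnames = list(rows[0]) if rows else []
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        # One manifest row per written file.
        manifest.append(
            {"table": name, "format": "csv", "path": str(path), "rows": len(rows)}
        )
    return manifest
```

Returning the manifest instead of printing it lets a notebook assert on row counts or attach the list of written files to a run log.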

`export_mobility_tables(*args, **kwargs)` ¶

Alias for write_mobility_dataset.

write_mobility_dataset is the preferred name because it says that the full structured dataset is written, including reports and mapping tables.

`write_indicator_result(indicators, output_dir, *, formats=('parquet', 'csv'), overwrite=True)` ¶

Write mobility indicator tables to disk.

Parameters:

Name	Type	Description	Default
`indicators`	`IndicatorResult`	Result returned by `compute_mobility_indicators`.	required
`output_dir`	`str \| Path`	Destination directory.	required
`formats`	`Iterable[str]`	Export formats. Supported values are `csv`, `parquet` and `pickle`/`pkl`.	`('parquet', 'csv')`
`overwrite`	`bool`	If false, fail when a target file already exists.	`True`

Returns:

Type	Description
`DataFrame`	Manifest with one row per written file.

Map visualization¶

`xyt_gps.viz_maps` ¶

Interactive map visualizations for GPS traces and H3 cells.

`plot_h3_frequency_map(h3_frequency, *, h3_col='h3_cell', value_col='point_count', tooltip_cols=None, aggregate_cells=True, max_cells=2500, palette=('#f4f1f8', '#d9cce9', '#b79bd5', '#8f67bd', '#6b4c9a', '#3d275f'), fill_opacity=0.62, line_opacity=0.25, tiles='cartodbpositron', zoom_start=11, map_center=None, save_path=None)` ¶

Plot H3 visit-frequency cells on an interactive Folium map.

Parameters:

Name	Type	Description	Default
`h3_frequency`	`DataFrame`	Table produced by `aggregate_h3_frequencies()` or any table containing an H3 cell column and a numeric count.	required
`h3_col`	`str`	Column containing H3 cell identifiers.	`'h3_cell'`
`value_col`	`str`	Numeric column used to color cells.	`'point_count'`
`tooltip_cols`	`Iterable[str] \| None`	Columns shown in the popup. When omitted, common count columns are used when present.	`None`
`aggregate_cells`	`bool`	If true, aggregate rows by H3 cell before plotting. This is useful when the input is split by mode, phase or project.	`True`
`max_cells`	`int \| None`	Optional maximum number of cells to draw. The most frequent cells are kept. Set to `None` to draw all cells, which can be slow.	`2500`
`palette`	`tuple[str, ...]`	Sequential color palette from low to high values.	`('#f4f1f8', '#d9cce9', '#b79bd5', '#8f67bd', '#6b4c9a', '#3d275f')`
`fill_opacity`	`float`	Polygon fill opacity.	`0.62`
`line_opacity`	`float`	Polygon border opacity.	`0.25`
`tiles`	`str`	Folium base map.	`'cartodbpositron'`
`zoom_start`	`int`	Initial zoom level.	`11`
`map_center`	`tuple[float, float] \| None`	Optional `(lat, lon)` map center. If omitted, it is estimated from the H3 cells.	`None`
`save_path`	`str \| Path \| None`	Optional HTML output path.	`None`

Returns:

Type	Description
`object`	A `folium.Map` object.

Raises:

Type	Description
`ImportError`	If `folium` or `h3` is not installed.
`KeyError`	If required columns are missing.
`ValueError`	If no valid H3 cell can be plotted.

`plot_gps_traces(dataset_or_legs, *, staypoints=None, user_ids=None, sample_n=None, random_state=42, color_by='mode_niv1', geometry_col='geometry', show_staypoints=True, use_antpath=True, tiles='cartodbpositron', zoom_start=12, map_center=None, save_path=None)` ¶

Plot GPS legs and optional staypoints on an interactive Folium map.

The function is designed for notebook checks. It keeps the API small: pass a MobilityDataset after transformation, or pass a leg GeoDataFrame directly.

Parameters:

Name	Type	Description	Default
`dataset_or_legs`	`MobilityDataset \| GeoDataFrame`	`MobilityDataset` or leg GeoDataFrame.	required
`staypoints`	`GeoDataFrame \| None`	Optional staypoint GeoDataFrame when passing legs directly.	`None`
`user_ids`	`Iterable \| None`	Optional users to display.	`None`
`sample_n`	`int \| None`	Optional number of legs to sample before plotting.	`None`
`random_state`	`int \| None`	Random seed used for leg sampling.	`42`
`color_by`	`str \| None`	Column used to color legs. Set to `None` for one color.	`'mode_niv1'`
`geometry_col`	`str`	Geometry column name.	`'geometry'`
`show_staypoints`	`bool`	Whether to draw staypoints as circles.	`True`
`use_antpath`	`bool`	Whether to animate legs with Folium `AntPath`.	`True`
`tiles`	`str`	Folium base map.	`'cartodbpositron'`
`zoom_start`	`int`	Initial zoom level.	`12`
`map_center`	`tuple[float, float] \| None`	Optional `(lat, lon)` center. If absent, the center is computed from plotted geometries.	`None`
`save_path`	`str \| Path \| None`	Optional HTML output path.	`None`

Returns:

Type	Description
`object`	A `folium.Map` object.

Raises:

Type	Description
`ImportError`	If `folium` is not installed.
`KeyError`	If the geometry column is missing.
`ValueError`	If no leg geometry is available for plotting.
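
When `map_center` is omitted, the center is computed from the plotted geometries. The exact method is not documented here, so the following is only one plausible sketch: it takes the midpoint of the bounding box of all leg coordinates. The helper name `estimate_map_center` and the plain `(lon, lat)` coordinate lists are assumptions; the real function receives geometry objects.

```python
def estimate_map_center(legs):
    """Return a (lat, lon) center from legs given as lists of (lon, lat) pairs.

    Assumption: the center is the midpoint of the overall bounding box.
    """
    lons = [lon for leg in legs for lon, _ in leg]
    lats = [lat for leg in legs for _, lat in leg]
    if not lons:
        # Mirrors the documented ValueError when nothing can be plotted.
        raise ValueError("No leg geometry is available for plotting.")
    return ((min(lats) + max(lats)) / 2, (min(lons) + max(lons)) / 2)


legs = [
    [(6.0, 46.0), (6.2, 46.1)],
    [(6.1, 46.3), (6.4, 46.2)],
]
center = estimate_map_center(legs)  # (lat, lon), the order Folium expects
```

Note the coordinate order: geometries usually store `(lon, lat)`, while `map_center` is documented as `(lat, lon)`.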

Indicator visualization¶

`xyt_gps.viz_indicators` ¶

Notebook-friendly indicator charts.

`plot_indicator_bars(indicators, *, table='population', metrics=None, group_col='mode', facet_col='Phase', title='Indicateurs de mobilité', max_bars=None, group_order=None, sort_bars_by_value=False, value_format='{:.1f}', bar_color='#6b4c9a', include_all_modes=True, all_modes_label='Tous modes', all_modes_color=ALL_MODES_COLOR, show_demand_profile=True, demand_profile_max_modes=6, metadata=None, show_identity_card=True, save_path=None)` ¶

Render mobility indicators as simple notebook bar charts.

The default input is an `IndicatorResult` returned by `compute_mobility_indicators()`. The function reads its `population` table and plots the main per-day metrics by mode. It can also receive a DataFrame directly, for example `indicators.person_phase`.

Parameters:

Name	Type	Description	Default
`indicators`	`IndicatorResult \| DataFrame`	`IndicatorResult` or an indicator DataFrame.	required
`table`	`str`	Indicator table to use when `indicators` is an `IndicatorResult`: `person_day`, `person_phase` or `population`.	`'population'`
`metrics`	`Iterable[str] \| None`	Numeric columns to plot. When omitted, common indicator columns are inferred.	`None`
`group_col`	`str`	Categorical column used for bars, usually `mode`.	`'mode'`
`facet_col`	`str \| None`	Optional column used to create one panel per period/phase.	`'Phase'`
`title`	`str`	Displayed title.	`'Indicateurs de mobilité'`
`max_bars`	`int \| None`	Optional maximum number of bars per panel.	`None`
`group_order`	`Iterable[str] \| None`	Optional explicit order for the categorical bars. When omitted, common mobility modes use a stable default order so that phases remain visually comparable.	`None`
`sort_bars_by_value`	`bool`	When true, sort bars by decreasing value inside each panel. This reproduces the older compact ranking behavior, but can make modes change position between phases.	`False`
`value_format`	`str`	Python format string used for numeric labels.	`'{:.1f}'`
`bar_color`	`str`	CSS color for bars.	`'#6b4c9a'`
`include_all_modes`	`bool`	Whether to add a total row across all displayed modes in each phase/panel.	`True`
`all_modes_label`	`str`	Label used for the total row.	`'Tous modes'`
`all_modes_color`	`str`	CSS color used for the total row.	`ALL_MODES_COLOR`
`show_demand_profile`	`bool`	Whether to display 5-minute daily demand curves when `IndicatorResult.metadata` contains them.	`True`
`demand_profile_max_modes`	`int`	Maximum number of demand curves per phase, including the all-modes curve.	`6`
`metadata`	`Mapping[str, object] \| None`	Optional metadata displayed in the identity card. Values override `IndicatorResult.metadata` when the input is an `IndicatorResult`.	`None`
`show_identity_card`	`bool`	Whether to display calculation metadata before the bars.	`True`
`save_path`	`str \| Path \| None`	Optional HTML file path.	`None`

Returns:

Type	Description
`object`	An `IPython.display.HTML` object when IPython is available, otherwise the raw HTML string.

Raises:

Type	Description
`KeyError`	If required columns are missing.
`ValueError`	If no metric can be plotted.
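
The effect of `value_format`, `include_all_modes` and `sort_bars_by_value` on one panel can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the helper `build_bars` is hypothetical, and the total is computed as a plain sum, which only makes sense for additive metrics such as a distance per day.

```python
def build_bars(values_by_mode, *, value_format="{:.1f}",
               include_all_modes=True, all_modes_label="Tous modes",
               sort_bars_by_value=False):
    """Return (label, formatted value) pairs for one panel."""
    bars = list(values_by_mode.items())
    if sort_bars_by_value:
        # Ranking by value: positions may differ from one phase to another.
        bars.sort(key=lambda item: item[1], reverse=True)
    if include_all_modes:
        # Assumption: the total row is the sum over displayed modes.
        bars.append((all_modes_label, sum(values_by_mode.values())))
    return [(label, value_format.format(value)) for label, value in bars]


km_per_day = {"Marche": 1.3, "Voiture": 12.46, "Train": 4.0}
stable = build_bars(km_per_day)
ranked = build_bars(km_per_day, sort_bars_by_value=True)
```

With the default settings the modes keep their input order, which is why the default `sort_bars_by_value=False` keeps phases visually comparable.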

Participation visualization¶

`xyt_gps.viz_participation` ¶

Notebook-friendly participation heatmaps.

`plot_participation_heatmap(participation_grid, *, user_col='user_id', week_col='protocol_week_number', score_col='active_days_count', phase_col='Phase', title='Participation hebdomadaire', max_score=7, max_users=None, cell_size=13, cell_gap=3, show_phase_separators=True, phase_separator_color='#e83b46', phase_separator_width=3, save_path=None)` ¶

Render a GitHub-style participation heatmap in a notebook.

The input is the long table produced by `build_weekly_participation_grid`. Rows are users, columns are protocol weeks, and cell color intensity is based on `score_col`, usually the number of active tracked days from 0 to 7.

Parameters:

Name	Type	Description	Default
`participation_grid`	`DataFrame`	Weekly participation table.	required
`user_col`	`str`	User identifier column.	`'user_id'`
`week_col`	`str`	Week number column.	`'protocol_week_number'`
`score_col`	`str`	Participation score column.	`'active_days_count'`
`phase_col`	`str`	Optional phase/period column used in cell tooltips.	`'Phase'`
`title`	`str`	Displayed title.	`'Participation hebdomadaire'`
`max_score`	`int`	Maximum score used for the color scale.	`7`
`max_users`	`int \| None`	Optional maximum number of users displayed.	`None`
`cell_size`	`int`	Square size in pixels.	`13`
`cell_gap`	`int`	Gap between squares in pixels.	`3`
`show_phase_separators`	`bool`	Whether to add a red separator when the phase changes between two consecutive weeks.	`True`
`phase_separator_color`	`str`	CSS color used for phase separators.	`'#e83b46'`
`phase_separator_width`	`int`	Separator column width in pixels.	`3`
`save_path`	`str \| Path \| None`	Optional HTML file path.	`None`

Returns:

Type	Description
`object`	An `IPython.display.HTML` object when IPython is available, otherwise the raw HTML string.

Raises:

Type	Description
`KeyError`	If required columns are missing.
`ValueError`	If the participation grid is empty.
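
Two ideas from the parameters above can be made concrete with a small sketch: scaling a score against `max_score` to obtain a color intensity, and finding the weeks where a phase separator would be drawn. Both helpers and the phase names are hypothetical; the actual color scale used by the function is not described here.

```python
def cell_intensity(score, max_score=7):
    """Scale a participation score to [0, 1], clamping out-of-range values."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return min(max(score, 0), max_score) / max_score


def phase_separator_weeks(week_phases):
    """Return the weeks whose phase differs from the previous week's phase.

    `week_phases` maps a protocol week number to a phase name.
    """
    weeks = sorted(week_phases)
    return [
        current
        for previous, current in zip(weeks, weeks[1:])
        if week_phases[previous] != week_phases[current]
    ]


# Hypothetical phases for four consecutive protocol weeks.
week_phases = {1: "Baseline", 2: "Baseline", 3: "Test", 4: "Test"}
separators = phase_separator_weeks(week_phases)
full_week = cell_intensity(7)
empty_week = cell_intensity(0)
```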

Mappings¶

`xyt_gps.mappings` ¶

Mode and purpose mapping configuration.

`MobilityMappings` `dataclass` ¶

Project-level mode and purpose mappings used by transformations.

`mode_purpose_mapping(**kwargs)` ¶

Build mode and purpose mappings for a project.

This is a readable factory around `MobilityMappings`. Pass the same keyword arguments as the dataclass when a project needs custom groupings.

Example:

`mode_purpose_mapping(storyline_mode_niv1={"Marche": ("Mode::Walk",)})`

`map_value(value, mapping, *, default='Autres')` ¶

Map one provider value to a project category.

`map_sequence(value, mapping, *, default='Autres')` ¶

Map a provider sequence such as `Mode::Walk + Mode::Bus`, category by category.
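
The behavior described for `map_value` and `map_sequence` can be approximated with the sketch below. These are hypothetical re-implementations for illustration only (hence the `_sketch` suffix), not the package's code. They assume the mapping shape shown in the example above (category to a tuple of provider values), a `+` separator between sequence elements, and an output joined with ` + `. The `Transports publics` category is invented for the example.

```python
def map_value_sketch(value, mapping, *, default="Autres"):
    """Return the project category whose provider values contain `value`."""
    for category, provider_values in mapping.items():
        if value in provider_values:
            return category
    return default  # unknown provider values fall back to the default


def map_sequence_sketch(value, mapping, *, default="Autres"):
    """Map each element of a '+'-separated provider sequence."""
    parts = [part.strip() for part in value.split("+")]
    mapped = [map_value_sketch(part, mapping, default=default) for part in parts]
    return " + ".join(mapped)


mapping = {
    "Marche": ("Mode::Walk",),
    "Transports publics": ("Mode::Bus",),  # hypothetical category
}
single = map_value_sketch("Mode::Walk", mapping)
unknown = map_value_sketch("Mode::Boat", mapping)
sequence = map_sequence_sketch("Mode::Walk + Mode::Bus", mapping)
```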
