Complete API reference
This page lists the modules documented through docstrings. To learn the package or choose the main functions, start with the recommended API page (API recommandée).
The finer-grained functions are mainly useful in production notebooks, for table-by-table checks and for debugging.
Recommended pipeline
xyt_gps.pipeline
Recommended end-to-end entry point for the mobility pipeline.
MobilityPipelineResult (dataclass)
Named result returned by run_mobility_pipeline.
Attributes:
| Name | Type | Description |
|---|---|---|
| raw | RawGpsData | Raw GPS tables after loading or validation. |
| dataset | MobilityDataset | Structured mobility tables. |
| indicators | IndicatorResult \| None | Optional mobility indicators. |
__iter__()
Allow raw, dataset, indicators = run_mobility_pipeline(...).
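The unpacking pattern can be illustrated with a minimal sketch of a dataclass whose `__iter__` yields its fields in declaration order. `PipelineResultSketch` is a hypothetical stand-in with placeholder values, not the package's class.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional


@dataclass
class PipelineResultSketch:
    """Hypothetical stand-in for a named pipeline result."""

    raw: Any
    dataset: Any
    indicators: Optional[Any] = None

    def __iter__(self) -> Iterator[Any]:
        # Yield fields in declaration order so the object unpacks like a tuple.
        yield self.raw
        yield self.dataset
        yield self.indicators


result = PipelineResultSketch(raw="raw tables", dataset="mobility tables")
raw, dataset, indicators = result  # tuple-style unpacking
```

Named access (`result.dataset`) and positional unpacking then work on the same object.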
run_mobility_pipeline(config, *, raw=None, sample=None, sociodemo=None, weights=None, weight_col='weight', default_weight=1.0, validate=True, must_exist=True, resample_missing_days=False, clean_leg_geometries=True, add_length_outlier_flags=True, add_signal_quality_flags=True, compute_indicators=True, mode_col='mode_niv1', trips_mode_col=None, distance_col=None, include_zero_days=True, include_excursions=True, include_airplane=False, use_weights=True, default_phase_name='All')
Run the recommended single-source GPS-to-indicators workflow.
This is the simplest entry point for analysts discovering the package:
load raw GPS tables, prepare mobility tables and optionally compute the
generic indicators. Pass raw when tables have already been loaded in a
notebook; otherwise the function calls load_gps_export(config).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | ProjectConfig | Project configuration. | required |
| raw | RawGpsData \| None | Optional preloaded raw GPS tables. When omitted, files are loaded from | None |
| sample | RawSampleConfig \| None | Optional sampling strategy used only when | None |
| sociodemo | DataFrame \| None | Optional user-level sociodemographic table. | None |
| weights | DataFrame \| None | Optional user-level weighting table. | None |
| weight_col | str | Name of the user weight column. | 'weight' |
| default_weight | float | Default user weight when no weight is available. | 1.0 |
| validate | bool | Whether to validate raw tables during loading. | True |
| must_exist | bool | Whether expected files must exist when loading from disk. | True |
| resample_missing_days | bool | Add explicit | False |
| clean_leg_geometries | bool | Normalize continuous leg geometries. | True |
| add_length_outlier_flags | bool | Add per-mode length outlier flags. | True |
| add_signal_quality_flags | bool | Compute GPS signal-loss quality columns. | True |
| compute_indicators | bool | Whether to compute | True |
| mode_col | str | Leg mode column used for indicators. | 'mode_niv1' |
| trips_mode_col | str \| None | Optional trip mode column override. | None |
| distance_col | str \| None | Optional leg distance column override. | None |
| include_zero_days | bool | Include tracked days without movement in daily means. | True |
| include_excursions | bool | Include legs/trips flagged as excursions. | True |
| include_airplane | bool | Include airplane legs/trips in indicators. Defaults to false because airplane rows can dominate distances and CO2. | False |
| use_weights | bool | Use user weights in population indicators. | True |
| default_phase_name | str | Period label when no phase split exists. | 'All' |
Returns:
| Type | Description |
|---|---|
| MobilityPipelineResult | optional indicators. |
Configuration
xyt_gps.config
Configuration objects for project-specific GPS transformations.
Phase (dataclass)
Named experimental phase used to tag and filter mobility records.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Stable phase name used in output columns and reports. | required |
| start | str \| Timestamp | Inclusive phase start date. | required |
| end | str \| Timestamp | Inclusive phase end date. | required |
TimeSlice (dataclass)
Named daily time interval used for reusable temporal aggregation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Stable output label, for example | required |
| start | str | Inclusive start time formatted as | required |
| end | str | Exclusive end time formatted as | required |
TrackingThresholds (dataclass)
Minimum tracking duration expected for analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| min_days_by_phase | Mapping[str, int] | Minimum number of tracked days required by phase name. Missing phases default to one day in the filtering helpers. | dict() |
| min_total_tracked_days | int | Minimum active tracking days across the full observation period. | 7 |
| round_to_full_weeks | bool | If true, phase durations are rounded down to complete weeks after applying the minimum-day threshold. | True |
SignalLossThreshold (dataclass)
Mode-specific signal-loss thresholds.
max_gap_m is the absolute largest distance allowed between two
consecutive points of a leg geometry, in meters. max_relative_gap
is the same gap divided by total leg length.
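As a rough illustration of the two quantities, the sketch below computes the largest gap between consecutive points and its share of the total leg length. It assumes points are already in a metric projection (coordinates in meters); `gap_metrics` is a hypothetical helper, and how the package combines both thresholds into a flag is not shown here.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # projected coordinates in meters (assumption)


def gap_metrics(points: Sequence[Point]) -> Tuple[float, float]:
    """Return (largest gap in meters, largest gap / total leg length)."""
    gaps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    total = sum(gaps)
    if total == 0:
        # Empty, single-point or zero-length geometry: no measurable gap.
        return 0.0, 0.0
    largest = max(gaps)
    return largest, largest / total


leg = [(0.0, 0.0), (100.0, 0.0), (100.0, 300.0), (200.0, 300.0)]
max_gap, relative_gap = gap_metrics(leg)
```

Here the 300 m jump accounts for 60% of a 500 m leg, which is the kind of case a relative threshold is meant to catch.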
SpatialQualityThresholds (dataclass)
Spatial quality thresholds that may vary by project.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| max_consecutive_point_distance_m | float \| None | Reserved threshold for future point-jump filters. | None |
| max_relative_signal_loss | float \| None | Reserved global threshold for future simple signal-loss filters. | None |
| outlier_quantiles_by_mode | tuple[float, ...] | Quantiles used to flag unusually long legs within each mode. | (0.98, 0.99) |
| signal_loss_thresholds_by_level | Mapping[int, Mapping[str, SignalLossThreshold]] | Mode-specific thresholds used to create | default_signal_loss_thresholds() |
| bad_signal_user_quantile | float | Quantile used to identify users with very high average signal loss. | 0.995 |
| signal_loss_mode_column | str | Column used to match mode-specific signal thresholds. | 'mode' |
MatchingThresholds (dataclass)
Temporal matching and future map-matching parameters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| leg_trip_journey_tolerance | str | Maximum temporal tolerance used when matching legs, trips and journeys. | '5s' |
| osrm_max_points_per_chunk | int | Reserved chunk size for future OSRM requests. | 99 |
| google_directions_fallback | bool | Reserved flag for a future Google Directions fallback. | False |
ProjectConfig (dataclass)
Project parameters that should remain explicit in the workflow.
ProjectConfig centralizes import paths, project names, periods,
coordinate systems, phases, thresholds and mappings. The goal is to avoid
hiding methodological choices inside large transformation functions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| experiment_name | str \| None | Optional analytical experiment name, for example | None |
| motiontag_project_name | str \| None | Optional provider project name used in export file names. Required only when loading files by inferred provider names. | None |
| period | str \| None | Optional export period string used in provider file names. Required only when loading files by inferred provider names. | None |
| raw_data_dir | str \| Path | Directory containing raw GPS CSV exports. | '.' |
| export_dir | str \| Path \| None | Optional output directory. | None |
| target_crs | str | CRS attached to parsed GPS geometries. | 'EPSG:4326' |
| operations_crs | str | Metric CRS used for distance calculations. | 'EPSG:2056' |
| csv_sep | str | CSV separator used by GPS exports. | ';' |
| timezone | str | Local timezone label. | 'Europe/Zurich' |
| phases | tuple[Phase, ...] | Optional analytical phases used for phase assignment and tracking filters. Leave empty for analyses without phase split. | () |
| tracking_thresholds | TrackingThresholds | User-level tracking-quality thresholds. | TrackingThresholds() |
| spatial_quality_thresholds | SpatialQualityThresholds | Leg and user GPS-quality thresholds. | SpatialQualityThresholds() |
| matching_thresholds | MatchingThresholds | Temporal matching thresholds. | MatchingThresholds() |
| mappings | MobilityMappings | Mode and purpose mappings. | MobilityMappings() |
| time_slices | tuple[TimeSlice, ...] | Daily time slices used for reusable temporal aggregations. Values outside the configured intervals are labelled | default_time_slices() |
| reference_year | int \| None | Optional year used to derive age from sociodemographic data. | None |
| storyline_prefix_candidates | tuple[str, ...] | Accepted storyline file prefixes. | ('StorylineWithTripId', 'StorylineWithUserAnnotations') |
default_time_slices()
Return default mobility-dashboard time slices.
HC is intentionally not defined as an interval: it is the fallback label
for observations outside the configured peak periods.
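The fallback rule can be sketched as follows: a time is labelled with the first slice whose interval contains it (start inclusive, end exclusive), and with `HC` otherwise. The slice names and hours below are hypothetical, not the package defaults, and `label_time` is an illustrative helper.

```python
from datetime import time
from typing import Sequence, Tuple

# (name, inclusive start, exclusive end); names and hours are hypothetical.
SLICES: Sequence[Tuple[str, time, time]] = (
    ("peak_am", time(7, 0), time(9, 0)),
    ("peak_pm", time(16, 0), time(19, 0)),
)


def label_time(value: time, slices=SLICES, fallback: str = "HC") -> str:
    """Return the first slice containing value, else the fallback label."""
    for name, start, end in slices:
        if start <= value < end:  # start inclusive, end exclusive
            return name
    return fallback
```

With these intervals, 07:00 falls in the morning slice while 09:00 already falls back to `HC`, because the end bound is exclusive.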
default_signal_loss_thresholds()
Return the default signal-loss thresholds.
These values reproduce the current spatial-quality convention. Some absolute thresholds are intentionally very low in that notebook, making the relative signal-loss threshold the effective criterion. They should remain configurable by project.
Import
xyt_gps.io
Input helpers for structured GPS exports and project side tables.
GpsExportPaths (dataclass)
Resolved file paths for one structured GPS export period.
Attributes:
| Name | Type | Description |
|---|---|---|
| storyline | Path | Path to the storyline CSV. |
| trips | Path | Path to the trips CSV. |
| journeys | Path | Path to the journeys CSV. |
| user_statistics | Path \| None | Optional path to the user statistics CSV. |
RawSampleConfig (dataclass)
Sampling options for loading large raw GPS exports.
Use RawSampleConfig.by_users(n) when checking a large export: it keeps
all rows for a small number of selected users and preserves the relation
between storyline, trips, journeys and user statistics. Use
RawSampleConfig.random_rows(n) only for quick schema inspection.
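Why user-based sampling preserves the links between tables can be seen in a small standard-library sketch: the same set of users is kept in every table, so no trip loses its storyline rows. `sample_by_users` and the row layout are hypothetical and only illustrate the idea, not the package's chunked CSV implementation.

```python
import random
from typing import Dict, List

Row = Dict[str, object]


def sample_by_users(tables: Dict[str, List[Row]], n: int, seed: int = 0,
                    user_col: str = "user_id") -> Dict[str, List[Row]]:
    """Keep every row of n randomly chosen users, in every table."""
    users = sorted({row[user_col] for rows in tables.values() for row in rows})
    keep = set(random.Random(seed).sample(users, min(n, len(users))))
    return {name: [r for r in rows if r[user_col] in keep]
            for name, rows in tables.items()}


tables = {
    "storyline": [{"user_id": u, "leg": i} for u in "abc" for i in range(3)],
    "trips": [{"user_id": u, "trip": 1} for u in "abc"],
}
sampled = sample_by_users(tables, n=2)
```

Sampling random rows instead would keep storyline rows whose trips were dropped, which is why it is only suitable for schema inspection.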
Attributes:
| Name | Type | Description |
|---|---|---|
| mode | str | Sampling strategy, either |
| n | int | Number of users or rows to keep. |
| random_state | int \| None | Optional random seed. |
| chunksize | int | CSV chunk size for user-based sampling. |
| user_id_column | str | User identifier column in storyline, trips and journeys. |
| user_statistics_id_column | str | User identifier column in user statistics. |
| storyline_type_column | str | Column used to identify track rows. |
| track_type_value | str | Value used for track rows in storyline. |
infer_gps_export_paths(config, *, must_exist=True)
Infer expected raw file paths from project parameters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | ProjectConfig | Project configuration containing | required |
| must_exist | bool | When true, raise an error if any expected file is absent. | True |
Returns:
| Type | Description |
|---|---|
| GpsExportPaths | A |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If |
source_id_for_config(config)
Build a stable source id for multi-project imports.
The source id is used to namespace identifiers when several projects or periods are loaded in one pass.
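The purpose of namespacing can be sketched with plain dictionaries: two sources that reuse the same raw user id stay distinct once ids are prefixed. The `::` separator, the column names and the `namespace_ids` helper are assumptions made for this illustration; the package's actual id format is not documented here.

```python
from typing import Dict, List, Sequence


def namespace_ids(rows: List[Dict[str, str]], source_id: str,
                  id_cols: Sequence[str] = ("user_id", "trip_id"),
                  sep: str = "::") -> List[Dict[str, str]]:
    """Return copies of rows with a source_id column and prefixed ids."""
    out = []
    for row in rows:
        new = dict(row, source_id=source_id)
        for col in id_cols:
            if col in new:
                new[col] = f"{source_id}{sep}{new[col]}"
        out.append(new)
    return out


source_a = namespace_ids([{"user_id": "1", "trip_id": "10"}], "projA_2023")
source_b = namespace_ids([{"user_id": "1", "trip_id": "10"}], "projB_2024")
combined = source_a + source_b
```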
sample_raw_gps_data(raw, sample)
Apply a raw-data sample before validation or transformation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| raw | RawGpsData | Raw GPS tables already loaded in memory. | required |
| sample | RawSampleConfig \| None | Sampling configuration. If | required |
Returns:
| Type | Description |
|---|---|
| RawGpsData | A new |
Raises:
| Type | Description |
|---|---|
| KeyError | If user-based sampling is requested and the user id column is missing. |
| ValueError | If the sampling mode is unsupported. |
load_gps_export(config, *, sample=None, validate=True, must_exist=True)
Load raw GPS CSV exports without transforming their schema.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | ProjectConfig | Project configuration used to infer file paths and CSV separator. | required |
| sample | RawSampleConfig \| None | Optional sampling strategy. | None |
| validate | bool | When true, attach schema-validation reports to the returned object. | True |
| must_exist | bool | When true, fail if expected CSV files are missing. | True |
Returns:
| Type | Description |
|---|---|
| RawGpsData | Raw storyline, trips, journeys and optional user-statistics tables. |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If required files are missing. |
| KeyError | If user-based sampling references a missing user column. |
load_gps_source(config, *, source_id=None, sample=None, validate=True, must_exist=True, namespace_ids=True)
Load one GPS source and add source metadata columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | ProjectConfig | Project configuration for one project or period. | required |
| source_id | str \| None | Optional source identifier. If absent, it is built from the configuration. | None |
| sample | RawSampleConfig \| None | Optional raw sampling strategy. | None |
| validate | bool | Whether to attach validation reports. | True |
| must_exist | bool | Whether expected files must exist. | True |
| namespace_ids | bool | When true, prefix ids with | True |
Returns:
| Type | Description |
|---|---|
| RawGpsData | Raw GPS tables with |
| RawGpsData | optional |
concat_raw_gps_data(raws, *, validate=True)
Concatenate several already-loaded raw GPS datasets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| raws | Iterable[RawGpsData] | Raw datasets to concatenate. They should already contain source metadata if they come from different projects or periods. | required |
| validate | bool | Whether to validate the concatenated raw tables. | True |
Returns:
| Type | Description |
|---|---|
| RawGpsData | One |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
load_gps_sources(configs, *, sample=None, validate=True, must_exist=True, namespace_ids=True)
Load and concatenate raw GPS exports from several sources.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| configs | Iterable[ProjectConfig] | Project configurations, one per project or period. | required |
| sample | RawSampleConfig \| None | Optional sampling strategy applied to each source. | None |
| validate | bool | Whether to attach validation reports. | True |
| must_exist | bool | Whether expected files must exist. | True |
| namespace_ids | bool | Whether to prefix identifiers by source id. | True |
Returns:
| Type | Description |
|---|---|
| RawGpsData | A concatenated |
load_sociodemo(path, *, user_id_column='Id')
Load a sociodemographic side table and standardize the user id column.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str \| Path | CSV or Excel file path. | required |
| user_id_column | str | Column to rename to | 'Id' |
Returns:
| Type | Description |
|---|---|
| DataFrame | A pandas table that can be passed to |
Test dataset
xyt_gps.sample_data
Demo sample-data helpers.
The default sample is a personal GPS storyline explicitly authorized for this package demo. It is loaded only when requested.
find_sample_gps_path(start=None)
Find the authorized demo pickle from the current repo layout.
load_sample_gps(path=None, *, user_id='sample_user', max_rows=None, validate=True)
Load the authorized personal demo storyline as a structured GPS raw dataset.
The source pickle contains a storyline table only. The function pseudonymizes
user_id by default and derives minimal Trips, Journeys and
UserStatistics tables so that tutorial transformations can run without a
full raw GPS export.
Synthetic data
xyt_gps.synthetic
Synthetic GPS testset generation from the authorized sample.
The generator is intentionally explicit and parameter-driven. With one authorized source user, the package cannot learn a defensible population model; it can, however, bootstrap realistic structured GPS days and inject controlled tracking anomalies for testing the transformation pipeline.
SyntheticExperiment (dataclass)
Experiment window used by the synthetic Declic generator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| experiment_name | str | Stable experiment identifier added to all generated tables, for example | required |
| phases | tuple[Phase, ...] | Optional experiment phases. Empty phases are allowed for a generic GPS testset without analytical phase split. | () |
| motiontag_project_name | str \| None | Optional provider project name. Defaults to | None |
SyntheticAnomalyRates (dataclass)
Controlled anomaly rates injected into synthetic raw tables.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| missing_geometry_rate | float | Share of storyline rows with a missing geometry. Keep this below the raw schema tolerance when the generated dataset must pass | 0.002 |
| unconfirmed_rate | float | Share of rows where confirmation timestamps are cleared. | 0.2 |
| mode_mismatch_rate | float | Share of track rows where | 0.06 |
| extreme_length_rate | float | Share of track rows with inflated length values. | 0.01 |
SyntheticGpsDataset (dataclass)
Generated structured GPS testset and companion construction tables.
Attributes:
| Name | Type | Description |
|---|---|---|
| raw | RawGpsData | Raw GPS-like tables ready for validation or transformation. |
| user_presence | DataFrame | User-level construction table with experiment and phase windows. |
| generation_manifest | DataFrame | Summary of generated rows, users and active days. |
tables()
Return all generated tables keyed by export name.
default_declic_synthetic_experiments()
Return the five Declic experiment windows used for synthetic tests.
generate_synthetic_declic_gps(*, sample_path=None, experiments=None, users_per_experiment=50, random_state=42, anomaly_rates=None, validate=True)
Generate a Declic-like synthetic GPS dataset from the sample.
The method uses bootstrap resampling of observed sample days, shifts them to the requested experiment phases, perturbs geometries and injects controlled anomalies. It is suited for tests, tutorials and pipeline validation. It is not a trained behavioral model.
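The bootstrap-and-inject idea can be sketched in a few lines: days are drawn with replacement from the observed sample, and an anomaly is applied to each draw with a fixed probability. This is a simplified illustration, not the package's generator; `bootstrap_days` and the day records are hypothetical, and the date shifting and geometry perturbation steps are left out.

```python
import random
from typing import Dict, List


def bootstrap_days(sample_days: List[Dict], n_days: int,
                   missing_geometry_rate: float, seed: int = 42) -> List[Dict]:
    """Draw n_days with replacement and blank some geometries at random."""
    rng = random.Random(seed)  # seeded for reproducible output
    generated = []
    for _ in range(n_days):
        day = dict(rng.choice(sample_days))  # copy, so the sample is untouched
        if rng.random() < missing_geometry_rate:
            day["geometry"] = None  # controlled anomaly
        generated.append(day)
    return generated


sample_days = [
    {"day": "mon", "geometry": "LINESTRING A"},
    {"day": "tue", "geometry": "LINESTRING B"},
]
synthetic = bootstrap_days(sample_days, n_days=10, missing_geometry_rate=0.2)
```

Because every generated day is a copy of an observed one, the output looks realistic without implying any learned behavioral model.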
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sample_path | str \| Path \| None | Optional path to the authorized sample pickle. If omitted, the package searches the repository layout. | None |
| experiments | Iterable[SyntheticExperiment] \| None | Experiment definitions. Defaults to prefiguration, waves 1-3 and ZIPLO. | None |
| users_per_experiment | int | Number of synthetic users generated for each experiment. | 50 |
| random_state | int \| None | Seed used for reproducible generation. | 42 |
| anomaly_rates | SyntheticAnomalyRates \| None | Rates for missing geometry, unconfirmed rows, mode mismatch and extreme lengths. | None |
| validate | bool | If true, attach raw schema validation reports to the output. | True |
Returns:
| Type | Description |
|---|---|
| SyntheticGpsDataset | A |
write_synthetic_gps_dataset(dataset, output_dir, *, formats=('parquet',), overwrite=True)
Write a generated synthetic GPS dataset as landing-ready files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | SyntheticGpsDataset | Synthetic dataset produced by | required |
| output_dir | str \| Path | Destination directory. | required |
| formats | Iterable[str] | Formats for large event tables. Supported values are | ('parquet',) |
| overwrite | bool | If false, fail when a target file already exists. | True |
Returns:
| Type | Description |
|---|---|
| DataFrame | A manifest with one row per written file. |
Schemas
xyt_gps.schema
Schema validation for raw GPS mobility exports.
SchemaSpec (dataclass)
Column contract for one raw GPS input table.
expected_gps_schema()
Return the expected GPS import schema as an inspectable table.
The current schema describes the structured GPS multi-table contract used by the package: storyline, trips, journeys and user statistics. It is intentionally generic at the public API level because the landing step may adapt source-specific files and column names before data loading.
check_raw_import_columns(storyline, user_statistics=None, *, trips=None, journeys=None, include_recommended=True, raise_on_error=False)
Check the raw column structure before building RawGpsData.
This is a lightweight, notebook-friendly check. It focuses on the column names expected at the raw import stage. It does not parse dates, geometries or modes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| storyline | DataFrame | Raw storyline table to check. | required |
| user_statistics | DataFrame \| None | Optional raw user statistics table to check. | None |
| trips | DataFrame \| None | Optional raw trips table to check. | None |
| journeys | DataFrame \| None | Optional raw journeys table to check. | None |
| include_recommended | bool | Include non-blocking recommended columns in the report. | True |
| raise_on_error | bool | Raise a | False |
Returns:
| Type | Description |
|---|---|
| DataFrame | A DataFrame with one row per expected column. Important columns are: |
validate_schema(df, spec, *, allow_extra_columns=True)
Validate column presence and basic null rates without changing data.
validate_gps_raw(raw)
Validate all raw GPS tables that are present.
Transformations
xyt_gps.transform
Orchestration from structured GPS exports to mobility tables.
prepare_mobility_dataset(raw, config, *, sociodemo=None, weights=None, weight_col='weight', default_weight=1.0, resample_missing_days=False, clean_leg_geometries=True, add_length_outlier_flags=True, add_signal_quality_flags=True, validation=None)
Run the transparent preparation workflow.
The function orchestrates validation, storyline parsing, trip and journey preparation, mappings, split into staypoints and legs, GPS quality flags, user tracking stats and relation tables. Each step is also exposed as a smaller public function so notebooks can inspect intermediate states.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| raw | RawGpsData | Raw GPS tables loaded with | required |
| config | ProjectConfig | Project configuration controlling mappings, phases, thresholds and CRS. | required |
| sociodemo | DataFrame \| None | Optional user-level side table with a | None |
| weights | DataFrame \| None | Optional user-level weighting table with | None |
| weight_col | str | Name of the weight column to merge/create. | 'weight' |
| default_weight | float | Weight used when no weighting table is provided, or when a user has no matching weight. | 1.0 |
| resample_missing_days | bool | When true, insert transparent | False |
| clean_leg_geometries | bool | When true, convert continuous | True |
| add_length_outlier_flags | bool | When true, add per-mode length quantile flags to legs. | True |
| add_signal_quality_flags | bool | When true, compute GPS signal-loss metrics on legs and user-level signal-quality flags. | True |
| validation | dict[str, SchemaValidationResult] \| None | Optional validation reports. If absent, reports attached to | None |
Returns:
| Type | Description |
|---|---|
| MobilityDataset | A |
| MobilityDataset | trips, journeys, user stats and mapping tables. |
Raises:
| Type | Description |
|---|---|
| ValueError | If raw schema validation contains blocking errors. |
concat_mobility_datasets(datasets)
Concatenate transformed mobility datasets from several sources.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| datasets | Iterable[MobilityDataset] | Already transformed datasets, usually one per project or period. | required |
Returns:
| Type | Description |
|---|---|
| MobilityDataset | One |
| MobilityDataset | validation reports. |
Raises:
| Type | Description |
|---|---|
| ValueError | If no dataset is provided. |
prepare_mobility_datasets(configs, *, sociodemo_by_source=None, weights_by_source=None, weight_col='weight', default_weight=1.0, sample=None, resample_missing_days=False, clean_leg_geometries=True, add_length_outlier_flags=True, add_signal_quality_flags=True, validate=True, must_exist=True, namespace_ids=True)
Load, transform and concatenate several GPS sources or periods.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| configs | Iterable[ProjectConfig] | Project configurations, one per project or period. | required |
| sociodemo_by_source | DataFrame \| Mapping[str, DataFrame] \| None | Optional sociodemographic table or mapping from source id/project name to a sociodemographic table. | None |
| weights_by_source | DataFrame \| Mapping[str, DataFrame] \| None | Optional weighting table or mapping from source id/project name to a user-level weighting table. | None |
| weight_col | str | Name of the weight column to merge/create. | 'weight' |
| default_weight | float | Weight used when no weighting table is provided, or when a user has no matching weight. | 1.0 |
| sample | RawSampleConfig \| None | Optional raw sampling strategy applied to each source. | None |
| resample_missing_days | bool | Whether to add transparent missing-day stays. | False |
| clean_leg_geometries | bool | Whether to normalize/drop problematic leg geometries before length and quality calculations. | True |
| add_length_outlier_flags | bool | Whether to add per-mode length outlier flags. | True |
| add_signal_quality_flags | bool | Whether to compute GPS signal-loss flags. | True |
| validate | bool | Whether to validate raw exports before transformation. | True |
| must_exist | bool | Whether expected raw CSV files must exist. | True |
| namespace_ids | bool | Whether to prefix ids by source id before concatenation. | True |
Returns:
| Type | Description |
|---|---|
| MobilityDataset | A concatenated |
Parsing
xyt_gps.parsing
Parsing helpers for structured GPS exports.
drop_nans_if_low_rate(df, column, *, threshold=0.01)
Drop nulls only when the null rate is explicitly below the threshold.
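The rule can be illustrated on plain row dictionaries: nulls are removed only when their share is strictly below the threshold, otherwise the rows are returned unchanged so the problem stays visible. `drop_missing_if_low_rate` is a hypothetical standard-library sketch, not the package's DataFrame implementation.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def drop_missing_if_low_rate(rows: List[Row], column: str,
                             threshold: float = 0.01) -> List[Row]:
    """Drop rows where column is None only if the null rate is below threshold."""
    if not rows:
        return []
    missing = sum(1 for r in rows if r.get(column) is None)
    if missing / len(rows) < threshold:
        return [r for r in rows if r.get(column) is not None]
    return list(rows)  # too many nulls: keep them for inspection


mostly_valid = [{"geometry": "g"}] * 199 + [{"geometry": None}]
often_missing = [{"geometry": "g"}] * 9 + [{"geometry": None}]
cleaned = drop_missing_if_low_rate(mostly_valid, "geometry")
untouched = drop_missing_if_low_rate(often_missing, "geometry")
```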
parse_ewkb(value)
Parse an EWKB hex string into a shapely geometry.
parse_date_columns(df, columns, *, utc=True)
Parse existing date columns and leave absent columns untouched.
assign_phase(value, phases, *, default='Other')
Assign a date or timestamp to the configured experimental phase.
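A minimal sketch of phase assignment, assuming phases are (name, start, end) triples with both bounds inclusive, as described for `Phase`. The phase names and dates are hypothetical, and returning the first matching phase when phases overlap is an assumption of this sketch.

```python
from datetime import date
from typing import Sequence, Tuple

PhaseSketch = Tuple[str, date, date]  # (name, inclusive start, inclusive end)


def assign_phase_sketch(value: date, phases: Sequence[PhaseSketch],
                        default: str = "Other") -> str:
    """Return the name of the first phase containing value, else default."""
    for name, start, end in phases:
        if start <= value <= end:  # both bounds inclusive
            return name
    return default


phases = (
    ("baseline", date(2024, 1, 1), date(2024, 1, 14)),
    ("intervention", date(2024, 1, 15), date(2024, 2, 11)),
)
```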
Table preparation
xyt_gps.prepare_tables
Prepare raw GPS tables before mobility object construction.
apply_storyline_mappings(storyline, mappings=None)
Add purpose and mode aggregation columns to storyline rows.
apply_trip_journey_mappings(trips, journeys, mappings=None)
Add purpose and mode aggregation columns to trips and journeys.
prepare_storyline(storyline, config, *, drop_nan_threshold=0.01)
Validate, parse geometry/dates, assign phases and map modes/purposes.
prepare_trips(trips, config)
Validate and parse raw trips.
prepare_journeys(journeys, config)
Validate and parse raw journeys.
Mobility tables
xyt_gps.mobility_tables
Small mobility-table transformations used by the preparation workflow.
split_storyline(storyline)
Split a parsed storyline into staypoints and legs.
add_user_id_day(legs)
Add the person-day id used by downstream indicators and notebooks.
add_length_quantile_flags(legs, *, group_col='mode', length_col='length', quantiles=(0.98, 0.99))
Flag unusually long legs within each mode group.
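The point of grouping by mode is that "long" is relative: a 10 km car leg is ordinary, a 10 km walk is not. The sketch below flags legs above a single within-group quantile using linear interpolation. It is an illustration with hypothetical helpers; the package takes several quantiles at once, and its interpolation method and output column names are not shown here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of an already sorted sequence."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def flag_long_legs(legs: List[Dict], q: float = 0.98,
                   group_col: str = "mode", length_col: str = "length") -> List[bool]:
    """Return one flag per leg: True when longer than its group's q-quantile."""
    by_group = defaultdict(list)
    for leg in legs:
        by_group[leg[group_col]].append(leg[length_col])
    cutoffs = {g: quantile(sorted(v), q) for g, v in by_group.items()}
    return [leg[length_col] > cutoffs[leg[group_col]] for leg in legs]


legs = [{"mode": "walk", "length": float(v)} for v in range(1, 101)]
legs += [{"mode": "car", "length": 10_000.0} for _ in range(5)]
flags = flag_long_legs(legs)
```

Only the two longest walks are flagged; the car legs are far longer in absolute terms but typical for their own mode.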
Relations between tables
xyt_gps.relations
Relation tables linking legs, staypoints, trips and journeys.
build_track_trip_journey_map(legs, trips, journeys, *, tolerance='5s')
Map legs to trips and journeys.
If legs already contain a valid trip_id, the function uses it directly.
Otherwise it matches each leg midpoint to trips from the same user, then
matches each trip midpoint to journeys from the same user. Temporal joins
are vectorized by user to avoid per-leg nested loops.
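The midpoint rule can be sketched with a plain loop for readability (the package vectorizes the join by user). The `finished_at` column name, the way the tolerance widens the trip window and the `match_leg_to_trip` helper are assumptions of this illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def match_leg_to_trip(leg: Dict, trips: List[Dict],
                      tolerance: timedelta = timedelta(seconds=5)) -> Optional[str]:
    """Return the id of the same-user trip whose window contains the leg midpoint."""
    midpoint = leg["started_at"] + (leg["finished_at"] - leg["started_at"]) / 2
    for trip in trips:
        if trip["user_id"] != leg["user_id"]:
            continue  # never match across users
        if trip["started_at"] - tolerance <= midpoint <= trip["finished_at"] + tolerance:
            return trip["trip_id"]
    return None


trips = [
    {"trip_id": "t1", "user_id": "u1",
     "started_at": datetime(2024, 1, 1, 8, 0), "finished_at": datetime(2024, 1, 1, 8, 30)},
    {"trip_id": "t2", "user_id": "u2",
     "started_at": datetime(2024, 1, 1, 8, 0), "finished_at": datetime(2024, 1, 1, 8, 30)},
]
leg = {"user_id": "u1",
       "started_at": datetime(2024, 1, 1, 8, 5), "finished_at": datetime(2024, 1, 1, 8, 15)}
matched = match_leg_to_trip(leg, trips)
```

Using the midpoint rather than the start or end makes the match robust to small boundary offsets between the leg and trip tables.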
build_legs_staypoints_map(legs, staypoints, *, tolerance='5s')
Map each staypoint to previous and next legs using near-identical timestamps.
add_journey_to_trips(trips, journeys, mapping)
Add journey_id and journey purpose to trips.
add_trip_destination_activity(trips, map_track_trip_journey, map_legs_staypoints)
Add leading activity id to trips using the last leg of each trip.
add_excursion_flags_to_trips_journeys(trips, journeys, legs, map_track_trip_journey, *, excursion_col='excursion')
Propagate leg-level excursion flags to trips and journeys.
User statistics
xyt_gps.user_stats
User-level statistics derived from prepared GPS tables.
add_excursion_stats_to_user_stats(user_stats, legs, *, excursion_col='excursion')
Merge user-level excursion counts from leg-level flags.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| user_stats | DataFrame | User-level table containing | required |
| legs | DataFrame | Leg table containing | required |
| excursion_col | str | Name of the leg-level excursion flag. | 'excursion' |
Returns:
| Type | Description |
|---|---|
| DataFrame | A copy of |
build_user_stats(storyline, config, *, user_statistics=None, sociodemo=None, weights=None, weight_col='weight', default_weight=1.0)
Build user-level tracking stats and merge optional side tables.
The output keeps the GPS database readable at user level. When the
storyline contains artificial Resampled_stay rows, the table reports both
the continuous record (active_days_count) and the observed tracking days
before resampling (observed_active_days_count). If no weights are
provided, weight_col is set to default_weight.
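The weight fallback can be sketched with dictionaries: every user gets a weight, taken from the weighting table when available and from the default otherwise. `attach_weights` is a hypothetical helper, not the package's merge logic.

```python
from typing import Dict, Iterable, Optional


def attach_weights(user_ids: Iterable[str],
                   weights: Optional[Dict[str, float]] = None,
                   default_weight: float = 1.0) -> Dict[str, float]:
    """Return one weight per user, falling back to default_weight."""
    weights = weights or {}
    return {uid: weights.get(uid, default_weight) for uid in user_ids}


unweighted = attach_weights(["a", "b"])
partially_weighted = attach_weights(["a", "b"], {"a": 2.5})
```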
Tracking quality
xyt_gps.quality
Quality public facade for the data preparation workflow.
build_daily_tracking_presence(storyline, config=None, *, user_id_column='user_id', date_column='started_at', exclude_resampled=True)
Build one observed tracking row per user and date.
This table is intentionally based on observed rows before temporal
resampling. Otherwise, artificial Resampled_stay rows would make missing
days look like tracked days in participation analyses.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| storyline | DataFrame | Parsed storyline table. | required |
| config | ProjectConfig \| None | Optional project configuration used to assign phases. | None |
| user_id_column | str | User identifier column. | 'user_id' |
| date_column | str | Datetime column used to define active tracking days. | 'started_at' |
| exclude_resampled | bool | When true, rows where | True |
Returns:
| Type | Description |
|---|---|
| DataFrame | A table with |
| DataFrame | config with phases is provided, |
Raises:
| Type | Description |
|---|---|
| KeyError | If the requested user or date column is missing. |
build_weekly_participation_grid(storyline, config, *, user_ids=None, user_id_column='user_id', date_column='started_at', exclude_resampled=True, default_period_name='All')
Count observed active tracking days by user and protocol week.
When phases are configured, weeks are relative to these phases rather than
ISO calendar weeks. Without configured phases, the grid uses one analytical
period named default_period_name and spans the observed tracking dates.
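The difference between protocol weeks and ISO calendar weeks is easiest to see on a phase that starts mid-week. The sketch below assumes weeks are numbered from 1 and counted in 7-day blocks from the phase start; both helpers are hypothetical illustrations, not the package's grid builder.

```python
from collections import Counter
from datetime import date
from typing import Iterable


def phase_week(day: date, phase_start: date) -> int:
    """1-based week index counted from the phase start, not the ISO calendar."""
    return (day - phase_start).days // 7 + 1


def active_days_per_week(active_days: Iterable[date], phase_start: date) -> Counter:
    """Count distinct active days in each protocol week."""
    return Counter(phase_week(d, phase_start) for d in set(active_days))


phase_start = date(2024, 1, 3)  # a Wednesday
observed = [date(2024, 1, 3), date(2024, 1, 4), date(2024, 1, 9),
            date(2024, 1, 10), date(2024, 1, 10)]
weekly = active_days_per_week(observed, phase_start)
```

Tuesday 9 January still belongs to protocol week 1 here, although the ISO calendar already places it in week 2.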
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| storyline | DataFrame | Parsed storyline table, preferably before resampling. | required |
| config | ProjectConfig | Project configuration containing phase dates. | required |
| user_ids | Iterable \| None | Optional full list of expected users. If omitted, users are taken from | None |
| user_id_column | str | User identifier column in | 'user_id' |
| date_column | str | Datetime column used to define active tracking days. | 'started_at' |
| exclude_resampled | bool | When true, artificial | True |
| default_period_name | str | Name used in the output when no phase is configured. | 'All' |
Returns:
| Type | Description |
|---|---|
| DataFrame | A complete user x phase-week table. |
Raises:
| Type | Description |
|---|---|
| ValueError | If no phase is configured and no tracking date is available to infer a default analysis period. |
calculate_user_tracking_stats(storyline)
¶
Calculate observed tracking windows and missing days per user.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| storyline | DataFrame | Parsed storyline table. It must contain | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | A user-level table with first and latest tracked dates, the number of active days, inactive days, maximum gap between tracked days and tracking completeness. |
Raises:
| Type | Description |
|---|---|
KeyError
|
If |
summarize_participation_grid(participation_grid, *, good_week_min_days=5)
¶
Summarize weekly participation coverage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| participation_grid | DataFrame | Table produced by | required |
| good_week_min_days | int | Minimum active days for a week to be considered good or complete. | 5 |
Returns:
| Type | Description |
|---|---|
| DataFrame | One summary row for all phases and one row per phase. |
build_tracking_gap_report(storyline, config=None, *, user_ids=None, user_id_column='user_id', date_column='started_at', exclude_resampled=True, default_period_name='All')
¶
Summarize observed, missing and consecutive tracking days.
This function packages the generic part of the historical tracking-quality notebook. It returns one row per user and analytical period, with active days, missing days, maximum gap and maximum consecutive observed days.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| storyline | DataFrame | Parsed storyline table. | required |
| config | ProjectConfig \| None | Optional project configuration. When phases are configured, the report is computed by phase; otherwise it uses the observed period named | None |
| user_ids | Iterable \| None | Optional full list of expected users. | None |
| user_id_column | str | User identifier column in | 'user_id' |
| date_column | str | Datetime column used to define active tracking days. | 'started_at' |
| exclude_resampled | bool | Whether to ignore artificial | True |
| default_period_name | str | Period name used when no phase is configured. | 'All' |
Returns:
| Type | Description |
|---|---|
| DataFrame | A user-period table with tracking coverage and gap metrics. |
build_tracking_quality_report(user_stats)
¶
Build a compact one-row report for user tracking quality.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| user_stats | DataFrame | User-level table with tracking-quality columns. | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | A one-row table with user counts, valid-user share and median tracking-duration indicators. The output is meant for quick notebook checks before applying filters. |
calculate_tracking_periods(user_stats, config, *, min_days_by_phase=None)
¶
Calculate effective tracked days per configured phase.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| user_stats | DataFrame | User-level tracking table, usually produced by | required |
| config | ProjectConfig | Project configuration containing | required |
| min_days_by_phase | Mapping[str, int] \| None | Optional override for minimum tracked days by phase name. | None |
Returns:
| Type | Description |
|---|---|
DataFrame
|
A copy of |
DataFrame
|
|
DataFrame
|
preserved. |
flag_tracking_quality(user_stats, config)
¶
Add transparent tracking-quality flags to a user table.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| user_stats | DataFrame | User-level table with at least | required |
| config | ProjectConfig | Project configuration containing total and phase-level tracking thresholds. | required |
Returns:
| Type | Description |
|---|---|
DataFrame
|
A copy of |
DataFrame
|
|
DataFrame
|
and the categorical reason |
Raises:
| Type | Description |
|---|---|
KeyError
|
If |
summarize_phase_tracking(user_stats, config, *, day_thresholds=(7, 14, 21))
¶
Summarize user tracking coverage by configured phase.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| user_stats | DataFrame | User-level table containing | required |
| config | ProjectConfig | Project configuration containing phase dates. | required |
| day_thresholds | Iterable[int] | Day-count thresholds to report, for example 7, 14 and 21 days. | (7, 14, 21) |
Returns:
| Type | Description |
|---|---|
| DataFrame | One row per phase with user counts above each threshold and mean coverage over the theoretical phase duration. |
build_mode_detection_precision(storyline, *, mode_col='mode', detected_mode_col='detected_mode', type_col='type', confirmed_at_col='confirmed_at', confirmed_only=True, group_cols=('mode', 'mode_niv1', 'mode_mrmt'))
¶
Compare detected and confirmed transport modes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| storyline | DataFrame | Parsed storyline table. | required |
| mode_col | str | Confirmed or corrected mode column. | 'mode' |
| detected_mode_col | str | Detected mode column. | 'detected_mode' |
| type_col | str | Column used to keep only track rows. | 'type' |
| confirmed_at_col | str | Confirmation timestamp column. | 'confirmed_at' |
| confirmed_only | bool | When true, keep only rows with a confirmation timestamp when the column exists. | True |
| group_cols | Iterable[str] | Mode columns to summarize when present. | ('mode', 'mode_niv1', 'mode_mrmt') |
Returns:
| Type | Description |
|---|---|
| DataFrame | A long table with one precision row per available grouping column and label. |
build_user_confirmation_rates(storyline, *, user_id_col='user_id', type_col='type', confirmed_at_col='confirmed_at')
¶
Calculate user-level confirmation rates for stays and tracks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| storyline | DataFrame | Parsed storyline table. | required |
| user_id_col | str | User identifier column. | 'user_id' |
| type_col | str | Column distinguishing | 'type' |
| confirmed_at_col | str | Confirmation timestamp column. | 'confirmed_at' |
Returns:
| Type | Description |
|---|---|
| DataFrame | One row per user with stay and track confirmation counts and rates. |
get_extreme_legs_by_mode(legs, *, mode_col='mode_niv1', length_col='length', duration_col='duration', top_n=5)
¶
Return the longest legs within each mode.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| legs | DataFrame | Leg table. | required |
| mode_col | str | Mode column used for grouping. | 'mode_niv1' |
| length_col | str | Length column, expected in meters. | 'length' |
| duration_col | str | Optional duration column in seconds. | 'duration' |
| top_n | int | Number of longest legs retained per mode. | 5 |
Returns:
| Type | Description |
|---|---|
| DataFrame | A table sorted by mode and descending distance, with distance in kilometers and speed when duration is available. |
summarize_leg_lengths_by_mode(legs, *, mode_col='mode_niv2', length_col='length', quantiles=(0.95, 0.98, 0.99))
¶
Summarize leg-distance distributions by mode.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| legs | DataFrame | Leg table. | required |
| mode_col | str | Mode column used for grouping. | 'mode_niv2' |
| length_col | str | Length column, expected in meters. | 'length' |
| quantiles | Iterable[float] | Quantiles to add to the summary. | (0.95, 0.98, 0.99) |
Returns:
| Type | Description |
|---|---|
| DataFrame | One row per mode with distances expressed in kilometers. |
resample_missing_stays(storyline, config)
¶
Create transparent placeholder stays for missing tracking dates.
The generated rows are labelled Resampled_stay. They are useful when
the analysis requires a continuous user calendar, but they remain
identifiable through their type and comment_feedback values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| storyline | GeoDataFrame | Parsed storyline table with | required |
| config | ProjectConfig | Project configuration used to assign experimental phases to inserted days. | required |
Returns:
| Type | Description |
|---|---|
GeoDataFrame
|
A tuple |
DataFrame
|
optional placeholder rows and |
tuple[GeoDataFrame, DataFrame]
|
inserted user-day. |
build_user_selection_table(user_stats, *, require_tracking_quality=False, exclude_bad_signal_users=True, max_low_quality_legs_share=None)
¶
Build an explicit user-level table for selecting analysis users.
The function does not remove rows. It adds one boolean column per rule,
a final analysis_user_ok flag and an analysis_user_reason column.
This keeps exclusions inspectable before any table is filtered. The name
deliberately uses selection_table rather than filter_matrix because
the object is meant to be read by humans before being applied.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| user_stats | DataFrame | User-level table produced by | required |
| require_tracking_quality | bool | When true, keep only users with | False |
| exclude_bad_signal_users | bool | When true, exclude users flagged by the signal-quality step. | True |
| max_low_quality_legs_share | float \| None | Optional maximum share of level-1 low-quality legs tolerated per user. | None |
Returns:
| Type | Description |
|---|---|
DataFrame
|
A copy of |
DataFrame
|
|
Raises:
| Type | Description |
|---|---|
KeyError
|
If a column required by an enabled rule is missing. |
ValueError
|
If |
filter_mobility_dataset_by_users(dataset, user_ids)
¶
Filter every user-indexed table of a MobilityDataset.
Mapping tables are filtered after the core tables so they only reference retained legs, trips, journeys and staypoints. The function does not recompute indicators; it only preserves relational consistency after a user selection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | MobilityDataset | Transformed mobility dataset. | required |
| user_ids | Iterable | Iterable of user identifiers to keep. | required |
Returns:
| Type | Description |
|---|---|
MobilityDataset
|
A new |
MobilityDataset
|
reports. |
filter_table_by_users(df, user_ids, *, user_id_column='user_id')
¶
Filter any mobility table by selected users while keeping its schema.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Table to filter. If | required |
| user_ids | Iterable | Iterable of user identifiers to keep. | required |
| user_id_column | str | Name of the user identifier column in | 'user_id' |
Returns:
| Type | Description |
|---|---|
DataFrame
|
A copy of |
select_analysis_users(selection_table, *, quality_column='analysis_user_ok')
¶
Return user ids selected by a user selection table.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| selection_table | DataFrame | Table produced by | required |
| quality_column | str | Boolean column used as the final selection flag. | 'analysis_user_ok' |
Returns:
| Type | Description |
|---|---|
Index
|
User ids for which |
Raises:
| Type | Description |
|---|---|
KeyError
|
If |
select_valid_tracking_users(user_stats, *, quality_column='tracking_quality_ok')
¶
Return user ids that pass one quality column.
This helper is intentionally narrow and remains useful for quick checks.
For analysis filters that combine tracking quality and GPS signal flags,
prefer build_user_selection_table followed by select_analysis_users.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| user_stats | DataFrame | User-level table containing | required |
| quality_column | str | Boolean column used as the selection criterion. | 'tracking_quality_ok' |
Returns:
| Type | Description |
|---|---|
Index
|
User ids for which |
Raises:
| Type | Description |
|---|---|
| KeyError | If the requested quality column is missing. |
Presence and participation¶
xyt_gps.quality_tracking
¶
Daily presence and weekly participation helpers.
calculate_user_tracking_stats(storyline)
¶
Calculate observed tracking windows and missing days per user.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| storyline | DataFrame | Parsed storyline table. It must contain | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | A user-level table with first and latest tracked dates, the number of active days, inactive days, maximum gap between tracked days and tracking completeness. |
Raises:
| Type | Description |
|---|---|
KeyError
|
If |
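The metrics described for this function can be illustrated in plain Python. This is a minimal sketch of the idea, not the package's implementation: the function name, the output keys and the definition of a gap (untracked days strictly between two tracked days) are assumptions made for the example.

```python
from datetime import date


def tracking_stats(active_dates):
    """Summarize one user's observed tracking window (illustrative only)."""
    days = sorted(set(active_dates))
    first, last = days[0], days[-1]
    window = (last - first).days + 1  # calendar days between first and last tracked day
    active = len(days)
    # A gap is the number of untracked days between two consecutive tracked days.
    max_gap = max(((b - a).days - 1 for a, b in zip(days, days[1:])), default=0)
    return {
        "first_date": first,
        "last_date": last,
        "active_days": active,
        "inactive_days": window - active,
        "max_gap_days": max_gap,
        "completeness": active / window,
    }


# Hypothetical user tracked on 1, 2, 5 and 6 March 2024.
stats = tracking_stats(
    [date(2024, 3, 1), date(2024, 3, 2), date(2024, 3, 5), date(2024, 3, 6)]
)
```

Here the window spans six calendar days, four of which are active, so two days are inactive and they form a single gap.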
build_daily_tracking_presence(storyline, config=None, *, user_id_column='user_id', date_column='started_at', exclude_resampled=True)
¶
Build one observed tracking row per user and date.
This table is intentionally based on observed rows before temporal
resampling. Otherwise, artificial Resampled_stay rows would make missing
days look like tracked days in participation analyses.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| storyline | DataFrame | Parsed storyline table. | required |
| config | ProjectConfig \| None | Optional project configuration used to assign phases. | None |
| user_id_column | str | User identifier column. | 'user_id' |
| date_column | str | Datetime column used to define active tracking days. | 'started_at' |
| exclude_resampled | bool | When true, rows where | True |
Returns:
| Type | Description |
|---|---|
DataFrame
|
A table with |
DataFrame
|
config with phases is provided, |
Raises:
| Type | Description |
|---|---|
| KeyError | If the requested user or date column is missing. |
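The reason for excluding Resampled_stay rows can be shown with a small plain-Python sketch. It is not the package's code: the row layout and the "Stay" and "Track" type values are hypothetical, and only the Resampled_stay label comes from this page.

```python
from datetime import datetime

# Hypothetical storyline rows: (user_id, type, started_at).
rows = [
    ("u1", "Stay", datetime(2024, 3, 1, 8, 0)),
    ("u1", "Track", datetime(2024, 3, 1, 9, 30)),
    ("u1", "Resampled_stay", datetime(2024, 3, 2, 0, 0)),
    ("u1", "Stay", datetime(2024, 3, 3, 7, 45)),
    ("u2", "Track", datetime(2024, 3, 1, 18, 5)),
]


def daily_presence(rows, exclude_resampled=True):
    """Return sorted unique (user_id, date) pairs of observed tracking days."""
    days = set()
    for user_id, row_type, started_at in rows:
        if exclude_resampled and row_type == "Resampled_stay":
            continue  # placeholder rows are not real observations
        days.add((user_id, started_at.date()))
    return sorted(days)


presence = daily_presence(rows)
```

With the placeholder row excluded, 2 March does not count as a tracked day for u1; keeping it would hide the missing day.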
build_weekly_participation_grid(storyline, config, *, user_ids=None, user_id_column='user_id', date_column='started_at', exclude_resampled=True, default_period_name='All')
¶
Count observed active tracking days by user and protocol week.
When phases are configured, weeks are relative to these phases rather than
ISO calendar weeks. Without configured phases, the grid uses one analytical
period named default_period_name and spans the observed tracking dates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| storyline | DataFrame | Parsed storyline table, preferably before resampling. | required |
| config | ProjectConfig | Project configuration containing phase dates. | required |
| user_ids | Iterable \| None | Optional full list of expected users. If omitted, users are taken from | None |
| user_id_column | str | User identifier column in | 'user_id' |
| date_column | str | Datetime column used to define active tracking days. | 'started_at' |
| exclude_resampled | bool | When true, artificial | True |
| default_period_name | str | Name used in the output when no phase is configured. | 'All' |
Returns:
| Type | Description |
|---|---|
| DataFrame | A complete user x phase-week table. |
Raises:
| Type | Description |
|---|---|
| ValueError | If no phase is configured and no tracking date is available to infer a default analysis period. |
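The difference between phase-relative weeks and ISO calendar weeks can be sketched as follows. The helper name, the 1-based numbering and the phase start date are assumptions for illustration; the package may number weeks differently.

```python
from datetime import date


def phase_week(day, phase_start):
    """Week index counted from the phase start date (1-based, assumed)."""
    return (day - phase_start).days // 7 + 1


phase_start = date(2024, 3, 6)  # hypothetical phase start, a Wednesday
day = date(2024, 3, 12)         # the following Tuesday, six days later

protocol_week = phase_week(day, phase_start)
iso_week_of_start = phase_start.isocalendar()[1]
iso_week_of_day = day.isocalendar()[1]
```

Both dates fall in the same protocol week, while their ISO week numbers differ because an ISO week always starts on Monday.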
summarize_participation_grid(participation_grid, *, good_week_min_days=5)
¶
Summarize weekly participation coverage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| participation_grid | DataFrame | Table produced by | required |
| good_week_min_days | int | Minimum active days for a week to be considered good or complete. | 5 |
Returns:
| Type | Description |
|---|---|
| DataFrame | One summary row for all phases and one row per phase. |
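A rough sketch of such a summary, written in plain Python, is given below. The grid layout, the column names and the "good week share" metric are assumptions; only the good_week_min_days threshold and the "all phases plus one row per phase" shape come from the description above.

```python
# Hypothetical participation grid: one row per user and phase-week.
grid = [
    {"user_id": "u1", "phase": "Before", "week": 1, "active_days": 7},
    {"user_id": "u1", "phase": "Before", "week": 2, "active_days": 3},
    {"user_id": "u2", "phase": "Before", "week": 1, "active_days": 5},
    {"user_id": "u2", "phase": "After", "week": 1, "active_days": 0},
]


def summarize(grid, good_week_min_days=5):
    """Return one summary row for all phases, then one row per phase."""

    def one_row(label, rows):
        good = sum(r["active_days"] >= good_week_min_days for r in rows)
        share = good / len(rows) if rows else 0.0
        return {"phase": label, "user_weeks": len(rows),
                "good_weeks": good, "good_week_share": share}

    phases = sorted({r["phase"] for r in grid})
    summary = [one_row("All", grid)]
    summary += [one_row(p, [r for r in grid if r["phase"] == p]) for p in phases]
    return summary


summary = summarize(grid)
```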
Quality reports¶
xyt_gps.quality_reports
¶
Tracking quality reports and user-level tracking flags.
build_tracking_gap_report(storyline, config=None, *, user_ids=None, user_id_column='user_id', date_column='started_at', exclude_resampled=True, default_period_name='All')
¶
Summarize observed, missing and consecutive tracking days.
This function packages the generic part of the historical tracking-quality notebook. It returns one row per user and analytical period, with active days, missing days, maximum gap and maximum consecutive observed days.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| storyline | DataFrame | Parsed storyline table. | required |
| config | ProjectConfig \| None | Optional project configuration. When phases are configured, the report is computed by phase; otherwise it uses the observed period named | None |
| user_ids | Iterable \| None | Optional full list of expected users. | None |
| user_id_column | str | User identifier column in | 'user_id' |
| date_column | str | Datetime column used to define active tracking days. | 'started_at' |
| exclude_resampled | bool | Whether to ignore artificial | True |
| default_period_name | str | Period name used when no phase is configured. | 'All' |
Returns:
| Type | Description |
|---|---|
| DataFrame | A user-period table with tracking coverage and gap metrics. |
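One of the metrics named above, the maximum number of consecutive observed days, can be computed with a simple run-length scan. The sketch below is an illustration under that reading of the metric; the helper name is hypothetical and the package may compute it differently.

```python
from datetime import date


def longest_run(active_dates):
    """Length of the longest streak of consecutive tracked days."""
    days = sorted(set(active_dates))
    if not days:
        return 0
    best = run = 1
    for prev, cur in zip(days, days[1:]):
        # Extend the streak when the next tracked day is the very next date.
        run = run + 1 if (cur - prev).days == 1 else 1
        best = max(best, run)
    return best


# Hypothetical user tracked on 1-3 March, then on 6-7 March 2024.
observed = [date(2024, 3, d) for d in (1, 2, 3, 6, 7)]
max_consecutive_days = longest_run(observed)
```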
summarize_phase_tracking(user_stats, config, *, day_thresholds=(7, 14, 21))
¶
Summarize user tracking coverage by configured phase.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| user_stats | DataFrame | User-level table containing | required |
| config | ProjectConfig | Project configuration containing phase dates. | required |
| day_thresholds | Iterable[int] | Day-count thresholds to report, for example 7, 14 and 21 days. | (7, 14, 21) |
Returns:
| Type | Description |
|---|---|
| DataFrame | One row per phase with user counts above each threshold and mean coverage over the theoretical phase duration. |
calculate_tracking_periods(user_stats, config, *, min_days_by_phase=None)
¶
Calculate effective tracked days per configured phase.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| user_stats | DataFrame | User-level tracking table, usually produced by | required |
| config | ProjectConfig | Project configuration containing | required |
| min_days_by_phase | Mapping[str, int] \| None | Optional override for minimum tracked days by phase name. | None |
Returns:
| Type | Description |
|---|---|
DataFrame
|
A copy of |
DataFrame
|
|
DataFrame
|
preserved. |
flag_tracking_quality(user_stats, config)
¶
Add transparent tracking-quality flags to a user table.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| user_stats | DataFrame | User-level table with at least | required |
| config | ProjectConfig | Project configuration containing total and phase-level tracking thresholds. | required |
Returns:
| Type | Description |
|---|---|
DataFrame
|
A copy of |
DataFrame
|
|
DataFrame
|
and the categorical reason |
Raises:
| Type | Description |
|---|---|
KeyError
|
If |
build_tracking_quality_report(user_stats)
¶
Build a compact one-row report for user tracking quality.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| user_stats | DataFrame | User-level table with tracking-quality columns. | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | A one-row table with user counts, valid-user share and median tracking-duration indicators. The output is meant for quick notebook checks before applying filters. |
Diagnostics¶
xyt_gps.quality_diagnostics
¶
Diagnostic reports for leg lengths, confirmations and mode detection.
summarize_leg_lengths_by_mode(legs, *, mode_col='mode_niv2', length_col='length', quantiles=(0.95, 0.98, 0.99))
¶
Summarize leg-distance distributions by mode.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| legs | DataFrame | Leg table. | required |
| mode_col | str | Mode column used for grouping. | 'mode_niv2' |
| length_col | str | Length column, expected in meters. | 'length' |
| quantiles | Iterable[float] | Quantiles to add to the summary. | (0.95, 0.98, 0.99) |
Returns:
| Type | Description |
|---|---|
| DataFrame | One row per mode with distances expressed in kilometers. |
get_extreme_legs_by_mode(legs, *, mode_col='mode_niv1', length_col='length', duration_col='duration', top_n=5)
¶
Return the longest legs within each mode.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| legs | DataFrame | Leg table. | required |
| mode_col | str | Mode column used for grouping. | 'mode_niv1' |
| length_col | str | Length column, expected in meters. | 'length' |
| duration_col | str | Optional duration column in seconds. | 'duration' |
| top_n | int | Number of longest legs retained per mode. | 5 |
Returns:
| Type | Description |
|---|---|
| DataFrame | A table sorted by mode and descending distance, with distance in kilometers and speed when duration is available. |
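The ranking and unit conversion described above can be sketched without pandas. The tuple layout, the output keys and the km/h speed unit are assumptions for the example; the page only states that lengths are in meters, durations in seconds and distances are returned in kilometers.

```python
from collections import defaultdict

# Hypothetical legs: (mode, length in meters, duration in seconds).
legs = [
    ("Car", 12000.0, 900.0),
    ("Car", 250000.0, 9000.0),
    ("Walk", 800.0, 600.0),
    ("Car", 40000.0, 2400.0),
    ("Walk", 5000.0, 3600.0),
]


def longest_legs(legs, top_n=2):
    """Keep the top_n longest legs per mode, sorted by mode then distance."""
    by_mode = defaultdict(list)
    for mode, length_m, duration_s in legs:
        km = length_m / 1000
        # Speed is only defined when a positive duration is available.
        kmh = km / (duration_s / 3600) if duration_s else None
        by_mode[mode].append({"mode": mode, "distance_km": km, "speed_kmh": kmh})
    result = []
    for mode in sorted(by_mode):
        ranked = sorted(by_mode[mode], key=lambda r: r["distance_km"], reverse=True)
        result.extend(ranked[:top_n])
    return result


extremes = longest_legs(legs)
```

A 250 km car leg, as in this sample, is the kind of row such a table is meant to surface for manual inspection.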
build_user_confirmation_rates(storyline, *, user_id_col='user_id', type_col='type', confirmed_at_col='confirmed_at')
¶
Calculate user-level confirmation rates for stays and tracks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| storyline | DataFrame | Parsed storyline table. | required |
| user_id_col | str | User identifier column. | 'user_id' |
| type_col | str | Column distinguishing | 'type' |
| confirmed_at_col | str | Confirmation timestamp column. | 'confirmed_at' |
Returns:
| Type | Description |
|---|---|
| DataFrame | One row per user with stay and track confirmation counts and rates. |
build_mode_detection_precision(storyline, *, mode_col='mode', detected_mode_col='detected_mode', type_col='type', confirmed_at_col='confirmed_at', confirmed_only=True, group_cols=('mode', 'mode_niv1', 'mode_mrmt'))
¶
Compare detected and confirmed transport modes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| storyline | DataFrame | Parsed storyline table. | required |
| mode_col | str | Confirmed or corrected mode column. | 'mode' |
| detected_mode_col | str | Detected mode column. | 'detected_mode' |
| type_col | str | Column used to keep only track rows. | 'type' |
| confirmed_at_col | str | Confirmation timestamp column. | 'confirmed_at' |
| confirmed_only | bool | When true, keep only rows with a confirmation timestamp when the column exists. | True |
| group_cols | Iterable[str] | Mode columns to summarize when present. | ('mode', 'mode_niv1', 'mode_mrmt') |
Returns:
| Type | Description |
|---|---|
| DataFrame | A long table with one precision row per available grouping column and label. |
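As a rough illustration of comparing detected and confirmed modes, the sketch below computes, for each confirmed label, the share of rows where detection agreed. Whether the package defines its precision figure exactly this way is an assumption, and the sample rows and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical confirmed track rows: (confirmed mode, detected mode).
tracks = [
    ("Car", "Car"),
    ("Car", "Bus"),
    ("Walk", "Walk"),
    ("Walk", "Walk"),
    ("Bus", "Car"),
]


def agreement_by_confirmed_mode(tracks):
    """Share of rows per confirmed mode where the detected mode matched."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for confirmed, detected in tracks:
        totals[confirmed] += 1
        hits[confirmed] += int(confirmed == detected)
    return {label: hits[label] / totals[label] for label in totals}


agreement = agreement_by_confirmed_mode(tracks)
```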
Temporal resampling¶
xyt_gps.quality_resampling
¶
Temporal resampling helpers for missing tracking days.
resample_missing_stays(storyline, config)
¶
Create transparent placeholder stays for missing tracking dates.
The generated rows are labelled Resampled_stay. They are useful when
the analysis requires a continuous user calendar, but they remain
identifiable through their type and comment_feedback values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| storyline | GeoDataFrame | Parsed storyline table with | required |
| config | ProjectConfig | Project configuration used to assign experimental phases to inserted days. | required |
Returns:
| Type | Description |
|---|---|
GeoDataFrame
|
A tuple |
DataFrame
|
optional placeholder rows and |
tuple[GeoDataFrame, DataFrame]
|
inserted user-day. |
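The placeholder idea can be sketched in plain Python. This is not the package's implementation: it assumes that only dates inside a user's observed window are filled, and the row layout is hypothetical. Only the Resampled_stay label comes from this page.

```python
from datetime import date, timedelta


def missing_days(active_dates):
    """Dates inside the observed window that have no tracking row."""
    observed = set(active_dates)
    first, last = min(observed), max(observed)
    span = (last - first).days
    candidates = (first + timedelta(days=i) for i in range(span + 1))
    return [d for d in candidates if d not in observed]


def placeholder_rows(user_id, active_dates):
    """One clearly labelled placeholder row per missing user-day."""
    return [
        {"user_id": user_id, "type": "Resampled_stay", "date": d}
        for d in missing_days(active_dates)
    ]


# Hypothetical user tracked on 1 and 4 March 2024 only.
inserted = placeholder_rows("u1", [date(2024, 3, 1), date(2024, 3, 4)])
```

Because every inserted row carries the Resampled_stay type, later steps can still drop these rows when counting real tracking days.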
User selection¶
xyt_gps.quality_selection
¶
User selection and consistent filtering of mobility datasets.
select_valid_tracking_users(user_stats, *, quality_column='tracking_quality_ok')
¶
Return user ids that pass one quality column.
This helper is intentionally narrow and remains useful for quick checks.
For analysis filters that combine tracking quality and GPS signal flags,
prefer build_user_selection_table followed by select_analysis_users.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| user_stats | DataFrame | User-level table containing | required |
| quality_column | str | Boolean column used as the selection criterion. | 'tracking_quality_ok' |
Returns:
| Type | Description |
|---|---|
Index
|
User ids for which |
Raises:
| Type | Description |
|---|---|
| KeyError | If the requested quality column is missing. |
filter_table_by_users(df, user_ids, *, user_id_column='user_id')
¶
Filter any mobility table by selected users while keeping its schema.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Table to filter. If | required |
| user_ids | Iterable | Iterable of user identifiers to keep. | required |
| user_id_column | str | Name of the user identifier column in | 'user_id' |
Returns:
| Type | Description |
|---|---|
DataFrame
|
A copy of |
build_user_selection_table(user_stats, *, require_tracking_quality=False, exclude_bad_signal_users=True, max_low_quality_legs_share=None)
¶
Build an explicit user-level table for selecting analysis users.
The function does not remove rows. It adds one boolean column per rule,
a final analysis_user_ok flag and an analysis_user_reason column.
This keeps exclusions inspectable before any table is filtered. The name
deliberately uses selection_table rather than filter_matrix because
the object is meant to be read by humans before being applied.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| user_stats | DataFrame | User-level table produced by | required |
| require_tracking_quality | bool | When true, keep only users with | False |
| exclude_bad_signal_users | bool | When true, exclude users flagged by the signal-quality step. | True |
| max_low_quality_legs_share | float \| None | Optional maximum share of level-1 low-quality legs tolerated per user. | None |
Returns:
| Type | Description |
|---|---|
DataFrame
|
A copy of |
DataFrame
|
|
Raises:
| Type | Description |
|---|---|
KeyError
|
If a column required by an enabled rule is missing. |
ValueError
|
If |
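The "flag, do not drop" design described for this function can be illustrated with a small plain-Python sketch. The input columns bad_signal_user and low_quality_legs_share and the rule column names are hypothetical; only analysis_user_ok and analysis_user_reason are names taken from this page.

```python
# Hypothetical user-level rows.
users = [
    {"user_id": "u1", "bad_signal_user": False, "low_quality_legs_share": 0.05},
    {"user_id": "u2", "bad_signal_user": True, "low_quality_legs_share": 0.10},
    {"user_id": "u3", "bad_signal_user": False, "low_quality_legs_share": 0.40},
]


def build_selection(users, exclude_bad_signal_users=True,
                    max_low_quality_legs_share=None):
    """Add one boolean column per enabled rule; never remove a row."""
    table = []
    for user in users:
        rules = {}
        if exclude_bad_signal_users:
            rules["rule_signal_ok"] = not user["bad_signal_user"]
        if max_low_quality_legs_share is not None:
            rules["rule_low_quality_share_ok"] = (
                user["low_quality_legs_share"] <= max_low_quality_legs_share
            )
        failed = [name for name, ok in rules.items() if not ok]
        table.append({
            **user,
            **rules,
            "analysis_user_ok": not failed,
            "analysis_user_reason": ", ".join(failed) or "ok",
        })
    return table


selection = build_selection(users, max_low_quality_legs_share=0.25)
selected_ids = [row["user_id"] for row in selection if row["analysis_user_ok"]]
```

All three users remain in the table, so the reason for each exclusion can be read before any filter is applied.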
select_analysis_users(selection_table, *, quality_column='analysis_user_ok')
¶
Return user ids selected by a user selection table.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| selection_table | DataFrame | Table produced by | required |
| quality_column | str | Boolean column used as the final selection flag. | 'analysis_user_ok' |
Returns:
| Type | Description |
|---|---|
Index
|
User ids for which |
Raises:
| Type | Description |
|---|---|
KeyError
|
If |
filter_mobility_dataset_by_users(dataset, user_ids)
¶
Filter every user-indexed table of a MobilityDataset.
Mapping tables are filtered after the core tables so they only reference retained legs, trips, journeys and staypoints. The function does not recompute indicators; it only preserves relational consistency after a user selection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | MobilityDataset | Transformed mobility dataset. | required |
| user_ids | Iterable | Iterable of user identifiers to keep. | required |
Returns:
| Type | Description |
|---|---|
MobilityDataset
|
A new |
MobilityDataset
|
reports. |
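The ordering described above (core tables first, mapping tables afterwards) can be sketched with plain lists of dictionaries. The table and key names are hypothetical and the real MobilityDataset holds more tables; the point is only that a mapping table without a user column must be filtered through the retained identifiers.

```python
# Hypothetical core tables and a mapping table that has no user_id column.
legs = [{"leg_id": 1, "user_id": "u1"}, {"leg_id": 2, "user_id": "u2"}]
trips = [{"trip_id": 10, "user_id": "u1"}, {"trip_id": 20, "user_id": "u2"}]
leg_trip_map = [{"leg_id": 1, "trip_id": 10}, {"leg_id": 2, "trip_id": 20}]


def filter_dataset(legs, trips, leg_trip_map, user_ids):
    """Filter core tables by user, then keep only mappings that still resolve."""
    keep = set(user_ids)
    legs_kept = [row for row in legs if row["user_id"] in keep]
    trips_kept = [row for row in trips if row["user_id"] in keep]
    # The mapping table is filtered last, using the identifiers that survived.
    leg_ids = {row["leg_id"] for row in legs_kept}
    trip_ids = {row["trip_id"] for row in trips_kept}
    map_kept = [
        row for row in leg_trip_map
        if row["leg_id"] in leg_ids and row["trip_id"] in trip_ids
    ]
    return legs_kept, trips_kept, map_kept


legs_kept, trips_kept, map_kept = filter_dataset(legs, trips, leg_trip_map, ["u1"])
```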
Spatial¶
xyt_gps.spatial
¶
Spatial façade for geometry, GPS signal quality and zone helpers.
clean_leg_geometries(legs, *, drop_discontinuous=True)
¶
Convert continuous MultiLineString legs to LineString.
max_consecutive_point_distance(geometry)
¶
Return the maximum distance between consecutive points.
add_signal_loss_metrics(legs, config, *, geometry_col='geometry')
¶
Add absolute and relative GPS signal-loss metrics to legs.
add_signal_quality_flags(legs, config, *, mode_col=None)
¶
Add signal-loss metrics, leg flags and user signal-quality flags.
add_signal_quality_to_user_stats(user_stats, legs, *, signal_quality_computed=None)
¶
Merge user-level signal-quality metrics into user_stats.
build_user_signal_quality_stats(legs)
¶
Aggregate signal-quality flags at user level.
flag_low_quality_legs_by_mode(legs, config, *, mode_col=None)
¶
Flag legs with mode-specific signal-loss thresholds.
identify_bad_signal_users(legs, *, quantile_threshold=0.995, user_id_col='user_id')
¶
Identify users with unusually high average signal loss.
add_excursion_flags(table, *, area=None, area_path=None, center=None, radius_m=None, geometry_col='geometry', flag_col='excursion', mode='outside', target_crs='EPSG:4326', operations_crs='EPSG:2056')
¶
Flag geographic excursions on a geometry table.
add_journey_origin_destination_from_trips(journeys, trips, map_track_trip_journey, *, trip_origin_col='trip_origin_zone', trip_destination_col='trip_destination_zone', output_origin_col='journey_origin_zone', output_destination_col='journey_destination_zone', fill_value='Unknown')
¶
Propagate first-origin and last-destination labels from trips to journeys.
add_leg_origin_destination_zones(legs, zones, *, zone_label_col, origin_col='origin_zone', destination_col='destination_zone', geometry_col='geometry', fill_value='Outside', target_crs='EPSG:4326')
¶
Label leg origins and destinations from a zone layer.
add_spatial_zone_labels(table, zones, *, zone_label_col, output_col, geometry_col='geometry', predicate='within', fill_value='Unknown', target_crs='EPSG:4326')
¶
Add a zone label to a geometry table by spatial join.
add_trip_origin_destination_from_legs(trips, legs, map_track_trip_journey, *, leg_origin_col='origin_zone', leg_destination_col='destination_zone', output_origin_col='trip_origin_zone', output_destination_col='trip_destination_zone', fill_value='Unknown')
¶
Propagate first-origin and last-destination labels from legs to trips.
classify_leg_relation_to_area(legs, *, area=None, area_path=None, center=None, radius_m=None, geometry_col='geometry', relation_col='area_relation', code_col=None, target_crs='EPSG:4326', operations_crs='EPSG:2056')
¶
Classify each leg as intra, extra or exchange relative to an area.
Spatial geometries¶
xyt_gps.spatial_geometry
¶
Spatial GPS quality¶
xyt_gps.spatial_quality
¶
GPS signal-quality metrics and filters.
add_signal_loss_metrics(legs, config, *, geometry_col='geometry')
¶
Add absolute and relative GPS signal-loss metrics to legs.
flag_low_quality_legs_by_mode(legs, config, *, mode_col=None)
¶
Flag legs with mode-specific signal-loss thresholds.
identify_bad_signal_users(legs, *, quantile_threshold=0.995, user_id_col='user_id')
¶
Identify users with unusually high average signal loss.
add_signal_quality_flags(legs, config, *, mode_col=None)
¶
Add signal-loss metrics, leg flags and user signal-quality flags.
build_user_signal_quality_stats(legs)
¶
Aggregate signal-quality flags at user level.
add_signal_quality_to_user_stats(user_stats, legs, *, signal_quality_computed=None)
¶
Merge user-level signal-quality metrics into user_stats.
Zones and spatial relations¶
xyt_gps.spatial_zones
¶
Spatial labels, area relations and excursion flags.
add_excursion_flags(table, *, area=None, area_path=None, center=None, radius_m=None, geometry_col='geometry', flag_col='excursion', mode='outside', target_crs='EPSG:4326', operations_crs='EPSG:2056')
¶
Flag geographic excursions on a geometry table.
add_spatial_zone_labels(table, zones, *, zone_label_col, output_col, geometry_col='geometry', predicate='within', fill_value='Unknown', target_crs='EPSG:4326')
¶
Add a zone label to a geometry table by spatial join.
add_leg_origin_destination_zones(legs, zones, *, zone_label_col, origin_col='origin_zone', destination_col='destination_zone', geometry_col='geometry', fill_value='Outside', target_crs='EPSG:4326')
¶
Label leg origins and destinations from a zone layer.
classify_leg_relation_to_area(legs, *, area=None, area_path=None, center=None, radius_m=None, geometry_col='geometry', relation_col='area_relation', code_col=None, target_crs='EPSG:4326', operations_crs='EPSG:2056')
¶
Classify each leg as intra, extra or exchange relative to an area.
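A possible reading of the three relation labels is sketched below. It assumes that a leg is "intra" when both endpoints lie inside the area, "extra" when both lie outside, and "exchange" when exactly one does; the page does not state this definition, and the package may also take the full leg geometry into account. The circular area uses planar coordinates in meters, and all names are hypothetical.

```python
from math import dist


def inside(point, center, radius_m):
    """True when a planar point (x, y in meters) lies within the circle."""
    return dist(point, center) <= radius_m


def area_relation(origin, destination, center, radius_m):
    """Classify a leg from its two endpoints only (assumed definition)."""
    origin_in = inside(origin, center, radius_m)
    destination_in = inside(destination, center, radius_m)
    if origin_in and destination_in:
        return "intra"
    if not origin_in and not destination_in:
        return "extra"
    return "exchange"


center, radius_m = (0.0, 0.0), 1000.0
relations = [
    area_relation((100.0, 100.0), (200.0, 0.0), center, radius_m),
    area_relation((100.0, 100.0), (5000.0, 0.0), center, radius_m),
    area_relation((3000.0, 0.0), (5000.0, 0.0), center, radius_m),
]
```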
add_trip_origin_destination_from_legs(trips, legs, map_track_trip_journey, *, leg_origin_col='origin_zone', leg_destination_col='destination_zone', output_origin_col='trip_origin_zone', output_destination_col='trip_destination_zone', fill_value='Unknown')
¶
Propagate first-origin and last-destination labels from legs to trips.
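The first-origin and last-destination rule can be illustrated in a few lines. This sketch assumes the legs of one trip are already in chronological order and uses plain dictionaries; the helper name is hypothetical, while the column names and the 'Unknown' fill value mirror the defaults in the signature above.

```python
def trip_origin_destination(legs_in_order, fill_value="Unknown"):
    """Origin of the first leg and destination of the last leg of a trip."""
    if not legs_in_order:
        return fill_value, fill_value
    origin = legs_in_order[0].get("origin_zone") or fill_value
    destination = legs_in_order[-1].get("destination_zone") or fill_value
    return origin, destination


# Hypothetical trip made of two legs with an intermediate zone.
trip_legs = [
    {"origin_zone": "A", "destination_zone": "B"},
    {"origin_zone": "B", "destination_zone": None},
]
trip_od = trip_origin_destination(trip_legs)
```

The intermediate zone "B" does not appear in the trip-level labels, and the missing last destination falls back to the fill value.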
add_journey_origin_destination_from_trips(journeys, trips, map_track_trip_journey, *, trip_origin_col='trip_origin_zone', trip_destination_col='trip_destination_zone', output_origin_col='journey_origin_zone', output_destination_col='journey_destination_zone', fill_value='Unknown')
¶
Propagate first-origin and last-destination labels from trips to journeys.
Time slices¶
xyt_gps.spatial_time
¶
Reusable time-slice helpers for mobility tables.
add_time_slices(table, *, time_slices=None, datetime_col='started_at', output_col='time_slice', timezone='Europe/Zurich', fallback_label='HC')
Add a reusable daily time-slice label to a mobility table.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| table | DataFrame | Input table with a datetime column. | required |
| time_slices | Iterable[TimeSlice] \| None | Named intervals. Defaults to | None |
| datetime_col | str | Datetime column used to classify observations. | 'started_at' |
| output_col | str | Name of the created column. | 'time_slice' |
| timezone | str \| None | Local timezone used before extracting the hour. Set to | 'Europe/Zurich' |
| fallback_label | str | Label assigned outside configured intervals. | 'HC' |
Returns:
| Type | Description |
|---|---|
| DataFrame | Copy of |
Raises:
| Type | Description |
|---|---|
| KeyError | If |
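The labelling rule can be sketched in a few lines. The intervals below are hypothetical (the package's default slices are not listed on this page), the datetime is assumed to be already expressed in local time, and `label_time_slice` is an illustrative name rather than a package function.

```python
from datetime import datetime, time

# Hypothetical intervals: (label, start inclusive, end exclusive).
EXAMPLE_SLICES = [
    ("HPM", time(6, 0), time(9, 0)),
    ("HPS", time(16, 0), time(19, 0)),
]


def label_time_slice(moment, slices=EXAMPLE_SLICES, fallback_label="HC"):
    """Return the label of the first interval containing the time of `moment`.

    Assumption: `moment` is already in the local timezone, so no conversion
    is done here.
    """
    local_time = moment.time()
    for label, start, end in slices:
        if start <= local_time < end:
            return label
    return fallback_label


labels = [label_time_slice(datetime(2024, 3, 4, hour)) for hour in (7, 12, 17)]
```

Any observation that falls outside every configured interval receives the fallback label, which mirrors the role of `fallback_label='HC'` in the signature.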
H3
xyt_gps.spatial_h3
H3 point extraction and aggregation helpers.
legs_to_h3_points(legs, *, h3_resolution=9, geometry_col='geometry', metadata_cols=None, sample_distance_m=None, max_points_per_leg=None)
Convert leg geometries into H3-indexed point observations.
The function creates one row per point extracted from each leg geometry.
By default, points are the vertices already present in the leg LineString.
When sample_distance_m is provided, the line is sampled at a regular
interval before H3 indexing. This is useful for visit-frequency maps, but
the output should be documented as a sampled representation rather than raw
GPS observations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| legs | DataFrame | Leg table, usually | required |
| h3_resolution | int \| Iterable[int] | H3 resolution or list of resolutions, from 0 to 15. Higher values produce smaller hexagons and larger output tables. | 9 |
| geometry_col | str | Geometry column containing | 'geometry' |
| metadata_cols | Iterable[str] \| None | Columns to copy from legs to each point row. When omitted, common mobility identifiers, dates, modes and phases are copied when present. | None |
| sample_distance_m | float \| None | Optional regular sampling distance in metres. If omitted, existing line vertices are used. | None |
| max_points_per_leg | int \| None | Optional cap after vertex extraction or regular sampling. This is a safeguard for very dense geometries. | None |
Returns:
| Type | Description |
|---|---|
| GeoDataFrame | GeoDataFrame in EPSG:4326 with |
Raises:
| Type | Description |
|---|---|
| ImportError | If the optional |
| KeyError | If the geometry column is missing. |
| ValueError | If the H3 resolution or sampling distance is invalid. |
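To make the difference between vertex extraction and regular sampling concrete, here is a minimal sketch of regular sampling along a polyline. It assumes planar coordinates in metres (a projected CRS) and leaves out the H3 indexing step; `sample_polyline` is a hypothetical helper, not the package's code.

```python
import math


def sample_polyline(vertices, step):
    """Return points spaced `step` apart along a planar polyline.

    The first vertex is always included. Assumption: `vertices` is a list of
    (x, y) tuples in metres.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    points = [vertices[0]]
    next_at = step      # distance along the line of the next sample
    travelled = 0.0     # distance covered before the current segment
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        segment = math.hypot(x1 - x0, y1 - y0)
        while segment > 0 and next_at <= travelled + segment:
            t = (next_at - travelled) / segment
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            next_at += step
        travelled += segment
    return points


sampled = sample_polyline([(0.0, 0.0), (4.0, 0.0), (4.0, 4.0)], step=2.0)
```

The three-vertex line yields five sampled points, which is why sampled output reflects line length rather than the number of GPS fixes.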
aggregate_h3_frequencies(h3_points, *, h3_col='h3_cell', group_cols=None, user_col='user_id', leg_col='leg_id', trip_col='trip_id')
Aggregate H3-indexed points into visit-frequency counts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| h3_points | DataFrame | Output of | required |
| h3_col | str | Column containing H3 cell identifiers. | 'h3_cell' |
| group_cols | Iterable[str] \| None | Additional grouping columns. The H3 cell is always kept. Add columns such as | None |
| user_col | str | User identifier column used for | 'user_id' |
| leg_col | str | Leg identifier column used for | 'leg_id' |
| trip_col | str | Trip identifier column used for | 'trip_id' |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with counts by H3 cell and optional grouping columns. |
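The aggregation can be pictured as a group-by on the H3 cell plus optional columns, counting rows and distinct identifiers. The sketch below uses plain dicts; `point_count` and `trip_count` appear elsewhere on this page, while `user_count` and `leg_count` are assumed names, and `aggregate_cell_counts` is hypothetical.

```python
from collections import defaultdict


def aggregate_cell_counts(points, group_cols=()):
    """Count rows and distinct users, legs and trips per H3 cell.

    Assumption: each point is a dict with 'h3_cell', 'user_id', 'leg_id' and
    'trip_id' keys, plus any column named in `group_cols`.
    """
    buckets = defaultdict(
        lambda: {"rows": 0, "users": set(), "legs": set(), "trips": set()}
    )
    for point in points:
        # The H3 cell is always part of the key, as in the documented function.
        key = (point["h3_cell"],) + tuple(point[col] for col in group_cols)
        bucket = buckets[key]
        bucket["rows"] += 1
        bucket["users"].add(point["user_id"])
        bucket["legs"].add(point["leg_id"])
        bucket["trips"].add(point["trip_id"])
    return {
        key: {
            "point_count": bucket["rows"],
            "user_count": len(bucket["users"]),
            "leg_count": len(bucket["legs"]),
            "trip_count": len(bucket["trips"]),
        }
        for key, bucket in buckets.items()
    }
```

Point counts grow with sampling density, whereas distinct user, leg and trip counts do not, so the latter are usually the safer basis for comparing cells.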
build_h3_count_matrix(h3_points, *, h3_col='h3_cell', dimension_sets=DEFAULT_H3_COUNT_DIMENSION_SETS, metrics=DEFAULT_H3_COUNT_METRICS, user_col='user_id', leg_col='leg_id', trip_col='trip_id', fill_value=0)
Build a wide H3 count table for dashboards.
The output keeps one row per H3 cell and creates explicit metric columns
for requested dimensions, for example
point_count__mode_niv1__marche or trip_count__time_slice__hpm.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| h3_points | DataFrame | H3-indexed points produced by | required |
| h3_col | str | H3 cell column. | 'h3_cell' |
| dimension_sets | Iterable[Iterable[str]] | Dimension sets used to create count columns. Each set is aggregated independently. Missing dimension columns are skipped. | DEFAULT_H3_COUNT_DIMENSION_SETS |
| metrics | Iterable[str] | Count metrics to compute: | DEFAULT_H3_COUNT_METRICS |
| user_col | str | User identifier column. | 'user_id' |
| leg_col | str | Leg identifier column. | 'leg_id' |
| trip_col | str | Trip identifier column. | 'trip_id' |
| fill_value | int | Value used for absent combinations. | 0 |
Returns:
| Type | Description |
|---|---|
| DataFrame | Wide table keyed by |
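The column-naming scheme can be illustrated with a small pivot over plain dicts. This sketch only handles the `point_count` metric and single-column dimensions; the lower-casing of values is inferred from the example names above and is an assumption, as is the helper name `build_wide_counts`.

```python
from collections import defaultdict


def build_wide_counts(points, dimensions=("mode_niv1",), fill_value=0):
    """Return {h3_cell: {column: count}} with one entry per cell.

    Columns are named point_count__<dimension>__<value>. Assumption: values
    are lower-cased to build the column name.
    """
    counts = defaultdict(lambda: defaultdict(int))
    columns = set()
    for point in points:
        cell_counts = counts[point["h3_cell"]]  # guarantees one row per cell
        for dimension in dimensions:
            if dimension not in point:
                continue  # missing dimension columns are skipped
            value = str(point[dimension]).lower()
            column = f"point_count__{dimension}__{value}"
            cell_counts[column] += 1
            columns.add(column)
    ordered = sorted(columns)
    return {
        cell: {column: cell_counts.get(column, fill_value) for column in ordered}
        for cell, cell_counts in counts.items()
    }
```

Every cell ends up with the same set of columns, and combinations never observed in a cell carry `fill_value`, which keeps the table rectangular for dashboard tools.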
Dashboard spatial exports
xyt_gps.spatial_exports
Dashboard-oriented spatial table and DuckDB exports.
build_spatial_analytics_tables(dataset_or_legs, *, h3_resolution=9, config=None, frequency_group_cols=None, count_dimension_sets=DEFAULT_H3_COUNT_DIMENSION_SETS, count_metrics=DEFAULT_H3_COUNT_METRICS, include_count_matrix=True, time_slices=None, datetime_col='started_at', time_slice_col='time_slice', timezone=None, sample_distance_m=None)
Build H3 spatial analytics tables without writing files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_or_legs | MobilityDataset \| DataFrame | A | required |
| h3_resolution | int \| Iterable[int] | H3 resolution or list of resolutions used for point indexing. | 9 |
| config | ProjectConfig \| None | Optional project configuration. When provided, its timezone and time slices are used unless explicit values are passed. | None |
| frequency_group_cols | Iterable[str] \| None | Grouping columns for | None |
| count_dimension_sets | Iterable[Iterable[str]] | Dimension sets used for the wide count matrix. | DEFAULT_H3_COUNT_DIMENSION_SETS |
| count_metrics | Iterable[str] | Count metrics included in the wide count matrix. | DEFAULT_H3_COUNT_METRICS |
| include_count_matrix | bool | Whether to export | True |
| time_slices | Iterable[TimeSlice] \| None | Optional reusable daily intervals. Defaults to | None |
| datetime_col | str | Datetime column used to assign time slices. | 'started_at' |
| time_slice_col | str | Name of the time-slice column. | 'time_slice' |
| timezone | str \| None | Local timezone used for time slices. Defaults to | None |
| sample_distance_m | float \| None | Optional regular sampling distance in metres before H3 indexing. | None |
Returns:
| Type | Description |
|---|---|
| dict[str, DataFrame] | Dictionary with |
write_spatial_analytics_tables(tables, output_dir, *, formats=('parquet', 'csv'), overwrite=True, manifest_name='spatial_analytics_manifest.json')
Write precomputed spatial analytics tables to disk.
Use this when tables have already been built with
build_spatial_analytics_tables() and need to be inspected before export.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tables | Mapping[str, DataFrame] | Mapping of table names to DataFrames. | required |
| output_dir | str \| Path | Destination directory. | required |
| formats | Iterable[str] | Output formats: | ('parquet', 'csv') |
| overwrite | bool | If false, fail when a target file already exists. | True |
| manifest_name | str | JSON manifest file name. | 'spatial_analytics_manifest.json' |
Returns:
| Type | Description |
|---|---|
| DataFrame | Manifest with one row per written file. |
write_spatial_analytics_exports(dataset_or_legs, output_dir, *, h3_resolution=9, config=None, frequency_group_cols=None, count_dimension_sets=DEFAULT_H3_COUNT_DIMENSION_SETS, count_metrics=DEFAULT_H3_COUNT_METRICS, include_count_matrix=True, time_slices=None, datetime_col='started_at', time_slice_col='time_slice', timezone=None, sample_distance_m=None, formats=('parquet', 'csv'), overwrite=True)
Write H3 point, frequency and count-matrix tables for dashboards.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_or_legs | MobilityDataset \| DataFrame | A | required |
| output_dir | str \| Path | Destination directory, for example | required |
| h3_resolution | int \| Iterable[int] | H3 resolution or list of resolutions used for point indexing. | 9 |
| config | ProjectConfig \| None | Optional project configuration. When provided, its timezone and time slices are used unless explicit values are passed. | None |
| frequency_group_cols | Iterable[str] \| None | Grouping columns for | None |
| count_dimension_sets | Iterable[Iterable[str]] | Dimension sets used for the wide count matrix. | DEFAULT_H3_COUNT_DIMENSION_SETS |
| count_metrics | Iterable[str] | Count metrics included in the wide count matrix. | DEFAULT_H3_COUNT_METRICS |
| include_count_matrix | bool | Whether to export | True |
| time_slices | Iterable[TimeSlice] \| None | Optional reusable daily intervals. Defaults to | None |
| datetime_col | str | Datetime column used to assign time slices. | 'started_at' |
| time_slice_col | str | Name of the time-slice column. | 'time_slice' |
| timezone | str \| None | Local timezone used for time slices. Defaults to | None |
| sample_distance_m | float \| None | Optional regular sampling distance in metres before H3 indexing. | None |
| formats | Iterable[str] | Output formats: | ('parquet', 'csv') |
| overwrite | bool | If false, fail when a target file already exists. | True |
Returns:
| Type | Description |
|---|---|
| DataFrame | Manifest with one row per written file. |
write_duckdb_spatial_database(tables, database_path, *, overwrite=True, load_spatial=True, require_spatial_extension=False)
Write mobility tables to a local DuckDB database.
GeoDataFrame geometries are stored as WKB columns so that the database can
be queried even without the spatial extension. When the extension is
available, companion views ending with _spatial are created with a DuckDB
geometry column.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tables | Mapping[str, DataFrame] \| MobilityDataset | Mapping of table names to DataFrames, or a | required |
| database_path | str \| Path | Destination | required |
| overwrite | bool | If true, replace an existing database file. | True |
| load_spatial | bool | Try to install and load DuckDB's spatial extension. | True |
| require_spatial_extension | bool | If true, fail when the spatial extension cannot be loaded. Keep false for offline or lightweight use. | False |
Returns:
| Type | Description |
|---|---|
| DataFrame | Manifest with table names, row counts and spatial-extension status. |
Raises:
| Type | Description |
|---|---|
| ImportError | If the optional |
| FileExistsError | If the database exists and |
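Storing geometries as WKB means each geometry becomes an opaque binary value that any client can read back, with or without spatial functions. As a rough illustration of what such a value contains, the sketch below encodes and decodes a 2D point by hand with the standard library; in practice a geometry library produces the bytes, and these two helper names are hypothetical.

```python
import struct


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB.

    Layout: 1 byte for byte order (1 = little endian), a 4-byte unsigned
    geometry type (1 = Point), then x and y as 8-byte doubles.
    """
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(blob):
    """Decode a little-endian WKB point back to an (x, y) tuple."""
    byte_order, geometry_type, x, y = struct.unpack("<BIdd", blob)
    if byte_order != 1 or geometry_type != 1:
        raise ValueError("not a little-endian WKB point")
    return x, y


blob = point_to_wkb(7.44, 46.95)
```

Because the value is plain bytes, it survives a round trip through any BLOB-capable column, and a spatial view can convert it to a native geometry later.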
Indicators
xyt_gps.indicators
Mobility indicator helpers computed from structured mobility tables.
build_person_day_indicators(dataset, *, mode_col='mode_niv1', trips_mode_col=None, distance_col=None, include_zero_days=True, include_excursions=True, include_airplane=False, config=None, default_phase_name='All')
Build person-day mobility indicators by mode.
The MVP indicators are distance, travel time and number of trips. Values
are aggregated per user, date, analytical phase/period and mode. Distances
are expressed in kilometers, durations in minutes. If the dataset has no
Phase column, all rows are assigned to default_phase_name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | MobilityDataset | Transformed mobility dataset. | required |
| mode_col | str | Leg mode column, for example | 'mode_niv1' |
| trips_mode_col | str \| None | Trip mode column. If absent, it is inferred as | None |
| distance_col | str \| None | Optional leg distance column override. By default the function tries | None |
| include_zero_days | bool | When true, build a continuous user-day calendar from | True |
| include_excursions | bool | When false, exclude rows flagged with | True |
| include_airplane | bool | When false, exclude legs and trips whose source or mapped mode columns identify airplane travel. Airplane is excluded by default because it can dominate distances and CO2 indicators. | False |
| config | ProjectConfig \| None | Optional project configuration. If omitted, phase metadata stored in | None |
| default_phase_name | str | Analytical period name used when no phase split is present in the dataset. | 'All' |
Returns:
| Type | Description |
|---|---|
| DataFrame | Long table with columns |
Raises:
| Type | Description |
|---|---|
| KeyError | If required columns are missing. |
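The effect of `include_zero_days` is easiest to see on a toy example. The sketch below sums leg distances per user and day and then fills the gaps with zeros. It makes two loud assumptions: the calendar is bounded by each user's first and last observed day (the documented calendar source is not shown on this page), and legs carry hypothetical `length_m` and `date` fields. Modes and phases are left out for brevity.

```python
from collections import defaultdict
from datetime import date, timedelta


def daily_distance_km(legs, include_zero_days=True):
    """Return {(user_id, day): distance in km}, optionally with zero days.

    Assumption: zero days are the unobserved days between a user's first
    and last observed day.
    """
    totals = defaultdict(float)
    for leg in legs:
        totals[(leg["user_id"], leg["date"])] += leg["length_m"] / 1000.0

    if include_zero_days:
        days_by_user = defaultdict(list)
        for user_id, day in list(totals):
            days_by_user[user_id].append(day)
        for user_id, days in days_by_user.items():
            day, last_day = min(days), max(days)
            while day <= last_day:
                totals.setdefault((user_id, day), 0.0)
                day += timedelta(days=1)
    return dict(totals)
```

Without the zero rows, a mean over days only covers days with movement and therefore overstates the average daily distance.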
build_person_phase_indicators(person_day, *, user_stats=None, weight_col='weight')
Average person-day indicators by user, phase and mode.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| person_day | DataFrame | Table produced by | required |
| user_stats | DataFrame \| None | Optional user-level table used to attach weights. | None |
| weight_col | str | User weight column. Missing weights are set to 1. | 'weight' |
Returns:
| Type | Description |
|---|---|
| DataFrame | Long table with mean daily distance, travel time and trip count by user, phase and mode. If |
| DataFrame | contains |
build_population_indicators(person_phase, *, use_weights=True, weight_col='weight')
Average person-phase indicators at population level.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| person_phase | DataFrame | Table produced by | required |
| use_weights | bool | When true and | True |
| weight_col | str | Weight column attached to | 'weight' |
Returns:
| Type | Description |
|---|---|
| DataFrame | Long table with one row per phase and mode. |
| DataFrame | number of users contributing to each phase-mode mean. |
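A population-level average over users is a weighted mean when weights are used and a plain mean otherwise. The sketch below shows that arithmetic on plain dicts; `distance_km` is a hypothetical metric column, `population_mean` is an illustrative name, and the handling of missing weights follows the "set to 1" rule stated for `weight_col`.

```python
def population_mean(rows, value_col="distance_km", use_weights=True, weight_col="weight"):
    """Return {(phase, mode): (mean, n_users)} over person-phase rows.

    Assumption: each row is a dict with 'Phase', 'mode', the value column
    and an optional weight.
    """
    sums = {}
    for row in rows:
        weight = row.get(weight_col) if use_weights else 1.0
        if weight is None:
            weight = 1.0  # missing weights are set to 1
        key = (row["Phase"], row["mode"])
        total, weight_sum, users = sums.get(key, (0.0, 0.0, 0))
        sums[key] = (total + weight * row[value_col], weight_sum + weight, users + 1)
    # No guard against a zero weight sum: this sketch assumes positive weights.
    return {
        key: (total / weight_sum, users)
        for key, (total, weight_sum, users) in sums.items()
    }
```

Reporting the number of contributing users next to each mean helps readers judge how much a phase-mode value can be trusted.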
compute_mobility_indicators(dataset, *, mode_col='mode_niv1', trips_mode_col=None, distance_col=None, include_zero_days=True, include_excursions=True, include_airplane=False, use_weights=True, weight_col='weight', config=None, default_phase_name='All')
Compute the first mobility indicator tables from a mobility dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | MobilityDataset | Transformed mobility dataset. | required |
| mode_col | str | Leg mode column used as indicator granularity. | 'mode_niv1' |
| trips_mode_col | str \| None | Optional trip mode column override. | None |
| distance_col | str \| None | Optional leg distance column override. | None |
| include_zero_days | bool | Whether to include tracked days without movement as zero rows. | True |
| include_excursions | bool | Whether to include rows flagged as excursions in the indicator base. | True |
| include_airplane | bool | Whether to include airplane legs and trips in the indicator base. Defaults to false because airplane rows can dominate distance, CO2 and demand profiles. | False |
| use_weights | bool | Whether to use | True |
| weight_col | str | User-level weight column. Missing weights are set to 1. | 'weight' |
| config | ProjectConfig \| None | Optional project configuration. If omitted, phase metadata stored in | None |
| default_phase_name | str | Analytical period name used when no phase split is present in the dataset. | 'All' |
Returns:
| Type | Description |
|---|---|
| IndicatorResult | |
population_indicator_summary(indicators)
Return a compact population-level indicator summary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| indicators | IndicatorResult \| DataFrame | | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | Population indicator table sorted by phase and mode. |
Enrichments
xyt_gps.enrichment
Optional enrichment helpers for mobility analysis tables.
CO2OccupancyConfig
dataclass
Parameters used to derive CO2 and occupancy metrics from legs.
Factors are expressed in grams per kilometer. co2_g and
co2_direct_g are then computed as leg-level totals. Occupancy is
recomputed by default; set prefer_observed_occupancy=True to use a
positive provider value from occupancy_col when available.
HealthConfig
dataclass
Parameters used to derive simple physical-activity metrics.
add_co2_occupancy_metrics(legs, *, journeys=None, map_track_trip_journey=None, config=None, mode_col='mode', distance_col=None, journey_purpose_col='main_purpose_mrmt', prefer_observed_occupancy=None, occupancy_col=None)
Add occupancy and CO2 metrics to a leg table.
The function is intentionally row-level: it does not aggregate. This keeps assumptions visible before mobility indicators are computed.
By default, occupancy is recomputed from distance and purpose because
provider columns are often sparse. Pass prefer_observed_occupancy=True
to use a positive value from occupancy_col when available and fall back
to the computed value otherwise.
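The occupancy fallback and the leg-level total can be written down as two small functions. This is a sketch of the rule as described in the text, with hypothetical names; whether the package divides the total by occupancy or applies per-passenger factors is not stated on this page, so the sketch does neither.

```python
def resolve_occupancy(computed, observed=None, prefer_observed=False):
    """Pick the occupancy for one leg.

    A positive observed value is used only when `prefer_observed` is true;
    in every other case the computed value is returned.
    """
    if prefer_observed and observed is not None and observed > 0:
        return observed
    return computed


def leg_co2_g(distance_km, factor_g_per_km):
    """Leg-level total in grams from a factor in grams per kilometre.

    Assumption: a plain product, with no occupancy adjustment.
    """
    return distance_km * factor_g_per_km
```

Keeping both steps at row level makes it possible to inspect which legs used an observed occupancy before any aggregation hides the choice.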
add_health_metrics(legs, *, config=None, mode_col='mode_niv1', distance_col=None, duration_col='duration')
Add simple activity, intensity, MET and calorie metrics to legs.
build_leg_enrichment_tables(legs)
Return compact CO2 and health side tables keyed by leg_id.
Mobility motifs
xyt_gps.motifs
Daily mobility motif helpers.
The functions in this module migrate the useful core of the historical
GPStoGraph workflow without exposing notebook-era graph objects as the main
API. A motif is represented as a daily directed transition structure between
visited places. Places can come from an existing location_id column, or be
derived from staypoint purpose and rounded coordinates.
assign_mobility_motif_ids(motifs, *, signature_col='motif_signature', id_col='motif_id', top_n=9, other_motif_id=99)
Assign stable numeric ids to the most frequent motif signatures.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| motifs | DataFrame | Motif table returned by | required |
| signature_col | str | Column containing the canonical motif signature. | 'motif_signature' |
| id_col | str | Name of the generated id column. | 'motif_id' |
| top_n | int | Number of frequent motifs receiving ids from 1 to | 9 |
| other_motif_id | int | Id used for less frequent motifs. | 99 |
Returns:
| Type | Description |
|---|---|
| DataFrame | Copy of |
Raises:
| Type | Description |
|---|---|
| KeyError | If |
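The id assignment amounts to ranking signatures by frequency. The sketch below shows one way to do it with the standard library; the tie-breaking rule (alphabetical by signature) is an assumption made so that the ids are reproducible, and `assign_ids` is a hypothetical helper.

```python
from collections import Counter


def assign_ids(signatures, top_n=9, other_id=99):
    """Return one id per input signature: 1..top_n by frequency, else other_id.

    Assumption: ties in frequency are broken by sorting on the signature.
    """
    counts = Counter(signatures)
    ranked = sorted(counts, key=lambda signature: (-counts[signature], signature))
    ids = {signature: rank for rank, signature in enumerate(ranked[:top_n], start=1)}
    return [ids.get(signature, other_id) for signature in signatures]


example_ids = assign_ids(["a", "b", "a", "c", "b", "a"], top_n=2)
```

A fixed `other_id` such as 99 keeps the rare motifs in the table while signalling that they share a catch-all category.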
build_mobility_motifs(staypoints_or_dataset, *, user_col='user_id', started_at_col='started_at', finished_at_col='finished_at', date_col=None, location_col=None, purpose_col='purpose_niv1', lon_col='lon', lat_col='lat', coordinate_precision=4, top_n_motifs=9, other_motif_id=99)
Build daily mobility motifs from staypoints.
The function works on dataset.staypoints or on a staypoint DataFrame.
A motif is the daily sequence of places visited by one user after removing
consecutive duplicate places. The sequence is relabelled in order of first
appearance, then encoded as a flattened directed adjacency matrix. This
keeps the old motif_flat idea while making the result easy to export and
compare.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| staypoints_or_dataset | MobilityDataset \| DataFrame | | required |
| user_col | str | User identifier column. | 'user_id' |
| started_at_col | str | Start timestamp column used for sorting. | 'started_at' |
| finished_at_col | str | Optional end timestamp column kept in motif nodes. | 'finished_at' |
| date_col | str \| None | Optional explicit date column. If omitted, the date is derived from | None |
| location_col | str \| None | Optional stable place identifier. If omitted, a place key is derived from purpose and rounded coordinates. | None |
| purpose_col | str \| None | Optional activity/purpose label used in derived place keys. | 'purpose_niv1' |
| lon_col | str | Longitude column used when deriving place keys. | 'lon' |
| lat_col | str | Latitude column used when deriving place keys. | 'lat' |
| coordinate_precision | int | Decimal precision for coordinate-derived keys. | 4 |
| top_n_motifs | int | Number of frequent motif signatures assigned ids 1..N. | 9 |
| other_motif_id | int | Id assigned to less frequent motifs. | 99 |
Returns:
| Type | Description |
|---|---|
| DataFrame | One row per user-day with the canonical sequence, adjacency signature, motif id and simple counts. |
Raises:
| Type | Description |
|---|---|
| KeyError | If required columns are missing. |
| ValueError | If no usable staypoint row remains. |
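The encoding steps described for a motif (deduplicate, relabel, build the adjacency matrix, flatten) can be sketched for a single user-day. The string format of the signature (row-major 0/1 digits) is an assumption, and `motif_signature` is a hypothetical helper rather than the package's code.

```python
def motif_signature(places):
    """Return (canonical_sequence, flat_adjacency) for one day's places."""
    # 1. Drop consecutive duplicate places.
    sequence = [p for i, p in enumerate(places) if i == 0 or p != places[i - 1]]
    # 2. Relabel places 0, 1, 2, ... in order of first appearance.
    labels = {}
    for place in sequence:
        labels.setdefault(place, len(labels))
    canonical = [labels[place] for place in sequence]
    # 3. Mark every observed transition in an n x n directed adjacency matrix.
    n = len(labels)
    matrix = [[0] * n for _ in range(n)]
    for source, target in zip(canonical, canonical[1:]):
        matrix[source][target] = 1
    # 4. Flatten the matrix row by row into a string.
    flat = "".join(str(cell) for row in matrix for cell in row)
    return canonical, flat


home_work = motif_signature(["home", "home", "work", "home"])
```

Because places are relabelled by order of appearance, two users with different actual locations but the same day structure obtain the same signature, which is what makes motifs comparable across people.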
summarize_mobility_motifs(motifs, *, motif_id_col='motif_id', signature_col='motif_signature')
Summarize motif frequencies and simple structural properties.
build_mobility_motif_sequences(motifs, *, user_col='user_id', date_col='date', motif_id_col='motif_id', n_days=60, align_to_week=True, fill_value=0)
Build a fixed-width daily motif sequence for each user.
This is the package equivalent of the historical motif_sequence()
helper. Each row is a user and each column is a relative day. Missing days
are filled with fill_value. When align_to_week=True, the first observed
motif is shifted so the first column corresponds to Monday.
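For one user, the fixed-width layout and the Monday alignment can be sketched as below. The input is assumed to be a dict from `datetime.date` to motif id, and `motif_sequence_sketch` is a hypothetical name chosen to avoid confusion with the historical helper.

```python
from datetime import date, timedelta


def motif_sequence_sketch(motif_by_date, n_days=60, align_to_week=True, fill_value=0):
    """Return a list of `n_days` motif ids, one per relative day."""
    first_day = min(motif_by_date)
    if align_to_week:
        # date.weekday() is 0 on Monday, so this steps back to the Monday
        # of the week containing the first observed day.
        first_day -= timedelta(days=first_day.weekday())
    return [
        motif_by_date.get(first_day + timedelta(days=offset), fill_value)
        for offset in range(n_days)
    ]


observed = {date(2024, 1, 3): 1, date(2024, 1, 5): 2}  # a Wednesday and a Friday
week = motif_sequence_sketch(observed, n_days=7)
```

Aligning to Monday makes weekday and weekend positions line up across users, at the cost of a few leading fill values for users who start mid-week.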
Export
xyt_gps.export
Export helpers for structured mobility tables.
The export layer is kept separate from transformation code so formats such as CSV, Parquet or Excel remain optional package concerns.
mobility_dataset_tables(dataset)
Return the named tables contained in a MobilityDataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | MobilityDataset | Transformed mobility dataset. | required |
Returns:
| Type | Description |
|---|---|
| dict[str, DataFrame] | Dictionary keyed by stable table names. The order follows the transformation workflow and is reused by export helpers. |
write_mobility_dataset(dataset, output_dir, *, formats=('csv', 'geojson'), extra_tables=None, include_validation=True, include_quality_reports=True, selection_table=None, overwrite=True)
Write intermediate MobilityDataset tables to disk.
The function writes the inspectable states of the preparation workflow: storyline, legs, staypoints, trips, journeys, user stats, mapping tables and optional validation and quality reports. It returns a manifest so the caller can see exactly what was exported.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | MobilityDataset | Transformed mobility dataset. | required |
| output_dir | str \| Path | Destination directory. | required |
| formats | Iterable[str] | Export formats. Supported values are | ('csv', 'geojson') |
| extra_tables | Mapping[str, DataFrame] \| None | Optional additional tables to export with the same formats, for example | None |
| include_validation | bool | Whether to export raw-schema validation issues. | True |
| include_quality_reports | bool | Whether to export tracking-quality summary and optional user selection table. | True |
| selection_table | DataFrame \| None | Optional table produced by | None |
| overwrite | bool | If false, fail when a target file already exists. | True |
Returns:
| Type | Description |
|---|---|
| DataFrame | A manifest with one row per written file: table, format, path and number of rows. |
Raises:
| Type | Description |
|---|---|
| FileExistsError | If |
| ValueError | If an unsupported format is requested. |
| ImportError | If optional dependencies for Parquet or Excel are missing. |
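The manifest-and-overwrite contract shared by the writers on this page can be sketched with the standard library alone. The example writes CSV only, takes tables as lists of dict rows instead of DataFrames, and uses a hypothetical helper name, so it illustrates the contract rather than the package's writer.

```python
import csv
from pathlib import Path


def write_tables(tables, output_dir, overwrite=True):
    """Write each table as CSV and return one manifest row per written file.

    Assumption: `tables` maps a table name to a list of dict rows.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    manifest = []
    for name, rows in tables.items():
        path = output_dir / f"{name}.csv"
        if path.exists() and not overwrite:
            raise FileExistsError(path)
        with path.open("w", newline="", encoding="utf-8") as handle:
            fieldnames = list(rows[0]) if rows else []
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        manifest.append(
            {"table": name, "format": "csv", "path": str(path), "rows": len(rows)}
        )
    return manifest
```

Returning the manifest instead of printing it lets a notebook assert on the exported row counts or store the manifest next to the files.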
export_mobility_tables(*args, **kwargs)
Alias for write_mobility_dataset.
write_mobility_dataset is the preferred name because it says that the
full structured dataset is written, including reports and mapping tables.
write_indicator_result(indicators, output_dir, *, formats=('parquet', 'csv'), overwrite=True)
Write mobility indicator tables to disk.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| indicators | IndicatorResult | Result returned by | required |
| output_dir | str \| Path | Destination directory. | required |
| formats | Iterable[str] | Export formats. Supported values are | ('parquet', 'csv') |
| overwrite | bool | If false, fail when a target file already exists. | True |
Returns:
| Type | Description |
|---|---|
| DataFrame | Manifest with one row per written file. |
Map visualization
xyt_gps.viz_maps
Interactive map visualizations for GPS traces and H3 cells.
plot_h3_frequency_map(h3_frequency, *, h3_col='h3_cell', value_col='point_count', tooltip_cols=None, aggregate_cells=True, max_cells=2500, palette=('#f4f1f8', '#d9cce9', '#b79bd5', '#8f67bd', '#6b4c9a', '#3d275f'), fill_opacity=0.62, line_opacity=0.25, tiles='cartodbpositron', zoom_start=11, map_center=None, save_path=None)
Plot H3 visit-frequency cells on an interactive Folium map.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| h3_frequency | DataFrame | Table produced by | required |
| h3_col | str | Column containing H3 cell identifiers. | 'h3_cell' |
| value_col | str | Numeric column used to color cells. | 'point_count' |
| tooltip_cols | Iterable[str] \| None | Columns shown in the popup. When omitted, common count columns are used when present. | None |
| aggregate_cells | bool | If true, aggregate rows by H3 cell before plotting. This is useful when the input is split by mode, phase or project. | True |
| max_cells | int \| None | Optional maximum number of cells to draw. The most frequent cells are kept. Set to | 2500 |
| palette | tuple[str, ...] | Sequential color palette from low to high values. | ('#f4f1f8', '#d9cce9', '#b79bd5', '#8f67bd', '#6b4c9a', '#3d275f') |
| fill_opacity | float | Polygon fill opacity. | 0.62 |
| line_opacity | float | Polygon border opacity. | 0.25 |
| tiles | str | Folium base map. | 'cartodbpositron' |
| zoom_start | int | Initial zoom level. | 11 |
| map_center | tuple[float, float] \| None | Optional | None |
| save_path | str \| Path \| None | Optional HTML output path. | None |
Returns:
| Type | Description |
|---|---|
| object | A |
Raises:
| Type | Description |
|---|---|
| ImportError | If |
| KeyError | If required columns are missing. |
| ValueError | If no valid H3 cell can be plotted. |
plot_gps_traces(dataset_or_legs, *, staypoints=None, user_ids=None, sample_n=None, random_state=42, color_by='mode_niv1', geometry_col='geometry', show_staypoints=True, use_antpath=True, tiles='cartodbpositron', zoom_start=12, map_center=None, save_path=None)
Plot GPS legs and optional staypoints on an interactive Folium map.
The function is designed for notebook checks. It keeps the API small:
pass a MobilityDataset after transformation, or pass a leg
GeoDataFrame directly.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_or_legs | MobilityDataset \| GeoDataFrame | | required |
| staypoints | GeoDataFrame \| None | Optional staypoint GeoDataFrame when passing legs directly. | None |
| user_ids | Iterable \| None | Optional users to display. | None |
| sample_n | int \| None | Optional number of legs to sample before plotting. | None |
| random_state | int \| None | Random seed used for leg sampling. | 42 |
| color_by | str \| None | Column used to color legs. Set to | 'mode_niv1' |
| geometry_col | str | Geometry column name. | 'geometry' |
| show_staypoints | bool | Whether to draw staypoints as circles. | True |
| use_antpath | bool | Whether to animate legs with Folium | True |
| tiles | str | Folium base map. | 'cartodbpositron' |
| zoom_start | int | Initial zoom level. | 12 |
| map_center | tuple[float, float] \| None | Optional | None |
| save_path | str \| Path \| None | Optional HTML output path. | None |
Returns:
| Type | Description |
|---|---|
| object | A |
Raises:
| Type | Description |
|---|---|
| ImportError | If |
| KeyError | If the geometry column is missing. |
| ValueError | If no leg geometry is available for plotting. |
Indicator visualization
xyt_gps.viz_indicators
Notebook-friendly indicator charts.
plot_indicator_bars(indicators, *, table='population', metrics=None, group_col='mode', facet_col='Phase', title='Indicateurs de mobilité', max_bars=None, group_order=None, sort_bars_by_value=False, value_format='{:.1f}', bar_color='#6b4c9a', include_all_modes=True, all_modes_label='Tous modes', all_modes_color=ALL_MODES_COLOR, show_demand_profile=True, demand_profile_max_modes=6, metadata=None, show_identity_card=True, save_path=None)
Render mobility indicators as simple notebook bar charts.
The default input is an IndicatorResult returned by
compute_mobility_indicators(). The function reads its population table and
plots the main per-day metrics by mode. It can also receive a DataFrame
directly, for example indicators.person_phase.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| indicators | IndicatorResult \| DataFrame | | required |
| table | str | Indicator table to use when | 'population' |
| metrics | Iterable[str] \| None | Numeric columns to plot. When omitted, common indicator columns are inferred. | None |
| group_col | str | Categorical column used for bars, usually | 'mode' |
| facet_col | str \| None | Optional column used to create one panel per period/phase. | 'Phase' |
| title | str | Displayed title. | 'Indicateurs de mobilité' |
| max_bars | int \| None | Optional maximum number of bars per panel. | None |
| group_order | Iterable[str] \| None | Optional explicit order for the categorical bars. When omitted, common mobility modes use a stable default order so that phases remain visually comparable. | None |
| sort_bars_by_value | bool | When true, sort bars by decreasing value inside each panel. This reproduces the older compact ranking behavior, but can make modes change position between phases. | False |
| value_format | str | Python format string used for numeric labels. | '{:.1f}' |
| bar_color | str | CSS color for bars. | '#6b4c9a' |
| include_all_modes | bool | Whether to add a total row across all displayed modes in each phase/panel. | True |
| all_modes_label | str | Label used for the total row. | 'Tous modes' |
| all_modes_color | str | CSS color used for the total row. | ALL_MODES_COLOR |
| show_demand_profile | bool | Whether to display 5-minute daily demand curves when | True |
| demand_profile_max_modes | int | Maximum number of demand curves per phase, including the all-modes curve. | 6 |
| metadata | Mapping[str, object] \| None | Optional metadata displayed in the identity card. Values override | None |
| show_identity_card | bool | Whether to display calculation metadata before the bars. | True |
| save_path | str \| Path \| None | Optional HTML file path. | None |
Returns:
| Type | Description |
|---|---|
| object | An |
| object | the raw HTML string. |
Raises:
| Type | Description |
|---|---|
| KeyError | If required columns are missing. |
| ValueError | If no metric can be plotted. |
Participation visualization
xyt_gps.viz_participation
Notebook-friendly participation heatmaps.
plot_participation_heatmap(participation_grid, *, user_col='user_id', week_col='protocol_week_number', score_col='active_days_count', phase_col='Phase', title='Participation hebdomadaire', max_score=7, max_users=None, cell_size=13, cell_gap=3, show_phase_separators=True, phase_separator_color='#e83b46', phase_separator_width=3, save_path=None)
Render a GitHub-style participation heatmap in a notebook.
The input is the long table produced by build_weekly_participation_grid.
Rows are users, columns are protocol weeks, and cell color intensity is
based on score_col, usually the number of active tracked days from 0 to 7.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| participation_grid | DataFrame | Weekly participation table. | required |
| user_col | str | User identifier column. | 'user_id' |
| week_col | str | Week number column. | 'protocol_week_number' |
| score_col | str | Participation score column. | 'active_days_count' |
| phase_col | str | Optional phase/period column used in cell tooltips. | 'Phase' |
| title | str | Displayed title. | 'Participation hebdomadaire' |
| max_score | int | Maximum score used for the color scale. | 7 |
| max_users | int \| None | Optional maximum number of users displayed. | None |
| cell_size | int | Square size in pixels. | 13 |
| cell_gap | int | Gap between squares in pixels. | 3 |
| show_phase_separators | bool | Whether to add a red separator when the phase changes between two consecutive weeks. | True |
| phase_separator_color | str | CSS color used for phase separators. | '#e83b46' |
| phase_separator_width | int | Separator column width in pixels. | 3 |
| save_path | str \| Path \| None | Optional HTML file path. | None |
Returns:
| Type | Description |
|---|---|
| object | An |
| object | the raw HTML string. |
Raises:
| Type | Description |
|---|---|
| KeyError | If required columns are missing. |
| ValueError | If the participation grid is empty. |
Mappings
xyt_gps.mappings
Mode and purpose mapping configuration.
MobilityMappings
dataclass
Project-level mode and purpose mappings used by transformations.
mode_purpose_mapping(**kwargs)
Build mode and purpose mappings for a project.
This is a readable factory around MobilityMappings. Pass the same keyword
arguments as the dataclass when a project needs custom groupings.
Example
mode_purpose_mapping(storyline_mode_niv1={"Marche": ("Mode::Walk",)})
map_value(value, mapping, *, default='Autres')
Map one provider value to a project category.
map_sequence(value, mapping, *, default='Autres')
Map a provider sequence such as Mode::Walk + Mode::Bus category by category.
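The two mapping helpers can be pictured with a small sketch. It assumes the mapping shape shown in the `mode_purpose_mapping` example (category name to tuple of provider values), that sequences are separated by `+`, and that the mapped categories are joined back with the same separator; the output format and the `_sketch` function names are assumptions, and the `"TP"` category is hypothetical.

```python
EXAMPLE_MAPPING = {
    "Marche": ("Mode::Walk",),
    "TP": ("Mode::Bus", "Mode::Train"),  # hypothetical category and values
}


def map_value_sketch(value, mapping, default="Autres"):
    """Return the category whose provider values contain `value`."""
    for category, provider_values in mapping.items():
        if value in provider_values:
            return category
    return default


def map_sequence_sketch(value, mapping, default="Autres", separator=" + "):
    """Map each element of a '+'-separated provider sequence independently."""
    parts = [part.strip() for part in value.split("+")]
    return separator.join(map_value_sketch(part, mapping, default=default) for part in parts)


mapped = map_sequence_sketch("Mode::Walk + Mode::Bus", EXAMPLE_MAPPING)
```

Unknown provider values fall back to the default category instead of raising, so a new provider code does not break a production run.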