Modules¶
Origin-Destination Matrix¶
- class distributed_trajectories.OD.OD(df)[source]¶
calculates the Origin-Destination (OD) matrix for the given PySpark DataFrame
- collect_OD_updates()[source]¶
sums up the contributions for different Origin-Destination pairs
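The summation over Origin-Destination pairs can be illustrated with a minimal plain-Python sketch. It is not the library's implementation, which operates on a PySpark DataFrame; the function name `sum_od_contributions` and the `(origin, destination, weight)` record layout are assumptions made for illustration only.

```python
from collections import defaultdict


def sum_od_contributions(updates):
    """Sum the weights of all records sharing the same (origin, destination).

    `updates` is an iterable of (origin, destination, weight) tuples.
    This record layout is hypothetical and chosen only for the example.
    """
    totals = defaultdict(float)
    for origin, destination, weight in updates:
        totals[(origin, destination)] += weight
    return dict(totals)


# Two contributions for the pair (1, 2) and one for the pair (3, 4).
updates = [(1, 2, 0.5), (1, 2, 0.25), (3, 4, 1.0)]
od = sum_od_contributions(updates)
```

In a distributed setting the same idea is typically expressed as a group-by on the pair followed by a sum.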
Transition Matrix¶
- class distributed_trajectories.TM.TM(df)[source]¶
creates Transition Matrix
- collect_TM_updates()[source]¶
sums up the contributions for different Origin-Destination pairs
- Returns
PySpark DF
- static normalize_tm(tm)[source]¶
normalizes the TM so that each row sums to 1
- Returns
Transition Matrix
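Row normalization can be sketched on a small in-memory matrix. This is only an illustration of the idea, not the library's code: the name `normalize_rows`, the list-of-lists input, and the choice to leave all-zero rows unchanged are assumptions.

```python
def normalize_rows(matrix):
    """Divide every row by its sum so that each non-empty row sums to 1.

    Rows whose sum is zero are returned unchanged to avoid division by
    zero; how the library treats such rows is not documented here.
    """
    normalized = []
    for row in matrix:
        total = sum(row)
        if total == 0:
            normalized.append(list(row))
        else:
            normalized.append([value / total for value in row])
    return normalized


tm = normalize_rows([[2, 2], [1, 3], [0, 0]])
```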
- set_TM_updates()[source]¶
for each (origin, destination) pair, creates an entry for the OD matrix
- Returns
Transition Matrix object
User Defined Functions¶
- distributed_trajectories.udfs.check_central_position(m, n, i, j, width)[source]¶
checks whether the distribution defined by i, j and width lies inside the grid defined by m and n.
- \(i\in[1..m]\), \(j\in[1..n]\), where m is the number of cells along latitude and
n is the number of cells along longitude.
- Parameters
m – number of cells along latitude
n – number of cells along longitude
i – current position along latitude
j – current position along longitude
width – width of the distribution of state
- Returns
True or False
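One possible reading of this check is sketched below. It assumes that the distribution covers every cell within `width` steps of (i, j) in both directions and that indices are 1-based, as stated above. The function name `fits_in_grid` is hypothetical, and the library may define the extent of the distribution differently.

```python
def fits_in_grid(m, n, i, j, width):
    """Return True if the square of cells around (i, j) stays inside the grid.

    Assumes the distribution spans i - width .. i + width along latitude
    and j - width .. j + width along longitude, with 1-based indices.
    """
    inside_latitude = 1 <= i - width and i + width <= m
    inside_longitude = 1 <= j - width and j + width <= n
    return inside_latitude and inside_longitude


centre_ok = fits_in_grid(5, 5, 3, 3, 1)  # fully inside a 5 x 5 grid
edge_ok = fits_in_grid(5, 5, 1, 3, 1)    # would spill over the first row
```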
- distributed_trajectories.udfs.d1_coords_pure(d2_c, m)[source]¶
transform 2D to 1D coords without values
- distributed_trajectories.udfs.d1_state_vector(i, j, width, m, n)[source]¶
given the i and j positions on the lattice and the width of the distribution, produces a 1D state vector representation of length \(m \times n\).
- distributed_trajectories.udfs.d2_coords(i, j, width=1)[source]¶
given the (i,j) get the coords of the neighbouring cells within width
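A minimal sketch of enumerating the cells within `width` of (i, j) follows. It assumes a square neighbourhood that includes the central cell itself; the name `neighbour_cells` is hypothetical, and the library's function may order or filter the cells differently.

```python
def neighbour_cells(i, j, width=1):
    """List all (row, column) cells within `width` steps of (i, j).

    The neighbourhood is a square of side 2 * width + 1 and includes
    the central cell; both choices are assumptions for illustration.
    """
    return [
        (a, b)
        for a in range(i - width, i + width + 1)
        for b in range(j - width, j + width + 1)
    ]


cells = neighbour_cells(2, 2)  # the 3 x 3 block centred on (2, 2)
```

No bounds check is applied here, so cells outside the grid may be returned for positions near the border.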
- distributed_trajectories.udfs.filter_OD(origins, destinations)[source]¶
takes lists of origins and destinations (in 1D notation) and returns a list of tuples with OD coordinates
- distributed_trajectories.udfs.middle_interval_for_x(x, A, B, m)[source]¶
given the borders [A,B] and the number of intervals m, calculates the new x as the coordinate of the middle of the interval x belongs to.
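The snapping of x to the middle of its interval can be sketched as follows, assuming m equal-width intervals on [A, B]. The name `middle_of_interval` is hypothetical, and clamping values on or beyond the borders into the first or last interval is an assumption, since the behaviour at the borders is not documented here.

```python
def middle_of_interval(x, A, B, m):
    """Return the midpoint of the equal-width interval of [A, B] containing x."""
    step = (B - A) / m           # width of one interval
    k = int((x - A) // step)     # zero-based index of the interval holding x
    k = min(max(k, 0), m - 1)    # assumption: clamp x == B and outliers
    return A + (k + 0.5) * step


# With [0, 1] split into 4 intervals, 0.3 lies in [0.25, 0.5).
snapped = middle_of_interval(0.3, 0.0, 1.0, 4)
```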
Main module¶
- class distributed_trajectories.distributed_trajectories.PrepareDataset(path)[source]¶
reading, filtering and preparing Dataset
- filter_too_sparse_IDs()[source]¶
keeps only those tracks that have more than timestamps_per_hour points per hour on average
- Returns
filtered self.df
- remove_too_fast_objects()[source]¶
some data entries are clearly erroneous: some objects appear to move at 10-20 km per second. Such entries should be removed.
- Returns
filtered self.df
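A speed-based filter of this kind can be sketched in plain Python. This is not the library's implementation, which works on a PySpark DataFrame. The function names, the `(timestamp_seconds, lat, lon)` record layout, the haversine distance, and the `limit_km_s` threshold are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def max_speed_km_s(track):
    """Largest speed between consecutive points of a time-sorted track."""
    speeds = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt > 0:  # skip duplicate timestamps
            speeds.append(haversine_km(lat0, lon0, lat1, lon1) / dt)
    return max(speeds, default=0.0)


def remove_fast_tracks(tracks, limit_km_s=1.0):
    """Keep only tracks that never exceed the (hypothetical) speed limit."""
    return {key: t for key, t in tracks.items() if max_speed_km_s(t) <= limit_km_s}


tracks = {
    "slow": [(0, 0.0, 0.0), (3600, 0.0, 0.1)],  # about 11 km in one hour
    "fast": [(0, 0.0, 0.0), (1, 0.0, 1.0)],     # about 111 km in one second
}
kept = remove_fast_tracks(tracks)
```

This sketch drops an entire track when any step is too fast; dropping only the offending points is an equally plausible design.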