pymdb.parser module

Module containing the PyMDbParser class.

PyMDbParser

class pymdb.parser.PyMDbParser(use_default_filenames=True, gunzip_files=False, delete_gzip_files=False)

Object used to parse the tsv datasets provided by IMDb.

Parses each row in the tsv file into a specific PyMDb object.

Parameters:
	use_default_filenames (`bool`, optional) – Whether each dataset file uses the default filename provided by IMDb. Defaults to `True`.
	gunzip_files (`bool`, optional) – Whether the dataset files are gzipped and should be gunzipped. Defaults to `False`.
	delete_gzip_files (`bool`, optional) – Whether the gzip files should be deleted after being gunzipped. Defaults to `False`.
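
The two gzip-related flags describe a decompress-then-optionally-delete workflow. The following is a minimal sketch of that workflow using only the standard library, assuming the archive is decompressed next to itself. `gunzip_file` is a hypothetical helper, not part of PyMDb, and the parser's own implementation may differ.

```python
import gzip
import os
import shutil


def gunzip_file(gz_path, delete_gzip_file=False):
    """Decompress gz_path into the same directory and return the new path.

    Hypothetical helper for illustration only; not PyMDb's implementation.
    """
    if not gz_path.endswith(".gz"):
        raise ValueError(f"expected a .gz file, got {gz_path!r}")

    # "title.ratings.tsv.gz" -> "title.ratings.tsv"
    out_path = gz_path[: -len(".gz")]

    # Stream the data so large datasets are never held fully in memory.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)

    # Mirrors the idea behind delete_gzip_files: remove the archive afterwards.
    if delete_gzip_file:
        os.remove(gz_path)

    return out_path
```

Streaming with `shutil.copyfileobj` keeps memory use flat, which matters because the IMDb datasets can be large.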

_build_path(path, default_filename)

Private method that combines a system path with a default filename.

This method appends the default filename of a dataset to the path of the directory that contains it. If the files are to be gunzipped, it also appends the gzip extension used by IMDb.

Parameters:
	path (`str`) – The system path to the directory where the dataset is located.
	default_filename (`str`) – The default filename of the dataset.
Returns:	The path and default filename combined correctly.
Return type:	`str`
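
The behaviour described above can be sketched as a standalone function. This is an illustration under stated assumptions, not the library's code: `build_path` and its `gunzip_files` argument are hypothetical stand-ins for the private method and the parser's constructor flag, and `.gz` is assumed to be the gzip extension.

```python
import os


def build_path(path, default_filename, gunzip_files=False):
    """Join a directory path with a dataset's default filename.

    Hypothetical stand-in for the private method; for illustration only.
    """
    # os.path.join inserts the platform's separator only where needed.
    full_path = os.path.join(path, default_filename)

    # Assumption: gzipped datasets carry a trailing ".gz" extension.
    if gunzip_files:
        full_path += ".gz"

    return full_path
```

For example, `build_path("datasets", "name.basics.tsv", gunzip_files=True)` points at `name.basics.tsv.gz` inside the `datasets` directory.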

get_name_basics(path, contains_headers=True)

Parse the “name.basics.tsv” dataset provided by IMDb.

Parameters:
	path (`str`) – The system path to the dataset. With default filenames, this is the directory containing the file; otherwise it must include the dataset filename.
	contains_headers (`bool`, optional) – Whether the first line holds column titles rather than a data row. Defaults to `True`.
Yields:	A `NameBasics` object for each row in the dataset.
Raises:	`InvalidParseFormat` – If a row has an unexpected number of columns.
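
The general pattern shared by the `get_*` methods (skip an optional header line, validate the column count, yield one result per row) can be sketched with the standard `csv` module. This is not PyMDb's implementation: `iter_rows`, its `expected_columns` argument, and `InvalidParseFormatSketch` are hypothetical names, and the sketch yields plain lists where the real methods yield PyMDb objects.

```python
import csv


class InvalidParseFormatSketch(Exception):
    """Hypothetical stand-in for the library's InvalidParseFormat."""


def iter_rows(path, expected_columns, contains_headers=True):
    """Lazily yield each TSV row as a list of strings."""
    with open(path, encoding="utf-8", newline="") as handle:
        # QUOTE_NONE treats quote characters as ordinary data.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)

        # Discard the column titles when the file has a header line.
        if contains_headers:
            next(reader, None)

        for row in reader:
            # Reject rows whose column count does not match the dataset.
            if len(row) != expected_columns:
                raise InvalidParseFormatSketch(
                    f"line {reader.line_num}: expected {expected_columns} "
                    f"columns, found {len(row)}"
                )
            yield row
```

Because the function is a generator, rows are read on demand; a malformed row raises only when iteration reaches it.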

get_title_akas(path, contains_headers=True)

Parse the “title.akas.tsv” dataset provided by IMDb.

Parameters:
	path (`str`) – The system path to the dataset. With default filenames, this is the directory containing the file; otherwise it must include the dataset filename.
	contains_headers (`bool`, optional) – Whether the first line holds column titles rather than a data row. Defaults to `True`.
Yields:	A `TitleAkas` object for each row in the dataset.
Raises:	`InvalidParseFormat` – If a row has an unexpected number of columns.

get_title_basics(path, contains_headers=True)

Parse the “title.basics.tsv” dataset provided by IMDb.

Parameters:
	path (`str`) – The system path to the dataset. With default filenames, this is the directory containing the file; otherwise it must include the dataset filename.
	contains_headers (`bool`, optional) – Whether the first line holds column titles rather than a data row. Defaults to `True`.
Yields:	A `TitleBasics` object for each row in the dataset.
Raises:	`InvalidParseFormat` – If a row has an unexpected number of columns.

get_title_crew(path, contains_headers=True)

Parse the “title.crew.tsv” dataset provided by IMDb.

Parameters:
	path (`str`) – The system path to the dataset. With default filenames, this is the directory containing the file; otherwise it must include the dataset filename.
	contains_headers (`bool`, optional) – Whether the first line holds column titles rather than a data row. Defaults to `True`.
Yields:	A `TitleCrew` object for each row in the dataset.
Raises:	`InvalidParseFormat` – If a row has an unexpected number of columns.

get_title_episodes(path, contains_headers=True)

Parse the “title.episode.tsv” dataset provided by IMDb.

Parameters:
	path (`str`) – The system path to the dataset. With default filenames, this is the directory containing the file; otherwise it must include the dataset filename.
	contains_headers (`bool`, optional) – Whether the first line holds column titles rather than a data row. Defaults to `True`.
Yields:	A `TitleEpisode` object for each row in the dataset.
Raises:	`InvalidParseFormat` – If a row has an unexpected number of columns.

get_title_principals(path, contains_headers=True)

Parse the “title.principals.tsv” dataset provided by IMDb.

Parameters:
	path (`str`) – The system path to the dataset. With default filenames, this is the directory containing the file; otherwise it must include the dataset filename.
	contains_headers (`bool`, optional) – Whether the first line holds column titles rather than a data row. Defaults to `True`.
Yields:	A `TitlePrincipalCrew` object for each row in the dataset.
Raises:	`InvalidParseFormat` – If a row has an unexpected number of columns.

get_title_ratings(path, contains_headers=True)

Parse the “title.ratings.tsv” dataset provided by IMDb.

Parameters:
	path (`str`) – The system path to the dataset. With default filenames, this is the directory containing the file; otherwise it must include the dataset filename.
	contains_headers (`bool`, optional) – Whether the first line holds column titles rather than a data row. Defaults to `True`.
Yields:	A `TitleRating` object for each row in the dataset.
Raises:	`InvalidParseFormat` – If a row has an incorrect column size.
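
A short usage sketch may help tie the `path` and `use_default_filenames` semantics together. It uses only the class, constructor arguments, and method documented on this page; the wrapper name `iter_ratings` and the example paths are hypothetical, and the import is deferred so the function can be defined without PyMDb installed.

```python
def iter_ratings(location, use_default_filenames=True):
    """Yield one rating object per row of the title ratings dataset.

    location is a directory when default filenames are used, and a full
    file path otherwise. Hypothetical wrapper for illustration only.
    """
    # Deferred import: PyMDb is only required once iteration starts.
    from pymdb.parser import PyMDbParser

    parser = PyMDbParser(use_default_filenames=use_default_filenames)

    # The get_* methods are generators, so rows are parsed on demand.
    yield from parser.get_title_ratings(location, contains_headers=True)
```

With default filenames the call would look like `iter_ratings("/data/imdb")`; with a renamed file it would look like `iter_ratings("/data/imdb/my_ratings.tsv", use_default_filenames=False)`.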