pymdb.parser module

Module containing the PyMDbParser class.

PyMDbParser

class pymdb.parser.PyMDbParser(use_default_filenames=True, gunzip_files=False, delete_gzip_files=False)

Object used to parse the TSV datasets provided by IMDb.

Each row of a TSV file is parsed into the PyMDb object that corresponds to its dataset.

Parameters:
  • use_default_filenames (bool, optional) – Whether the dataset files use the default filenames provided by IMDb.
  • gunzip_files (bool, optional) – Whether the dataset files are gzipped and must be gunzipped before parsing.
  • delete_gzip_files (bool, optional) – Whether the gzip files are deleted after they have been gunzipped.
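
The gunzip_files and delete_gzip_files options describe a decompress-then-optionally-delete step. The following is a minimal sketch of such a step using only the standard library. The helper name gunzip_file and its signature are assumptions made for illustration and are not taken from PyMDb.

```python
import gzip
import os
import shutil


def gunzip_file(gz_path, delete_gzip=False):
    """Decompress gz_path next to itself and return the new path.

    Hypothetical helper for illustration; not part of PyMDb.
    """
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a path ending in .gz")
    out_path = gz_path[:-len(".gz")]
    # Stream the decompressed bytes so a large dataset is never
    # held in memory all at once.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Mirrors the idea behind delete_gzip_files: remove the archive
    # only after it has been fully decompressed.
    if delete_gzip:
        os.remove(gz_path)
    return out_path
```

Deleting the archive only after the copy has finished means a failed decompression leaves the original download in place.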

_build_path(path, default_filename)

Private method that combines a system path with a default filename.

Appends the default filename of a dataset to the path of the directory that contains it. If the files are to be gunzipped, the gzip extension used by IMDb is appended as well.

Parameters:
  • path (str) – The system path to the directory where the dataset is located.
  • default_filename (str) – The default filename of the dataset.
Returns:

The path and default filename combined correctly.

Return type:

str
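
The behaviour described for _build_path can be sketched as a standalone function. This is an illustrative stand-in, not the library's implementation: the name build_path, the explicit gunzip_files argument, and the use of os.path.join are all assumptions. The ".gz" suffix matches the archive names IMDb publishes, such as "name.basics.tsv.gz".

```python
import os


def build_path(path, default_filename, gunzip_files=False):
    """Join a directory and a dataset filename (illustrative stand-in)."""
    full_path = os.path.join(path, default_filename)
    # Assumption: when the files still need to be gunzipped, the file
    # on disk carries the ".gz" suffix of the downloaded archive.
    if gunzip_files:
        full_path += ".gz"
    return full_path


print(build_path("datasets", "name.basics.tsv"))
print(build_path("datasets", "name.basics.tsv", gunzip_files=True))
```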

get_name_basics(path, contains_headers=True)

Parse the “name.basics.tsv” dataset provided by IMDb.

Parameters:
  • path (str) – The system path to the dataset. With default filenames, this is the directory that contains the dataset file; otherwise it must include the dataset filename.
  • contains_headers (bool, optional) – Whether the first line of the file holds column titles rather than a data row.
Yields:

A NameBasics object for each row in the dataset.

Raises:

InvalidParseFormat – If a row has an incorrect number of columns.
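
The get_* methods share one pattern: optionally skip a header line, check the column count of each row, and yield one object per row. The sketch below illustrates that pattern with the standard csv module. It does not use PyMDb: InvalidParseFormat and NameRow here are local stand-ins, iter_name_rows is a hypothetical name, and the sample row is made up for the demonstration. The six field names follow the column headers of IMDb's name.basics dataset.

```python
import csv
import io
from collections import namedtuple


class InvalidParseFormat(Exception):
    """Local stand-in for the exception of the same name in PyMDb."""


# Stand-in for NameBasics; fields follow the name.basics column headers.
NameRow = namedtuple(
    "NameRow",
    ["nconst", "primaryName", "birthYear", "deathYear",
     "primaryProfession", "knownForTitles"],
)


def iter_name_rows(lines, contains_headers=True):
    """Yield one NameRow per tab-separated line (hypothetical helper)."""
    # QUOTE_NONE keeps literal quote characters in names intact.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    first_data_line = 1
    if contains_headers:
        next(reader, None)  # discard the column titles
        first_data_line = 2
    for line_number, row in enumerate(reader, start=first_data_line):
        if len(row) != len(NameRow._fields):
            raise InvalidParseFormat(
                f"line {line_number}: expected {len(NameRow._fields)} "
                f"columns, got {len(row)}"
            )
        yield NameRow(*row)


# Made-up sample input in the same layout as the dataset.
sample = (
    "nconst\tprimaryName\tbirthYear\tdeathYear\tprimaryProfession\tknownForTitles\n"
    "nm0000001\tFred Astaire\t1899\t1987\tactor,producer\ttt0050419,tt0072308\n"
)
for row in iter_name_rows(io.StringIO(sample)):
    print(row.nconst, row.primaryName)
```

Because the function is a generator, rows are produced lazily, and a malformed row raises only when iteration reaches it.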

get_title_akas(path, contains_headers=True)

Parse the “title.akas.tsv” dataset provided by IMDb.

Parameters:
  • path (str) – The system path to the dataset. With default filenames, this is the directory that contains the dataset file; otherwise it must include the dataset filename.
  • contains_headers (bool, optional) – Whether the first line of the file holds column titles rather than a data row.
Yields:

A TitleAkas object for each row in the dataset.

Raises:

InvalidParseFormat – If a row has an incorrect number of columns.

get_title_basics(path, contains_headers=True)

Parse the “title.basics.tsv” dataset provided by IMDb.

Parameters:
  • path (str) – The system path to the dataset. With default filenames, this is the directory that contains the dataset file; otherwise it must include the dataset filename.
  • contains_headers (bool, optional) – Whether the first line of the file holds column titles rather than a data row.
Yields:

A TitleBasics object for each row in the dataset.

Raises:

InvalidParseFormat – If a row has an incorrect number of columns.

get_title_crew(path, contains_headers=True)

Parse the “title.crew.tsv” dataset provided by IMDb.

Parameters:
  • path (str) – The system path to the dataset. With default filenames, this is the directory that contains the dataset file; otherwise it must include the dataset filename.
  • contains_headers (bool, optional) – Whether the first line of the file holds column titles rather than a data row.
Yields:

A TitleCrew object for each row in the dataset.

Raises:

InvalidParseFormat – If a row has an incorrect number of columns.

get_title_episodes(path, contains_headers=True)

Parse the “title.episode.tsv” dataset provided by IMDb.

Parameters:
  • path (str) – The system path to the dataset. With default filenames, this is the directory that contains the dataset file; otherwise it must include the dataset filename.
  • contains_headers (bool, optional) – Whether the first line of the file holds column titles rather than a data row.
Yields:

A TitleEpisode object for each row in the dataset.

Raises:

InvalidParseFormat – If a row has an incorrect number of columns.

get_title_principals(path, contains_headers=True)

Parse the “title.principals.tsv” dataset provided by IMDb.

Parameters:
  • path (str) – The system path to the dataset. With default filenames, this is the directory that contains the dataset file; otherwise it must include the dataset filename.
  • contains_headers (bool, optional) – Whether the first line of the file holds column titles rather than a data row.
Yields:

A TitlePrincipalCrew object for each row in the dataset.

Raises:

InvalidParseFormat – If a row has an incorrect number of columns.

get_title_ratings(path, contains_headers=True)

Parse the “title.ratings.tsv” dataset provided by IMDb.

Parameters:
  • path (str) – The system path to the dataset. With default filenames, this is the directory that contains the dataset file; otherwise it must include the dataset filename.
  • contains_headers (bool, optional) – Whether the first line of the file holds column titles rather than a data row.
Yields:

A TitleRating object for each row in the dataset.

Raises:

InvalidParseFormat – If a row has an incorrect number of columns.
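
Turning a raw row into typed values is a separate step from splitting it into columns. The sketch below shows one way this could look for a ratings row, whose columns are tconst, averageRating and numVotes. It is not PyMDb code: parse_optional and parse_rating_row are hypothetical names, and the result is a plain tuple rather than a TitleRating object. IMDb marks missing values with the two-character string \N, which the helper maps to None.

```python
def parse_optional(value, convert):
    # IMDb writes a backslash followed by N for a missing value.
    # The raw string avoids Python's \N{...} escape syntax.
    return None if value == r"\N" else convert(value)


def parse_rating_row(line):
    """Split one ratings line into (tconst, rating, votes)."""
    tconst, average_rating, num_votes = line.rstrip("\n").split("\t")
    return (
        tconst,
        parse_optional(average_rating, float),
        parse_optional(num_votes, int),
    )


# Made-up sample row in the layout of the ratings dataset.
print(parse_rating_row("tt0000001\t5.7\t1900\n"))
```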