pymdb.parser module
Module containing the PyMDbParser class.
PyMDbParser
class pymdb.parser.PyMDbParser(use_default_filenames=True, gunzip_files=False, delete_gzip_files=False)
Object used to parse the TSV datasets provided by IMDb.
Parses each row in a TSV file into a specific PyMDb object.
Parameters:
- use_default_filenames (bool, optional) – Whether the filename of each dataset is the same as the name provided by IMDb.
- gunzip_files (bool, optional) – Whether the dataset files are gzipped.
- delete_gzip_files (bool, optional) – Whether the gzip files should be deleted after being gunzipped.
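
To make the interaction between gunzip_files and delete_gzip_files concrete, here is a minimal sketch of gunzipping a dataset and optionally removing the archive afterwards. The helper name `gunzip` and its `delete_gzip_file` argument are hypothetical and are not part of PyMDb; the library's own implementation may differ.

```python
import gzip
import os
import shutil


def gunzip(gz_path, delete_gzip_file=False):
    """Decompress gz_path next to itself and return the decompressed path.

    Hypothetical helper for illustration only; not PyMDb's API.
    """
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a path ending in .gz")
    out_path = gz_path[: -len(".gz")]
    # Stream the data so large datasets are never held in memory at once.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    if delete_gzip_file:
        # Mirrors the idea behind delete_gzip_files=True.
        os.remove(gz_path)
    return out_path
```

Streaming with `shutil.copyfileobj` keeps memory use flat regardless of dataset size, which matters for the larger IMDb files.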

_build_path(path, default_filename)
Private function to combine a system path with a default filename.
This method appends the default filename of a dataset to the path of the directory it is located in. If the files are to be gunzipped, it also appends the gzip extension used by IMDb.
Parameters:
- path (str) – The system path to the directory where the dataset is located.
- default_filename (str) – The default filename of the dataset.
Returns: The path and default filename, combined.
Return type: str
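
The behaviour described for _build_path can be sketched as a standalone function. This is an illustration under assumptions, not the library's code: the name `build_path`, the explicit `gunzip_files` argument (the real method presumably reads it from the parser instance), and the `.gz` constant are all introduced here.

```python
import os

# Assumption: the gzip extension is ".gz", as in IMDb's "<name>.tsv.gz" downloads.
GZIP_EXTENSION = ".gz"


def build_path(path, default_filename, gunzip_files=False):
    """Join a directory path with a dataset's default filename.

    Hypothetical stand-in for PyMDbParser._build_path.
    """
    full_path = os.path.join(path, default_filename)
    if gunzip_files:
        # The file on disk is still compressed, so point at the archive.
        full_path += GZIP_EXTENSION
    return full_path
```

For example, `build_path("data", "name.basics.tsv", gunzip_files=True)` points at the `name.basics.tsv.gz` archive inside the `data` directory.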

get_name_basics(path, contains_headers=True)
Parse the “name.basics.tsv” dataset provided by IMDb.
Parameters:
- path (str) – The system path to the dataset. If default filenames are not used, this string must include the dataset filename.
- contains_headers (bool, optional) – Whether the first line holds column titles rather than a data row.
Yields: A NameBasics object for each row in the dataset.
Raises: InvalidParseFormat – If a row has an incorrect number of columns.
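
The get_* methods share one pattern: optionally skip a header line, check each row's column count, and yield one object per row. The sketch below shows that pattern with plain lists instead of PyMDb objects. `iter_rows`, `expected_columns`, and the local `InvalidParseFormat` class are stand-ins defined here for illustration, and the sample rows are made-up placeholder data with a shortened column set.

```python
import csv
import io


class InvalidParseFormat(Exception):
    """Local stand-in for PyMDb's exception of the same name."""


def iter_rows(lines, expected_columns, contains_headers=True):
    """Lazily yield each tab-separated row as a list of strings."""
    # QUOTE_NONE treats quote characters as ordinary text in the fields.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if contains_headers:
        next(reader, None)  # discard the column titles
    first_line = 2 if contains_headers else 1
    for line_number, row in enumerate(reader, start=first_line):
        if len(row) != expected_columns:
            raise InvalidParseFormat(
                f"line {line_number}: expected {expected_columns} "
                f"columns, got {len(row)}"
            )
        yield row


# Placeholder data, not real IMDb records.
sample = io.StringIO(
    "nconst\tprimaryName\tbirthYear\n"
    "nm0000000\tExample Person\t1900\n"
    "nm9999999\tAnother Person\t1950\n"
)
rows = list(iter_rows(sample, expected_columns=3))
```

Because the function is a generator, a malformed row raises only when iteration reaches it, so earlier rows are still delivered to the caller.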

get_title_akas(path, contains_headers=True)
Parse the “title.akas.tsv” dataset provided by IMDb.
Parameters:
- path (str) – The system path to the dataset. If default filenames are not used, this string must include the dataset filename.
- contains_headers (bool, optional) – Whether the first line holds column titles rather than a data row.
Yields: A TitleAkas object for each row in the dataset.
Raises: InvalidParseFormat – If a row has an incorrect number of columns.

get_title_basics(path, contains_headers=True)
Parse the “title.basics.tsv” dataset provided by IMDb.
Parameters:
- path (str) – The system path to the dataset. If default filenames are not used, this string must include the dataset filename.
- contains_headers (bool, optional) – Whether the first line holds column titles rather than a data row.
Yields: A TitleBasics object for each row in the dataset.
Raises: InvalidParseFormat – If a row has an incorrect number of columns.

get_title_crew(path, contains_headers=True)
Parse the “title.crew.tsv” dataset provided by IMDb.
Parameters:
- path (str) – The system path to the dataset. If default filenames are not used, this string must include the dataset filename.
- contains_headers (bool, optional) – Whether the first line holds column titles rather than a data row.
Yields: A TitleCrew object for each row in the dataset.
Raises: InvalidParseFormat – If a row has an incorrect number of columns.

get_title_episodes(path, contains_headers=True)
Parse the “title.episodes.tsv” dataset provided by IMDb.
Parameters:
- path (str) – The system path to the dataset. If default filenames are not used, this string must include the dataset filename.
- contains_headers (bool, optional) – Whether the first line holds column titles rather than a data row.
Yields: A TitleEpisode object for each row in the dataset.
Raises: InvalidParseFormat – If a row has an incorrect number of columns.

get_title_principals(path, contains_headers=True)
Parse the “title.principals.tsv” dataset provided by IMDb.
Parameters:
- path (str) – The system path to the dataset. If default filenames are not used, this string must include the dataset filename.
- contains_headers (bool, optional) – Whether the first line holds column titles rather than a data row.
Yields: A TitlePrincipalCrew object for each row in the dataset.
Raises: InvalidParseFormat – If a row has an incorrect number of columns.

get_title_ratings(path, contains_headers=True)
Parse the “title.ratings.tsv” dataset provided by IMDb.
Parameters:
- path (str) – The system path to the dataset. If default filenames are not used, this string must include the dataset filename.
- contains_headers (bool, optional) – Whether the first line holds column titles rather than a data row.
Yields: A TitleRating object for each row in the dataset.
Raises: InvalidParseFormat – If a row has an incorrect number of columns.