pymdb.parser module
Module containing the PyMDbParser class.
PyMDbParser

class pymdb.parser.PyMDbParser(use_default_filenames=True, gunzip_files=False, delete_gzip_files=False)
Object used to parse the TSV datasets provided by IMDb.
Parses each row in a TSV file into a specific PyMDb object.
Parameters:
- use_default_filenames (bool, optional) – Whether the filename of each dataset is the same as the name provided by IMDb.
- gunzip_files (bool, optional) – Whether the dataset files are gzipped and need to be gunzipped.
- delete_gzip_files (bool, optional) – Whether the gzip files should be deleted after being gunzipped.
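To make the gunzip_files and delete_gzip_files options concrete, here is a minimal sketch of the gunzip-then-delete step they describe. It is not PyMDb's implementation: the helper name gunzip_file and its signature are hypothetical, and it assumes the decompressed file should be written next to the archive with the ".gz" suffix removed.

```python
import gzip
import os
import shutil


def gunzip_file(gz_path, delete_gzip_file=False):
    """Decompress gz_path next to itself and return the new path.

    Hypothetical helper for illustration; not part of PyMDb.
    """
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a path ending in .gz")
    out_path = gz_path[: -len(".gz")]
    # Stream the decompressed bytes so a large dataset is never held in memory.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    if delete_gzip_file:
        # Mirrors the idea behind delete_gzip_files: drop the archive afterwards.
        os.remove(gz_path)
    return out_path
```

Streaming with shutil.copyfileobj matters here because several IMDb datasets are large once decompressed.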

_build_path(path, default_filename)
Private function to combine a system path with a default filename.
This method appends the default filename of a dataset to the path of the directory it is located in. If the files are to be gunzipped, it also appends the gzip extension used by IMDb.
Parameters:
- path (str) – The system path to the directory where the dataset is located.
- default_filename (str) – The default filename of the dataset.
Returns: The path and default filename combined correctly.
Return type: str
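The path handling described for _build_path can be sketched as below. This is an assumption about the behaviour, not the library's code: build_path is a hypothetical free function, gunzip_files is passed in explicitly instead of being read from the parser instance, and the gzip extension is taken to be ".gz" as in IMDb's published filenames (for example "name.basics.tsv.gz").

```python
import os


def build_path(path, default_filename, gunzip_files=False):
    """Join a directory and a dataset filename (hypothetical sketch)."""
    full_path = os.path.join(path, default_filename)
    if gunzip_files:
        # Assumed extension: IMDb distributes its datasets as *.tsv.gz files.
        full_path += ".gz"
    return full_path


plain = build_path("datasets", "name.basics.tsv")
zipped = build_path("datasets", "name.basics.tsv", gunzip_files=True)
```

Using os.path.join keeps the separator correct on every platform, so the caller can pass the directory with or without a trailing slash.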

get_name_basics(path, contains_headers=True)
Parse the “name.basics.tsv” dataset provided by IMDb.
Parameters:
- path (str) – The system path to the dataset. When using default filenames, this is the directory containing the file; otherwise, it must include the dataset filename.
- contains_headers (bool, optional) – Whether the first line holds column titles rather than a data row.
Yields: A NameBasics object for each row in the dataset.
Raises: InvalidParseFormat – If a row has an incorrect number of columns.
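The get_* methods all follow the same pattern: optionally skip a header line, check each row's column count, and yield one object per row. The sketch below shows that pattern using only the standard library. It is not PyMDb's implementation: NameRow and RowFormatError are stand-ins for NameBasics and InvalidParseFormat, the field names are illustrative, and the sample row is made up. The six-column layout follows IMDb's published description of name.basics.tsv, where "\N" marks a missing value.

```python
import csv
import io
from collections import namedtuple

# Stand-in for NameBasics; field names are illustrative only.
NameRow = namedtuple(
    "NameRow",
    "nconst primary_name birth_year death_year primary_profession known_for_titles",
)


class RowFormatError(Exception):
    """Stand-in for the library's InvalidParseFormat."""


def parse_name_basics(lines, contains_headers=True):
    """Yield one NameRow per data row; raise if a row has the wrong width."""
    # QUOTE_NONE treats quote characters as ordinary text, which suits raw TSV.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if contains_headers:
        next(reader, None)  # discard the column titles
    for row in reader:
        if len(row) != len(NameRow._fields):
            raise RowFormatError(
                f"expected {len(NameRow._fields)} columns, got {len(row)}"
            )
        yield NameRow(*row)


# Made-up sample data in the dataset's layout.
SAMPLE = (
    "nconst\tprimaryName\tbirthYear\tdeathYear\tprimaryProfession\tknownForTitles\n"
    "nm0000000\tExample Person\t1950\t\\N\tactor\ttt0000000\n"
)

rows = list(parse_name_basics(io.StringIO(SAMPLE)))
```

Because the function is a generator, the column check runs only as rows are consumed, so a malformed row deep in the file raises when iteration reaches it, not when the function is called.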

get_title_akas(path, contains_headers=True)
Parse the “title.akas.tsv” dataset provided by IMDb.
Parameters:
- path (str) – The system path to the dataset. When using default filenames, this is the directory containing the file; otherwise, it must include the dataset filename.
- contains_headers (bool, optional) – Whether the first line holds column titles rather than a data row.
Yields: A TitleAkas object for each row in the dataset.
Raises: InvalidParseFormat – If a row has an incorrect number of columns.

get_title_basics(path, contains_headers=True)
Parse the “title.basics.tsv” dataset provided by IMDb.
Parameters:
- path (str) – The system path to the dataset. When using default filenames, this is the directory containing the file; otherwise, it must include the dataset filename.
- contains_headers (bool, optional) – Whether the first line holds column titles rather than a data row.
Yields: A TitleBasics object for each row in the dataset.
Raises: InvalidParseFormat – If a row has an incorrect number of columns.

get_title_crew(path, contains_headers=True)
Parse the “title.crew.tsv” dataset provided by IMDb.
Parameters:
- path (str) – The system path to the dataset. When using default filenames, this is the directory containing the file; otherwise, it must include the dataset filename.
- contains_headers (bool, optional) – Whether the first line holds column titles rather than a data row.
Yields: A TitleCrew object for each row in the dataset.
Raises: InvalidParseFormat – If a row has an incorrect number of columns.

get_title_episodes(path, contains_headers=True)
Parse the “title.episodes.tsv” dataset provided by IMDb.
Parameters:
- path (str) – The system path to the dataset. When using default filenames, this is the directory containing the file; otherwise, it must include the dataset filename.
- contains_headers (bool, optional) – Whether the first line holds column titles rather than a data row.
Yields: A TitleEpisode object for each row in the dataset.
Raises: InvalidParseFormat – If a row has an incorrect number of columns.

get_title_principals(path, contains_headers=True)
Parse the “title.principals.tsv” dataset provided by IMDb.
Parameters:
- path (str) – The system path to the dataset. When using default filenames, this is the directory containing the file; otherwise, it must include the dataset filename.
- contains_headers (bool, optional) – Whether the first line holds column titles rather than a data row.
Yields: A TitlePrincipalCrew object for each row in the dataset.
Raises: InvalidParseFormat – If a row has an incorrect number of columns.

get_title_ratings(path, contains_headers=True)
Parse the “title.ratings.tsv” dataset provided by IMDb.
Parameters:
- path (str) – The system path to the dataset. When using default filenames, this is the directory containing the file; otherwise, it must include the dataset filename.
- contains_headers (bool, optional) – Whether the first line holds column titles rather than a data row.
Yields: A TitleRating object for each row in the dataset.
Raises: InvalidParseFormat – If a row has an incorrect number of columns.
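
Since each get_* method yields objects one at a time, results can be consumed lazily without loading a whole dataset into memory. The sketch below illustrates that with a stand-in generator in place of get_title_ratings. The Rating tuple, its attribute names, and the sample values are all hypothetical and do not reflect the actual TitleRating interface.

```python
import heapq
from collections import namedtuple

# Stand-in for TitleRating; attribute names are assumptions for illustration.
Rating = namedtuple("Rating", "tconst average_rating num_votes")


def top_rated(ratings, count, min_votes=0):
    """Return the `count` highest-rated items from any iterable of ratings."""
    # The generator expression filters rows as they stream past.
    eligible = (r for r in ratings if r.num_votes >= min_votes)
    # nlargest keeps only `count` items in memory at a time.
    return heapq.nlargest(count, eligible, key=lambda r: r.average_rating)


def fake_ratings():
    """Made-up rows standing in for the parser's output."""
    yield Rating("tt0000001", 5.7, 2000)
    yield Rating("tt0000002", 8.1, 150)
    yield Rating("tt0000003", 7.4, 9000)
    yield Rating("tt0000004", 9.0, 12)


best = top_rated(fake_ratings(), count=2, min_votes=100)
```

With the real parser, the generator returned by get_title_ratings would take the place of fake_ratings(), provided the attribute names are adjusted to match the TitleRating object.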