Getting started#
If you are new to Perke, this is the place to begin. Currently, Perke implements graph-based models. Let’s see a longer example to learn how to use these models.
Example
from pathlib import Path
from perke.unsupervised.graph_based import SingleRank
# Define the set of valid part of speech tags to occur in the model.
valid_pos_tags = {'NOUN', 'ADJ'}
# 1. Create a SingleRank extractor.
extractor = SingleRank(valid_pos_tags=valid_pos_tags)
# 2. Load the text.
input_filepath = Path(__file__).parent.parent.parent / 'input.txt'
extractor.load_text(input=input_filepath, word_normalization_method=None)
# 3. Select the longest sequences of nouns and adjectives as
# candidates.
extractor.select_candidates()
# 4. Weight the candidates using the sum of their words weights that
# are computed using random walk. In the graph, nodes are certain
# parts of speech (nouns and adjectives) that are connected if
# they co-occur in a window of 10 words.
extractor.weight_candidates(window=10)
# 5. Get the 10 highest weighted candidates as keyphrases
keyphrases = extractor.get_n_best(n=10)
for i, (weight, keyphrase) in enumerate(keyphrases):
print(f'{i+1}.\t{keyphrase}, \t{weight}')
For other models, see the examples directory.