Package textprocessing
Class Driver
java.lang.Object
textprocessing.Driver
A Driver class for processing text from a file.
-
Field Summary
Fields: DATA_FOLDER
Constructor Summary
Constructors: Driver()
Method Summary
Modifier and Type, Method, Description
private static void addBigrams(List<Word> bigrams, List<BasicWord> words)
    A helper method that generates Bigrams from the ordered List of BasicWords and stores the Bigrams in a List.
private static void addVocabulary(List<Word> vocabulary, List<BasicWord> words)
    A helper method that generates VocabularyEntry objects from the ordered List of BasicWords and stores the entries in a List.
private static void addWords
    A helper method that reads words from a text file one at a time and stores the normalized words in a List of BasicWords.
private static String getInput
    A helper method to get input from the user.
static void main
private static String normalize
    A helper method that removes all punctuation from a String and converts the resulting punctuation-less String to lowercase.
private static void removeHeader(Scanner read)
    A helper method to remove the header information from a Project Gutenberg file.
private static void report
    A helper method that generates a report of the most frequent entries from the given sorted List.
private static void saveFile
    A helper method to save a List of Words as a text file.
-
Field Details
-
DATA_FOLDER
-
Constructor Details
-
Driver
public Driver()
-
-
Method Details
-
main
-
getInput
A helper method to get input from the user.
Parameters:
in - Scanner using System.in as input
message - the message with which to prompt the user
Returns:
the user input
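A minimal sketch of what such a prompt helper might look like. The String type of message and the choice to read a whole line with nextLine() are assumptions; the actual Driver source is not shown here.

```java
import java.util.Scanner;

class GetInputSketch {
    // Prints the prompt without a trailing newline, then returns the
    // next full line typed by the user (assumed behavior).
    static String getInput(Scanner in, String message) {
        System.out.print(message);
        return in.nextLine();
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        String fileName = getInput(in, "Enter a file name: ");
        System.out.println("You entered: " + fileName);
    }
}
```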
-
removeHeader
A helper method to remove the header information from a Project Gutenberg file. The method will continue to consume the buffer of the Scanner until the header text has been removed, then will stop.
Parameters:
read - Scanner using a Project Gutenberg text file as input
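One way the header could be consumed is sketched below. It assumes the header ends at a line beginning with "*** START OF", which is how Project Gutenberg plain-text files commonly mark the start of the book; the real Driver may detect the end of the header differently.

```java
import java.util.Scanner;

class RemoveHeaderSketch {
    // Assumed marker; not taken from the Driver source.
    static final String START_MARKER = "*** START OF";

    // Consumes lines up to and including the marker line, then stops,
    // leaving the Scanner positioned at the first line of the book text.
    static void removeHeader(Scanner read) {
        while (read.hasNextLine()) {
            if (read.nextLine().startsWith(START_MARKER)) {
                return;
            }
        }
    }
}
```

If no marker line is present, this sketch consumes the entire input, so a caller may want to check hasNext() afterwards.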
-
addWords
A helper method that reads words from a text file one at a time and stores the normalized words in a List of BasicWords. Any word that contains only whitespace should be ignored.
Parameters:
words - the List used to store the words
read - Scanner using a Project Gutenberg text file as input
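The read-normalize-store loop described above might be sketched as follows. Because the BasicWord class is not shown on this page, the sketch stores plain Strings instead, and its normalize method is only a stand-in for the real helper.

```java
import java.util.List;
import java.util.Scanner;

class AddWordsSketch {
    // Stand-in for the Driver's normalize helper (assumed behavior).
    static String normalize(String s) {
        return s.replaceAll("\\p{Punct}", "").toLowerCase();
    }

    // Reads whitespace-delimited tokens one at a time and keeps only
    // those that still contain text after normalization.
    static void addWords(List<String> words, Scanner read) {
        while (read.hasNext()) {
            String word = normalize(read.next());
            if (!word.trim().isEmpty()) {
                words.add(word);
            }
        }
    }
}
```

A token such as "--" normalizes to an empty String and is skipped, which is one reading of the rule about ignoring whitespace-only words.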
-
normalize
A helper method that removes all punctuation from a String and converts the resulting punctuation-less String to lowercase.
Parameters:
s - the String to normalize
Returns:
the normalized String
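A possible implementation, assuming that "punctuation" means the ASCII punctuation characters matched by the regex class \p{Punct}. Typographic characters such as curly quotes are not covered by that class and would pass through unchanged.

```java
class NormalizeSketch {
    // Strips ASCII punctuation, then lowercases what remains.
    static String normalize(String s) {
        return s.replaceAll("\\p{Punct}", "").toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(normalize("Hello, World."));
    }
}
```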
-
addBigrams
A helper method that generates Bigrams from the ordered List of BasicWords and stores the Bigrams in a List. There should only be one instance of each Bigram in the List. When successive copies of the same Bigram are found, the location should be added to the existing Bigram and the occurrence count should be incremented.
Parameters:
bigrams - the List in which to store the resulting Bigrams
words - the ordered List of BasicWord to use to generate the Bigrams
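The de-duplication rule above can be illustrated with a small sketch. The nested Bigram class here is hypothetical (the real Bigram, Word, and BasicWord classes are not shown on this page), words are plain Strings, and a "location" is assumed to be the index of the bigram's first word.

```java
import java.util.ArrayList;
import java.util.List;

class BigramSketch {
    // Hypothetical stand-in for the real Bigram class.
    static class Bigram {
        final String text;
        final List<Integer> locations = new ArrayList<>();
        int count = 0;

        Bigram(String text) { this.text = text; }

        void addOccurrence(int location) {
            locations.add(location);
            count++;
        }
    }

    // Pairs each word with its successor; repeats update the existing entry.
    static void addBigrams(List<Bigram> bigrams, List<String> words) {
        for (int i = 0; i + 1 < words.size(); i++) {
            String text = words.get(i) + " " + words.get(i + 1);
            Bigram match = null;
            for (Bigram b : bigrams) {      // linear search for an existing entry
                if (b.text.equals(text)) {
                    match = b;
                    break;
                }
            }
            if (match == null) {
                match = new Bigram(text);
                bigrams.add(match);
            }
            match.addOccurrence(i);
        }
    }
}
```

The linear search keeps the sketch short but costs time proportional to the number of distinct bigrams for every pair of words.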
-
addVocabulary
A helper method that generates VocabularyEntry objects from the ordered List of BasicWords and stores the entries in a List. There should only be one instance of each VocabularyEntry in the List. When successive copies of the same entry are found, the location should be added to the existing entry and the occurrence count should be incremented.
Parameters:
vocabulary - the List in which to store the resulting VocabularyEntry objects
words - the ordered List of BasicWord to use to generate the vocabulary
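As a sketch of the same "one entry per word" rule, the version below keeps a temporary HashMap beside the output List so that existing entries can be found without scanning the List. The Entry class is a hypothetical stand-in for VocabularyEntry, words are plain Strings, and the Map index is a design choice of this sketch, not something the documentation prescribes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class VocabularySketch {
    // Hypothetical stand-in for the real VocabularyEntry class.
    static class Entry {
        final String text;
        final List<Integer> locations = new ArrayList<>();
        int count = 0;

        Entry(String text) { this.text = text; }
    }

    static void addVocabulary(List<Entry> vocabulary, List<String> words) {
        // Index any entries already in the List so they are reused.
        Map<String, Entry> index = new HashMap<>();
        for (Entry e : vocabulary) {
            index.put(e.text, e);
        }
        for (int i = 0; i < words.size(); i++) {
            String word = words.get(i);
            Entry entry = index.get(word);
            if (entry == null) {            // first sighting: create and store
                entry = new Entry(word);
                index.put(word, entry);
                vocabulary.add(entry);
            }
            entry.locations.add(i);         // record where the word occurred
            entry.count++;
        }
    }
}
```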
-
saveFile
A helper method to save a List of Words as a text file.
Parameters:
list - the List of Word objects to save
output - the File to save the data into
Throws:
FileNotFoundException - thrown if the File cannot be found
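A short sketch of how such a save could be written with java.io.PrintWriter, whose File constructor is one source of FileNotFoundException. Writing one item per line via toString() is an assumption; the actual output format is not specified on this page.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.List;

class SaveFileSketch {
    // Writes each item on its own line; try-with-resources closes the
    // writer (and flushes it) even if an exception is thrown.
    static void saveFile(List<?> list, File output) throws FileNotFoundException {
        try (PrintWriter out = new PrintWriter(output)) {
            for (Object item : list) {
                out.println(item);
            }
        }
    }
}
```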
-
report
A helper method that generates a report of the most frequent entries from the given sorted List. If topHits is greater than the total number of entries in the list, it will print out the entire list.
Parameters:
list - the List from which to generate the report
type - a String describing what the contents of the list are (e.g. "Words", "Bigrams", etc.)
topHits - the number of items to display in the report
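The topHits rule can be sketched by clamping the requested count to the size of the list. The report layout, the int type of topHits, and the buildReport helper are all assumptions of this sketch; only the clamping behavior comes from the description above.

```java
import java.util.List;

class ReportSketch {
    // Hypothetical helper: builds the report text so it can be inspected.
    static String buildReport(List<?> list, String type, int topHits) {
        // Never show more items than the list holds, and never fewer than zero.
        int limit = Math.max(0, Math.min(topHits, list.size()));
        StringBuilder sb = new StringBuilder();
        sb.append("Top ").append(limit).append(" ").append(type).append(":\n");
        for (int i = 0; i < limit; i++) {
            sb.append(i + 1).append(". ").append(list.get(i)).append("\n");
        }
        return sb.toString();
    }

    // Prints the report for an already sorted list.
    static void report(List<?> list, String type, int topHits) {
        System.out.print(buildReport(list, type, topHits));
    }
}
```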
-