Class StringExtractor
java.lang.Object
    org.htmlparser.parserapplications.StringExtractor
Extract plaintext strings from a web page.
Illustrative program to gather the textual contents of a web page. Uses a StringBean to accumulate the user-visible text (what a browser would display) into a single string.
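StringBean's internals are not described on this page. As a rough, hypothetical illustration of what "user-visible text" means, the sketch below reduces an HTML string to displayable text with regular expressions. It drops script, style and comment content, turns a few block-level tags into spaces, removes the remaining tags, decodes a handful of entities and collapses whitespace. The names `VisibleTextSketch` and `visibleText` are made up for this example, and regex matching is a deliberate simplification that should not be read as StringBean's actual algorithm.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified illustration of "user-visible text": reduces an HTML
 * string to roughly what a browser would display. Regex-based, so it is not a
 * real HTML parser and ignores many edge cases.
 */
final class VisibleTextSketch {
    // script/style bodies are never rendered as text
    private static final Pattern INVISIBLE =
        Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    // comments are not displayed either
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");
    // a few block-level tags separate words, so they become a space
    private static final Pattern BLOCK =
        Pattern.compile("(?i)</?(p|div|br|li|h[1-6]|tr|td|th|table|ul|ol)\\b[^>]*>");
    // any remaining (inline) tag is simply removed
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // runs of whitespace collapse to one space, as in normal HTML text flow
    private static final Pattern SPACE = Pattern.compile("\\s+");

    static String visibleText(String html) {
        String s = INVISIBLE.matcher(html).replaceAll(" ");
        s = COMMENT.matcher(s).replaceAll(" ");
        s = BLOCK.matcher(s).replaceAll(" ");
        s = TAG.matcher(s).replaceAll("");
        // decode a handful of entities; &amp; goes last so "&amp;lt;" ends up as "&lt;"
        s = s.replace("&nbsp;", " ")
             .replace("&lt;", "<")
             .replace("&gt;", ">")
             .replace("&quot;", "\"")
             .replace("&amp;", "&");
        return SPACE.matcher(s).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // prints: Hello, world! Fish & chips
        System.out.println(visibleText("<p>Hello,\n  <b>world</b>!</p><p>Fish &amp; chips</p>"));
    }
}
```

Treating only block-level tags as word separators keeps inline markup such as `<b>world</b>!` from inserting a stray space before the punctuation.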
Constructor Summary
Constructor                         Description
StringExtractor(String resource)    Construct a StringExtractor to read from the given resource.
Method Summary
Modifier and Type   Method                          Description
String              extractStrings(boolean links)   Extract the text from a page.
static void         main(String[] args)             Mainline.
Constructor Details
StringExtractor
Construct a StringExtractor to read from the given resource.
Parameters:
    resource - Either a URL or a file name.
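The constructor accepts "either a URL or a file name", but this page does not say how the string is interpreted. One plausible approach, sketched here as an assumption rather than htmlparser's actual logic, is to try parsing the string as a URL and fall back to treating it as a local path. `ResourceSketch` and `toUrl` are hypothetical names.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

/**
 * Hypothetical helper: normalises a resource string that is "either a URL or
 * a file name" to a URL. An assumption for illustration only.
 */
final class ResourceSketch {
    static URL toUrl(String resource) {
        try {
            // strings with a recognised protocol ("http://...", "file:/...") parse directly
            return new URL(resource);
        } catch (MalformedURLException notAUrl) {
            try {
                // otherwise treat it as a local path, resolved against the working directory
                return new File(resource).toURI().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Cannot resolve resource: " + resource, e);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(toUrl("http://example.com/page.html"));
        System.out.println(toUrl("index.html"));
    }
}
```

`new URL(String)` is deprecated as of Java 20 in favour of `URI.toURL()`, but a relative `URI` such as `index.html` cannot be converted that way either, so the file-path fallback would still be needed.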
Method Details
extractStrings
Extract the text from a page.
Parameters:
    links - If true, include hyperlinks in the output.
Returns:
    The textual contents of the page.
Throws:
    ParserException - If a parse error occurs.
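The `links` parameter switches hyperlinks into the output, but the exact format used for them is not documented here. The sketch below assumes, purely for illustration, that each link's text is followed by its target in angle brackets. Its regex only understands double-quoted `href` attributes, and the names `LinkTextSketch` and `text` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical illustration of a "links" switch: when enabled, each link's text
 * is followed by its target in angle brackets. The output format is an
 * assumption for this sketch.
 */
final class LinkTextSketch {
    // Either an anchor with a double-quoted href (group 1 = target, group 2 = inner HTML)
    // or any other tag. Regex-based, so nested anchors or unquoted hrefs are not handled.
    private static final Pattern TOKEN = Pattern.compile(
        "(?is)<a\\b[^>]*?\\bhref\\s*=\\s*\"([^\"]*)\"[^>]*>(.*?)</a\\s*>|<[^>]*>");

    static String text(String html, boolean links) {
        Matcher m = TOKEN.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(html, last, m.start());                     // plain text before this token
            if (m.group(1) != null) {
                out.append(m.group(2).replaceAll("<[^>]*>", ""));  // link text, inner tags dropped
                if (links) {
                    out.append(" <").append(m.group(1)).append('>'); // assumed link format
                }
            }
            // any other tag contributes nothing to the output
            last = m.end();
        }
        out.append(html.substring(last));
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String page = "See <a href=\"http://example.com/\">the <i>site</i></a> now.";
        System.out.println(text(page, false)); // See the site now.
        System.out.println(text(page, true));  // See the site <http://example.com/> now.
    }
}
```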
main
Mainline: the program's command-line entry point.
Parameters:
    args - The command-line arguments.
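`main` is documented only as the mainline taking the command-line arguments. Since `extractStrings` takes a `links` flag and the constructor takes one resource, a plausible shape for the argument handling is an optional switch plus one positional argument. The `-links` switch, the usage text and `MainlineSketch` are assumptions for this sketch, not the documented interface.

```java
/**
 * Hypothetical sketch of a mainline's argument handling: an optional "-links"
 * switch plus one resource (URL or file name). Flag name and usage text are
 * assumptions.
 */
final class MainlineSketch {
    /** Parsed command line: the resource to read and whether to include links. */
    public record Options(String resource, boolean links) {}

    /** Returns the parsed options, or null when no resource was given. */
    static Options parse(String[] args) {
        boolean links = false;
        String resource = null;
        for (String arg : args) {
            if (arg.equalsIgnoreCase("-links")) {
                links = true;
            } else {
                resource = arg; // the last non-flag argument wins
            }
        }
        return resource == null ? null : new Options(resource, links);
    }

    public static void main(String[] args) {
        Options opts = parse(args);
        if (opts == null) {
            System.err.println("Usage: StringExtractor [-links] <url-or-file>");
            System.exit(1);
            return;
        }
        // A real mainline would presumably construct the extractor with opts.resource()
        // and print the result for opts.links(); this sketch only echoes the options.
        System.out.println("resource=" + opts.resource() + " links=" + opts.links());
    }
}
```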