Class StringExtractor

java.lang.Object
org.htmlparser.parserapplications.StringExtractor

public class StringExtractor extends Object
Extract plaintext strings from a web page. This illustrative program gathers the textual contents of a web page, using a StringBean to accumulate the user-visible text (what a browser would display) into a single string.
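This page does not describe the rules StringBean uses to decide what counts as user-visible text. The following is only a rough illustration of the idea: a minimal, stdlib-only sketch in which the class VisibleText and all of its rules are assumptions of this example, not StringBean's actual algorithm. The rules are: drop tags, comments, and script/style bodies; treat a few block tags as word breaks; decode a handful of entities; and collapse whitespace.

```java
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical toy stand-in for StringBean-style text accumulation.
 * Not the htmlparser implementation; the rules below are assumptions.
 */
public class VisibleText {
    // Elements whose content a browser does not display as page text.
    private static final Set<String> SKIPPED = Set.of("script", "style");
    // Tags treated as word breaks, so "<p>a</p><p>b</p>" becomes "a b".
    private static final Set<String> BREAKING =
            Set.of("p", "br", "div", "li", "tr", "td", "h1", "h2", "h3", "h4", "h5", "h6");

    public static String extract(String html) {
        StringBuilder out = new StringBuilder();
        String skipping = null; // name of the script/style element we are inside, if any
        int i = 0;
        while (i < html.length()) {
            if (skipping != null) {
                // Jump straight to the matching close tag, ignoring any '<' in the body.
                int close = indexOfIgnoreCase(html, "</" + skipping, i);
                if (close < 0) break; // unclosed element: drop the rest
                int gt = html.indexOf('>', close);
                i = (gt < 0) ? html.length() : gt + 1;
                skipping = null;
            } else if (html.startsWith("<!--", i)) {
                // Comments are never displayed.
                int close = html.indexOf("-->", i + 4);
                i = (close < 0) ? html.length() : close + 3;
            } else if (html.charAt(i) == '<') {
                int gt = html.indexOf('>', i);
                if (gt < 0) break; // unterminated tag: drop the rest
                String body = html.substring(i + 1, gt).trim().toLowerCase(Locale.ROOT);
                boolean closing = body.startsWith("/");
                String name = (closing ? body.substring(1) : body).trim().split("[\\s/]", 2)[0];
                if (!closing && SKIPPED.contains(name)) {
                    skipping = name;
                } else if (BREAKING.contains(name)) {
                    out.append(' ');
                }
                i = gt + 1;
            } else {
                out.append(html.charAt(i++)); // ordinary character data
            }
        }
        // Decode a few common entities; &amp; goes last so "&amp;lt;" yields "&lt;".
        String text = out.toString()
                .replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");
        // Collapse runs of whitespace the way a browser renders them.
        return text.replaceAll("\\s+", " ").trim();
    }

    // Case-insensitive search that never changes string length (unlike toLowerCase).
    private static int indexOfIgnoreCase(String s, String needle, int from) {
        for (int k = from; k + needle.length() <= s.length(); k++) {
            if (s.regionMatches(true, k, needle, 0, needle.length())) return k;
        }
        return -1;
    }
}
```

This toy stops at an unterminated tag and does not handle '>' inside attribute values. A real HTML parser has to cope with such malformed input.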
  • Constructor Details

    • StringExtractor

      public StringExtractor(String resource)
      Construct a StringExtractor to read from the given resource.
      Parameters:
      resource - Either a URL or a file name.
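This page does not say how the constructor decides whether resource is a URL or a file name. The following is a minimal sketch of one plausible rule, using a hypothetical helper ResourceResolver.toUrl that is not part of htmlparser. The string is treated as a URL when it has a scheme of two or more letters that Java can open. Otherwise it is treated as a file path and converted to a file: URL.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

/** Hypothetical helper; not part of htmlparser. */
public class ResourceResolver {
    public static URL toUrl(String resource) {
        try {
            URI uri = new URI(resource);
            // Require a multi-letter scheme so a Windows path such as "C:/page.html"
            // (which parses with scheme "C") is still treated as a file name.
            if (uri.getScheme() != null && uri.getScheme().length() > 1) {
                return uri.toURL();
            }
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
            // Not a usable URL (bad syntax or unknown protocol): fall back to a file name.
        }
        try {
            // Relative names are resolved against the current working directory.
            return new File(resource).toURI().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot resolve resource: " + resource, e);
        }
    }
}
```

With this rule, "http://example.com/page.html" stays a URL and "page.html" becomes a file: URL under the working directory.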
  • Method Details

    • extractStrings

      public String extractStrings(boolean links) throws ParserException
      Extract the text from a page.
      Parameters:
      links - If true, include hyperlinks in the output.
      Returns:
      The textual contents of the page.
      Throws:
      ParserException - If a parse error occurs.
    • main

      public static void main(String[] args)
      Mainline (command-line entry point).
      Parameters:
      args - The command line arguments.
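For the extractStrings(boolean links) method above, this page only says that links controls whether hyperlinks are included. It does not give the output format. The sketch below is a hypothetical, regex-based toy (class LinkText) that shows the effect of such a flag by appending each anchor's href in angle brackets after its text. That format, and the reliance on double-quoted href attributes, are assumptions of the example rather than StringBean's behavior.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of a "links" switch; not the htmlparser API. */
public class LinkText {
    // <a ... href="..." ...>text</a>, case-insensitive, text may span lines.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"[^>]*>(.*?)</a\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Any remaining tag is replaced by a space.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    private static String strip(String html) {
        return TAG.matcher(html).replaceAll(" ");
    }

    public static String extract(String html, boolean links) {
        StringBuilder out = new StringBuilder();
        Matcher m = ANCHOR.matcher(html);
        int last = 0;
        while (m.find()) {
            out.append(strip(html.substring(last, m.start()))); // text before the link
            out.append(strip(m.group(2)));                      // the link's visible text
            if (links) {
                out.append(" <").append(m.group(1)).append('>'); // assumed href format
            }
            last = m.end();
        }
        out.append(strip(html.substring(last)));                // text after the last link
        return out.toString().replaceAll("\\s+", " ").trim();
    }
}
```

Regular expressions are a poor fit for real-world HTML. A parser-based approach, such as the StringBean this class relies on, is more robust against nested or malformed markup.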