Class InputStreamSource

java.lang.Object
java.io.Reader
org.htmlparser.lexer.Source
org.htmlparser.lexer.InputStreamSource
All Implemented Interfaces:
Closeable, Serializable, AutoCloseable, Readable

public class InputStreamSource extends Source
A source of characters based on an InputStream such as from a URLConnection.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static int
    BUFFER_SIZE
    An initial buffer size.
    protected char[]
    mBuffer
    The characters read so far.
    protected String
    mEncoding
    The character set in use.
    protected int
    mLevel
    The number of valid characters in the buffer.
    protected int
    mMark
    The bookmark.
    protected int
    mOffset
    The offset of the next character returned by read().
    protected InputStreamReader
    mReader
    The converter from bytes to characters.
    protected InputStream
    mStream
    The stream of bytes.

    Fields inherited from class org.htmlparser.lexer.Source

    EOF

    Fields inherited from class java.io.Reader

    lock
  • Constructor Summary

    Constructors
    Constructor
    Description
    Create a source of characters using the default character set.
    Create a source of characters.
    InputStreamSource(InputStream stream, String charset, int size)
    Create a source of characters.
  • Method Summary

    Modifier and Type
    Method
    Description
    int
    available()
    Get the number of available characters.
    void
    close()
    Does nothing.
    void
    destroy()
    Close the source.
    protected void
    fill(int min)
    Fetch more characters from the underlying reader.
    char
    getCharacter(int offset)
    Retrieve a character again.
    void
    getCharacters(char[] array, int offset, int start, int end)
    Retrieve characters again.
    void
    getCharacters(StringBuffer buffer, int offset, int length)
    Append characters already read into a StringBuffer.
    String
    getEncoding()
    Get the encoding being used to convert characters.
    InputStream
    getStream()
    Get the input stream being used.
    String
    getString(int offset, int length)
    Retrieve a string.
    void
    mark(int readAheadLimit)
    Mark the present position in the source.
    boolean
    markSupported()
    Tell whether this source supports the mark() operation.
    int
    offset()
    Get the position (in characters).
    int
    read()
    Read a single character.
    int
    read(char[] cbuf)
    Read characters into an array.
    int
    read(char[] cbuf, int off, int len)
    Read characters into a portion of an array.
    boolean
    ready()
    Tell whether this source is ready to be read.
    void
    reset()
    Reset the source.
    void
    setEncoding(String character_set)
    Begins reading from the source with the given character set.
    long
    skip(long n)
    Skip characters.
    void
    unread()
    Undo the read of a single character.

    Methods inherited from class java.io.Reader

    nullReader, read, transferTo

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • BUFFER_SIZE

      public static int BUFFER_SIZE
      An initial buffer size. Has a default value of 16384.
    • mStream

      protected transient InputStream mStream
      The stream of bytes. Set to null when the source is closed.
    • mEncoding

      protected String mEncoding
      The character set in use.
    • mReader

      protected transient InputStreamReader mReader
      The converter from bytes to characters.
    • mBuffer

      protected char[] mBuffer
      The characters read so far.
    • mLevel

      protected int mLevel
      The number of valid characters in the buffer.
    • mOffset

      protected int mOffset
      The offset of the next character returned by read().
    • mMark

      protected int mMark
      The bookmark.
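
The buffer fields above work together as a small amount of bookkeeping state. The following is a minimal sketch of that relationship, not the library's implementation: the class `BufferModel` is hypothetical, the `-1` sentinel for an unset bookmark is an assumption, `EOF` is assumed to be `-1`, and unchecked exceptions stand in for the `IOException` the real methods declare.

```java
/** Hypothetical model of the buffer bookkeeping; not the library's code. */
final class BufferModel {
    static final int EOF = -1;   // assumed to match Source.EOF

    private final char[] buffer; // plays the role of mBuffer
    private final int level;     // mLevel: number of valid characters in buffer
    private int offset;          // mOffset: index of the next character read() returns
    private int mark = -1;       // mMark: bookmark, -1 while unset (an assumption)

    BufferModel(String decoded) {
        buffer = decoded.toCharArray();
        level = buffer.length;
    }

    /** Return the next character, or EOF once every valid character is consumed. */
    int read() {
        return offset < level ? buffer[offset++] : EOF;
    }

    /** Step back one character so the next read() returns it again. */
    void unread() {
        if (offset == 0) {
            throw new IllegalStateException("no characters have been read");
        }
        offset--;
    }

    /** Remember the current position. */
    void mark() {
        mark = offset;
    }

    /** Go back to the bookmark, or to position zero if none was set. */
    void reset() {
        offset = (mark == -1) ? 0 : mark;
    }

    /** Number of characters consumed so far. */
    int offset() {
        return offset;
    }

    /** Characters that can be read without touching the underlying stream. */
    int available() {
        return level - offset;
    }
}
```

In this model, `new BufferModel("abc")` followed by `read()`, `mark()`, `read()` and `reset()` leaves `offset()` at 1, so the next `read()` returns `'b'` again.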
  • Constructor Details

  • Method Details

    • getStream

      public InputStream getStream()
      Get the input stream being used.
      Returns:
      The current input stream.
    • getEncoding

      public String getEncoding()
      Get the encoding being used to convert characters.
      Specified by:
      getEncoding in class Source
      Returns:
      The current encoding.
    • setEncoding

      public void setEncoding(String character_set) throws ParserException
      Begins reading from the source with the given character set. If the current encoding is the same as the requested encoding, this method is a no-op. Otherwise, any characters subsequently read from this source are decoded using the given character set.

      Some magic happens here to obtain this result if characters have already been consumed from this source. Since a Reader cannot be dynamically altered to use a different character set, the underlying stream is reset, a new Source is constructed and a comparison made of the characters read so far with the newly read characters up to the current position. If a difference is encountered, or some other problem occurs, an exception is thrown.

      Specified by:
      setEncoding in class Source
      Parameters:
      character_set - The character set to use to convert bytes into characters.
      Throws:
      ParserException - If a character mismatch occurs between characters already provided and those that would have been returned had the new character set been in effect from the beginning. An exception is also thrown if the underlying stream cannot be reset and read again.
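
The re-decode-and-compare step described for setEncoding can be illustrated with plain JDK classes. This is a sketch under stated assumptions, not the library's code: the class `EncodingSwitchSketch` and its method are hypothetical, the byte array stands in for the reset stream, and an `IllegalStateException` stands in for the `ParserException` the real method throws.

```java
import java.nio.charset.Charset;

/** Hypothetical illustration of the re-decode-and-compare step; not the library's code. */
final class EncodingSwitchSketch {

    /**
     * Decode all bytes again with the new character set and verify that the
     * characters already handed out are unchanged under the new decoding.
     *
     * @param bytes       the full byte content (stands in for the reset stream)
     * @param alreadyRead characters delivered under the old character set
     * @param newCharset  name of the character set to switch to
     * @return the characters that remain to be read under the new character set
     */
    static String redecode(byte[] bytes, String alreadyRead, String newCharset) {
        String redecoded = new String(bytes, Charset.forName(newCharset));
        if (!redecoded.startsWith(alreadyRead)) {
            // The real method reports this case with a ParserException.
            throw new IllegalStateException(
                "characters already read differ under " + newCharset);
        }
        return redecoded.substring(alreadyRead.length());
    }
}
```

For the UTF-8 bytes of "héllo", switching after only "h" has been read succeeds, because "h" decodes identically in ISO-8859-1 and UTF-8. Switching after two characters have been read as ISO-8859-1 fails, because the second character would have been different.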
    • fill

      protected void fill(int min) throws IOException
      Fetch more characters from the underlying reader. Has no effect if the underlying reader has been drained.
      Parameters:
      min - The minimum to read.
      Throws:
      IOException - If the underlying reader read() throws one.
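
One way a fill operation of this kind can work is to grow the character buffer when it has less than `min` free slots and then read into the free space. The sketch below assumes a doubling growth policy and a hypothetical class `FillSketch`; it is not taken from the library's source, and the `drain` helper exists only to exercise the sketch.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical buffer-filling sketch; the growth policy is an assumption. */
final class FillSketch {
    private Reader reader;   // set to null once the reader is drained
    private char[] buffer;   // characters read so far
    private int level;       // number of valid characters in buffer

    FillSketch(Reader reader, int initialSize) {
        this.reader = reader;
        this.buffer = new char[initialSize];
    }

    /** Fetch more characters, making room for at least min of them. */
    void fill(int min) throws IOException {
        if (reader == null) {
            return;                                   // drained: nothing to do
        }
        int space = buffer.length - level;
        if (space < min) {
            // Grow to whichever is larger: double the size or exactly enough room.
            int newSize = Math.max(buffer.length * 2, level + min);
            buffer = Arrays.copyOf(buffer, newSize);
            space = newSize - level;
        }
        int read = reader.read(buffer, level, space);
        if (read == -1) {
            reader.close();
            reader = null;                            // remember that we are done
        } else {
            level += read;
        }
    }

    /** Helper for the sketch: fill until drained and return everything read. */
    static String drain(String text, int initialSize) {
        FillSketch sketch = new FillSketch(new StringReader(text), initialSize);
        try {
            while (sketch.reader != null) {
                sketch.fill(1);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(sketch.buffer, 0, sketch.level);
    }
}
```

Calling `FillSketch.drain("hello world", 4)` starts with a four-character buffer and grows it as needed; after the reader reports end of input, further `fill` calls return immediately.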
    • close

      public void close() throws IOException
      Does nothing. It's supposed to close the source, but use destroy() instead.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Specified by:
      close in class Source
      Throws:
      IOException - not used
    • read

      public int read() throws IOException
      Read a single character. This method will block until a character is available, an I/O error occurs, or the end of the stream is reached.
      Specified by:
      read in class Source
      Returns:
      The character read, as an integer in the range 0 to 65535 (0x00-0xffff), or EOF if the end of the stream has been reached.
      Throws:
      IOException - If an I/O error occurs.
    • read

      public int read(char[] cbuf, int off, int len) throws IOException
      Read characters into a portion of an array. This method will block until some input is available, an I/O error occurs, or the end of the stream is reached.
      Specified by:
      read in class Source
      Parameters:
      cbuf - Destination buffer
      off - Offset at which to start storing characters
      len - Maximum number of characters to read
      Returns:
      The number of characters read, or EOF if the end of the stream has been reached.
      Throws:
      IOException - If an I/O error occurs.
    • read

      public int read(char[] cbuf) throws IOException
      Read characters into an array. This method will block until some input is available, an I/O error occurs, or the end of the stream is reached.
      Specified by:
      read in class Source
      Parameters:
      cbuf - Destination buffer.
      Returns:
      The number of characters read, or EOF if the end of the stream has been reached.
      Throws:
      IOException - If an I/O error occurs.
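
Because this class extends java.io.Reader, the read methods can be driven with an ordinary read loop. The sketch below uses a `StringReader` as a stand-in for the source so that it runs without the library; the class `ReadLoopSketch` is hypothetical, and `EOF` is assumed to be `-1`, the value `Reader.read` returns at end of stream.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical read-loop sketch that works with any Reader. */
final class ReadLoopSketch {
    static final int EOF = -1; // assumed to equal Source.EOF

    /** Read everything from source in chunks of the given size. */
    static String readAll(Reader source, int chunk) throws IOException {
        if (chunk <= 0) {
            throw new IllegalArgumentException("chunk must be positive");
        }
        char[] cbuf = new char[chunk];
        StringBuilder out = new StringBuilder();
        int n;
        // read() may return fewer characters than requested; only n are valid.
        while ((n = source.read(cbuf, 0, cbuf.length)) != EOF) {
            out.append(cbuf, 0, n);
        }
        return out.toString();
    }

    /** Helper for the sketch: run the loop over an in-memory reader. */
    static String demo(String text) {
        try (Reader reader = new StringReader(text)) {
            return readAll(reader, 4);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With an InputStreamSource in place of the `StringReader`, note that close() is documented to do nothing, so a try-with-resources block would not release the source; destroy() is the documented way to close it.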
    • reset

      public void reset() throws IllegalStateException
      Reset the source. Repositions the read point to begin at zero.
      Specified by:
      reset in class Source
      Throws:
      IllegalStateException - If the source has been closed.
    • markSupported

      public boolean markSupported()
      Tell whether this source supports the mark() operation.
      Specified by:
      markSupported in class Source
      Returns:
      true.
    • mark

      public void mark(int readAheadLimit) throws IOException
      Mark the present position in the source. Subsequent calls to reset() will attempt to reposition the source to this point.
      Specified by:
      mark in class Source
      Parameters:
      readAheadLimit - Not used.
      Throws:
      IOException - If the source is closed.
    • ready

      public boolean ready() throws IOException
      Tell whether this source is ready to be read.
      Specified by:
      ready in class Source
      Returns:
      true if the next read() is guaranteed not to block for input, false otherwise. Note that returning false does not guarantee that the next read will block.
      Throws:
      IOException - If the source is closed.
    • skip

      public long skip(long n) throws IOException, IllegalArgumentException
      Skip characters. This method will block until some characters are available, an I/O error occurs, or the end of the stream is reached. Note: n is treated as an int.
      Specified by:
      skip in class Source
      Parameters:
      n - The number of characters to skip.
      Returns:
      The number of characters actually skipped.
      Throws:
      IllegalArgumentException - If n is negative.
      IOException - If an I/O error occurs.
    • unread

      public void unread() throws IOException
      Undo the read of a single character.
      Specified by:
      unread in class Source
      Throws:
      IOException - If the source is closed or no characters have been read.
    • getCharacter

      public char getCharacter(int offset) throws IOException
      Retrieve a character again.
      Specified by:
      getCharacter in class Source
      Parameters:
      offset - The offset of the character.
      Returns:
      The character at offset.
      Throws:
      IOException - If the offset is beyond offset() or the source is closed.
    • getCharacters

      public void getCharacters(char[] array, int offset, int start, int end) throws IOException
      Retrieve characters again.
      Specified by:
      getCharacters in class Source
      Parameters:
      array - The array of characters.
      offset - The starting position in the array where characters are to be placed.
      start - The starting position, zero based.
      end - The ending position (exclusive, i.e. the character at the ending position is not included), zero based.
      Throws:
      IOException - If the start or end is beyond offset() or the source is closed.
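
The four index parameters are easy to mix up: `offset` is a position in the destination array, while `start` and `end` are positions in the source, with `end` exclusive. The sketch below shows that arithmetic with a plain array copy. The class `GetCharactersSketch` and the `consumed` and `readSoFar` parameters are hypothetical, and an `IllegalArgumentException` stands in for the `IOException` the real method throws.

```java
/** Hypothetical illustration of the getCharacters index arithmetic. */
final class GetCharactersSketch {

    /**
     * Copy consumed[start..end) into array, beginning at array[offset].
     *
     * @param consumed  characters held by the source (stands in for the buffer)
     * @param readSoFar number of characters already read (stands in for offset())
     */
    static void getCharacters(char[] consumed, int readSoFar,
                              char[] array, int offset, int start, int end) {
        if (start < 0 || end < start || end > readSoFar) {
            // The real method reports out-of-range positions with an IOException.
            throw new IllegalArgumentException("range is beyond the characters read");
        }
        // end is exclusive, so exactly (end - start) characters are copied.
        System.arraycopy(consumed, start, array, offset, end - start);
    }
}
```

For a source that has read "<html>", a call with `offset` 1, `start` 1 and `end` 5 places the four characters "html" into the destination at indices 1 through 4.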
    • getString

      public String getString(int offset, int length) throws IOException
      Retrieve a string.
      Specified by:
      getString in class Source
      Parameters:
      offset - The offset of the first character.
      length - The number of characters to retrieve.
      Returns:
      A string containing the length characters at offset.
      Throws:
      IOException - If the offset or (offset + length) is beyond offset() or the source is closed.
    • getCharacters

      public void getCharacters(StringBuffer buffer, int offset, int length) throws IOException
      Append characters already read into a StringBuffer.
      Specified by:
      getCharacters in class Source
      Parameters:
      buffer - The buffer to append to.
      offset - The offset of the first character.
      length - The number of characters to retrieve.
      Throws:
      IOException - If the offset or (offset + length) is beyond offset() or the source is closed.
    • destroy

      public void destroy() throws IOException
      Close the source. Once a source has been closed, further read, ready, mark, skip, unread, getCharacter or getString invocations will throw an IOException, and reset will throw an IllegalStateException. Closing a previously-closed source, however, has no effect.
      Specified by:
      destroy in class Source
      Throws:
      IOException - If an I/O error occurs.
    • offset

      public int offset()
      Get the position (in characters).
      Specified by:
      offset in class Source
      Returns:
      The number of characters that have already been read, or EOF if the source is closed.
    • available

      public int available()
      Get the number of available characters.
      Specified by:
      available in class Source
      Returns:
      The number of characters that can be read without blocking or zero if the source is closed.