Package org.htmlparser.lexer
Class InputStreamSource
java.lang.Object
  java.io.Reader
    org.htmlparser.lexer.Source
      org.htmlparser.lexer.InputStreamSource
- All Implemented Interfaces:
Closeable, Serializable, AutoCloseable, Readable
A source of characters based on an InputStream, such as one obtained from a URLConnection.
-
Field Summary
Modifier and Type | Field | Description
static int | BUFFER_SIZE | An initial buffer size.
protected char[] | mBuffer | The characters read so far.
protected String | mEncoding | The character set in use.
protected int | mLevel | The number of valid characters in the buffer.
protected int | mMark | The bookmark.
protected int | mOffset | The offset of the next character returned by read().
protected InputStreamReader | mReader | The converter from bytes to characters.
protected InputStream | mStream | The stream of bytes.
-
Constructor Summary
Constructor | Description
InputStreamSource(InputStream stream) | Create a source of characters using the default character set.
InputStreamSource(InputStream stream, String charset) | Create a source of characters.
InputStreamSource(InputStream stream, String charset, int size) | Create a source of characters.
-
Method Summary
Modifier and Type | Method | Description
int | available() | Get the number of available characters.
void | close() | Does nothing.
void | destroy() | Close the source.
protected void | fill(int min) | Fetch more characters from the underlying reader.
char | getCharacter(int offset) | Retrieve a character again.
void | getCharacters(char[] array, int offset, int start, int end) | Retrieve characters again.
void | getCharacters(StringBuffer buffer, int offset, int length) | Append characters already read into a StringBuffer.
String | getEncoding() | Get the encoding being used to convert characters.
InputStream | getStream() | Get the input stream being used.
String | getString(int offset, int length) | Retrieve a string.
void | mark(int readAheadLimit) | Mark the present position in the source.
boolean | markSupported() | Tell whether this source supports the mark() operation.
int | offset() | Get the position (in characters).
int | read() | Read a single character.
int | read(char[] cbuf) | Read characters into an array.
int | read(char[] cbuf, int off, int len) | Read characters into a portion of an array.
boolean | ready() | Tell whether this source is ready to be read.
void | reset() | Reset the source.
void | setEncoding(String character_set) | Begins reading from the source with the given character set.
long | skip(long n) | Skip characters.
void | unread() | Undo the read of a single character.

Methods inherited from class java.io.Reader
nullReader, read, transferTo
-
Field Details
-
BUFFER_SIZE
public static int BUFFER_SIZE
An initial buffer size. Has a default value of 16384.
-
mStream
protected InputStream mStream
The stream of bytes. Set to null when the source is closed.
-
mEncoding
protected String mEncoding
The character set in use.
-
mReader
protected InputStreamReader mReader
The converter from bytes to characters.
-
mBuffer
protected char[] mBuffer
The characters read so far.
-
mLevel
protected int mLevel
The number of valid characters in the buffer.
-
mOffset
protected int mOffset
The offset of the next character returned by read().
-
mMark
protected int mMark
The bookmark.
-
-
Constructor Details
-
InputStreamSource
Create a source of characters using the default character set.
- Parameters:
stream - The stream of bytes to use.
- Throws:
UnsupportedEncodingException - If the default character set is unsupported.
-
InputStreamSource
Create a source of characters.
- Parameters:
stream - The stream of bytes to use.
charset - The character set used in encoding the stream.
- Throws:
UnsupportedEncodingException - If the character set is unsupported.
-
InputStreamSource
public InputStreamSource(InputStream stream, String charset, int size) throws UnsupportedEncodingException
Create a source of characters.
- Parameters:
stream - The stream of bytes to use.
charset - The character set used in encoding the stream.
size - The initial character buffer size.
- Throws:
UnsupportedEncodingException - If the character set is unsupported.
-
-
Method Details
-
getStream
Get the input stream being used.
- Returns:
The current input stream.
-
getEncoding
Get the encoding being used to convert characters.
- Specified by:
getEncoding in class Source
- Returns:
The current encoding.
-
setEncoding
Begins reading from the source with the given character set. If the current encoding is the same as the requested encoding, this method is a no-op. Otherwise, any subsequent characters read from this source will have been decoded using the given character set.
If characters have already been consumed from this source, extra work is needed to achieve this. Since a Reader cannot be dynamically altered to use a different character set, the underlying stream is reset, a new Source is constructed, and the characters read so far are compared with the newly read characters up to the current position. If a difference is encountered, or some other problem occurs, an exception is thrown.
- Specified by:
setEncoding in class Source
- Parameters:
character_set - The character set to use to convert bytes into characters.
- Throws:
ParserException - If a character mismatch occurs between the characters already provided and those that would have been returned had the new character set been in effect from the beginning. An exception is also thrown if the underlying stream cannot be reset and re-read.
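The comparison step described above can be pictured with a minimal sketch. It is not the htmlparser implementation: it assumes the raw bytes are still available from the start (standing in for resetting the underlying stream), models the check as a prefix comparison of two decoded strings, and uses the hypothetical names EncodingSwitchCheck, firstMismatch and canSwitch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of the setEncoding() consistency check: re-decode the
 * raw bytes with the new character set and compare the result against the
 * characters already handed out under the old one. Not the library code.
 */
class EncodingSwitchCheck {

    /**
     * Returns the index of the first differing character, or -1 if the
     * consumed characters are an exact prefix of the re-decoded text.
     */
    static int firstMismatch(String consumed, String redecoded) {
        int n = Math.min(consumed.length(), redecoded.length());
        for (int i = 0; i < n; i++) {
            if (consumed.charAt(i) != redecoded.charAt(i)) {
                return i;
            }
        }
        // If the re-decoded text is shorter, the first missing position is a mismatch.
        return consumed.length() <= redecoded.length() ? -1 : n;
    }

    /** Re-decodes raw with newCharset and checks that the consumed prefix is unchanged. */
    static boolean canSwitch(byte[] raw, String consumed, Charset newCharset) {
        String redecoded = new String(raw, newCharset);
        return firstMismatch(consumed, redecoded) == -1;
    }

    public static void main(String[] args) {
        byte[] raw = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // Pretend the first four characters were already consumed as ISO-8859-1.
        String consumed = new String(raw, StandardCharsets.ISO_8859_1).substring(0, 4);
        System.out.println(canSwitch(raw, consumed, StandardCharsets.UTF_8)); // false
        System.out.println(canSwitch(raw, "caf", StandardCharsets.UTF_8));    // true
    }
}
```

The ISO-8859-1 versus UTF-8 case shows why the check exists: the first byte of "é" was already handed out as "Ã", so switching to UTF-8 cannot keep the earlier output consistent.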
-
fill
Fetch more characters from the underlying reader. Has no effect if the underlying reader has been drained.
- Parameters:
min - The minimum number of characters to read.
- Throws:
IOException - If the underlying reader's read() throws one.
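A hypothetical sketch of such a fill step, loosely modelled on the mBuffer and mLevel fields, is shown below. The class name, the doubling growth policy, and the choice to drop the reader once it returns -1 are assumptions for illustration, not the library's code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;

/**
 * Hypothetical sketch of fill(min) over a growable character buffer,
 * loosely modelled on mBuffer and mLevel. Not the htmlparser code.
 */
class GrowableCharBuffer {
    char[] buffer;          // characters read so far (cf. mBuffer)
    int level;              // number of valid characters in buffer (cf. mLevel)
    private Reader reader;  // set to null once the reader is drained

    GrowableCharBuffer(Reader reader, int initialSize) {
        this.reader = reader;
        this.buffer = new char[initialSize];
    }

    /** Reads at least min more characters, unless the reader runs dry first. */
    void fill(int min) throws IOException {
        if (reader == null) {
            return; // already drained: no effect
        }
        int target = level + min;
        if (target > buffer.length) {
            // Grow geometrically, but never to less than what is needed.
            buffer = Arrays.copyOf(buffer, Math.max(buffer.length * 2, target));
        }
        while (level < target) {
            int n = reader.read(buffer, level, buffer.length - level);
            if (n == -1) {
                reader.close();
                reader = null; // remember that the reader has been drained
                break;
            }
            level += n;
        }
    }

    public static void main(String[] args) throws IOException {
        GrowableCharBuffer b = new GrowableCharBuffer(new StringReader("abcdefghij"), 4);
        b.fill(6);
        System.out.println(b.level >= 6); // true
        b.fill(100);                      // asks for more than remains
        System.out.println(b.level);      // 10
    }
}
```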
-
close
Does nothing. This method would normally close the source, but in this class it does not; use destroy() instead.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Specified by:
close in class Source
- Throws:
IOException - Not used.
-
read
Read a single character. This method will block until a character is available, an I/O error occurs, or the end of the stream is reached.
- Specified by:
read in class Source
- Returns:
The character read, as an integer in the range 0 to 65535 (0x00-0xffff), or EOF if the end of the stream has been reached.
- Throws:
IOException - If an I/O error occurs.
-
read
Read characters into a portion of an array. This method will block until some input is available, an I/O error occurs, or the end of the stream is reached.
- Specified by:
read in class Source
- Parameters:
cbuf - Destination buffer.
off - Offset at which to start storing characters.
len - Maximum number of characters to read.
- Returns:
The number of characters read, or EOF if the end of the stream has been reached.
- Throws:
IOException - If an I/O error occurs.
-
read
Read characters into an array. This method will block until some input is available, an I/O error occurs, or the end of the stream is reached.
- Specified by:
read in class Source
- Parameters:
cbuf - Destination buffer.
- Returns:
The number of characters read, or EOF if the end of the stream has been reached.
- Throws:
IOException - If an I/O error occurs.
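Because InputStreamSource extends java.io.Reader, the usual drain-until-end loop over read(char[]) applies to it. The sketch below uses a StringReader as a stand-in so it runs without htmlparser, and assumes EOF corresponds to -1 as in the java.io.Reader contract; DrainReader and readAll are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical helper: drain any Reader (a Source is one) using read(char[]). */
class DrainReader {

    static String readAll(Reader reader) throws IOException {
        StringBuilder out = new StringBuilder();
        char[] chunk = new char[8]; // deliberately small to force several reads
        int n;
        // read(char[]) returns the number of characters read, or -1 at end of stream.
        while ((n = reader.read(chunk)) != -1) {
            out.append(chunk, 0, n);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(new StringReader("<html><body>hi</body></html>")));
    }
}
```

Only the first n entries of the array are meaningful after each call, which is why the loop appends chunk[0..n) rather than the whole array.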
-
reset
Reset the source. Repositions the read point to begin at zero.
- Specified by:
reset in class Source
- Throws:
IllegalStateException - If the source has been closed.
-
markSupported
public boolean markSupported()
Tell whether this source supports the mark() operation.
- Specified by:
markSupported in class Source
- Returns:
true.
-
mark
Mark the present position in the source. Subsequent calls to reset() will attempt to reposition the source to this point.
- Specified by:
mark in class Source
- Parameters:
readAheadLimit - Not used.
- Throws:
IOException - If the source is closed.
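The mark() and reset() descriptions can be sketched over an in-memory buffer. This is a hypothetical model, not the htmlparser code: MarkableBuffer is an illustrative name, -1 stands in for EOF, and reset() is assumed to return to the mark when one has been set and to zero otherwise, which is one way of reconciling the reset() and mark() descriptions above.

```java
/**
 * Hypothetical sketch of mark()/reset() over an in-memory character buffer.
 * Assumption: reset() goes to the mark if one is set, otherwise to zero.
 */
class MarkableBuffer {
    private final char[] chars; // stands in for the characters read so far
    private int offset;         // next character to return (cf. mOffset)
    private int mark = -1;      // bookmark, -1 while unset (cf. mMark)

    MarkableBuffer(String text) {
        this.chars = text.toCharArray();
    }

    int read() {
        return offset < chars.length ? chars[offset++] : -1; // -1 plays the role of EOF
    }

    /** readAheadLimit is ignored, as documented above. */
    void mark(int readAheadLimit) {
        mark = offset;
    }

    void reset() {
        offset = (mark != -1) ? mark : 0;
    }

    public static void main(String[] args) {
        MarkableBuffer b = new MarkableBuffer("abc");
        b.read();                            // consumes 'a'
        b.mark(0);                           // bookmark at offset 1
        b.read();                            // consumes 'b'
        b.reset();                           // back to the bookmark
        System.out.println((char) b.read()); // b
    }
}
```

Ignoring readAheadLimit is cheap here because every character read stays in the buffer, so there is no read-ahead window that could be exceeded.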
-
ready
Tell whether this source is ready to be read.
- Specified by:
ready in class Source
- Returns:
true if the next read() is guaranteed not to block for input, false otherwise. Note that returning false does not guarantee that the next read will block.
- Throws:
IOException - If the source is closed.
-
skip
Skip characters. This method will block until some characters are available, an I/O error occurs, or the end of the stream is reached. Note: n is treated as an int.
- Specified by:
skip in class Source
- Parameters:
n - The number of characters to skip.
- Returns:
The number of characters actually skipped.
- Throws:
IllegalArgumentException - If n is negative.
IOException - If an I/O error occurs.
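The skip() contract (reject a negative n, treat n as an int, report how many characters were really skipped) can be sketched as follows. SkippingBuffer is a hypothetical in-memory stand-in that never blocks, and clamping n to Integer.MAX_VALUE is an assumed way of treating it as an int; the documentation does not say how the narrowing is done.

```java
/**
 * Hypothetical sketch of the documented skip(long n) contract over an
 * in-memory buffer. Not the htmlparser implementation.
 */
class SkippingBuffer {
    private final String text;
    private int offset; // current read position

    SkippingBuffer(String text) {
        this.text = text;
    }

    long skip(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("skip value is negative");
        }
        // Assumption: "treated as an int" is modelled by clamping to Integer.MAX_VALUE.
        int requested = (int) Math.min(n, Integer.MAX_VALUE);
        // Near the end of the input, fewer characters than requested are skipped.
        int actual = Math.min(requested, text.length() - offset);
        offset += actual;
        return actual;
    }

    int offset() {
        return offset;
    }

    public static void main(String[] args) {
        SkippingBuffer s = new SkippingBuffer("abcdef");
        System.out.println(s.skip(4));  // 4
        System.out.println(s.skip(10)); // 2 (only two characters remain)
    }
}
```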
-
unread
Undo the read of a single character.
- Specified by:
unread in class Source
- Throws:
IOException - If the source is closed or no characters have been read.
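Since the source keeps every character it has read, unread() plausibly only needs to move the read position back by one. The hypothetical UnreadableBuffer below sketches that idea together with the documented IOException when nothing has been read; the closed-source check is omitted for brevity.

```java
import java.io.IOException;

/**
 * Hypothetical sketch of unread(): all characters read are retained, so
 * undoing a read just steps the offset back. Not the htmlparser code.
 */
class UnreadableBuffer {
    private final char[] chars;
    private int offset; // next character to return

    UnreadableBuffer(String text) {
        chars = text.toCharArray();
    }

    int read() {
        return offset < chars.length ? chars[offset++] : -1; // -1 plays the role of EOF
    }

    void unread() throws IOException {
        if (offset <= 0) {
            throw new IOException("no characters have been read");
        }
        offset--;
    }

    public static void main(String[] args) throws IOException {
        UnreadableBuffer u = new UnreadableBuffer("<a>");
        int c = u.read();   // look at the first character
        if (c == '<') {
            u.unread();     // push it back so the next stage sees it too
        }
        System.out.println((char) u.read()); // <
    }
}
```

The main method shows the typical lexer pattern: read one character to decide what comes next, then push it back so the following stage still sees the full input.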
-
getCharacter
Retrieve a character again.
- Specified by:
getCharacter in class Source
- Parameters:
offset - The offset of the character.
- Returns:
The character at offset.
- Throws:
IOException - If the offset is beyond offset() or the source is closed.
-
getCharacters
Retrieve characters again.
- Specified by:
getCharacters in class Source
- Parameters:
array - The array of characters.
offset - The starting position in the array where characters are to be placed.
start - The starting position, zero based.
end - The ending position (exclusive, i.e. the character at the ending position is not included), zero based.
- Throws:
IOException - If the start or end is beyond offset() or the source is closed.
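The (start, end) convention with an exclusive end maps directly onto System.arraycopy. The sketch below is hypothetical (RereadBuffer is an illustrative name, and the negative and reversed-range checks are assumptions beyond the documented "beyond offset()" check); it copies end - start characters into array beginning at offset.

```java
import java.io.IOException;

/**
 * Hypothetical sketch of getCharacters(char[] array, int offset, int start, int end)
 * over the characters already read. Not the htmlparser implementation.
 */
class RereadBuffer {
    private final char[] buffer; // characters read so far
    private final int position;  // how many characters have been read, cf. offset()

    RereadBuffer(String alreadyRead) {
        buffer = alreadyRead.toCharArray();
        position = buffer.length;
    }

    void getCharacters(char[] array, int offset, int start, int end) throws IOException {
        // Documented: start or end beyond offset() is an error.
        // Assumed extras: negative start and end < start are rejected too.
        if (start < 0 || end < start || start > position || end > position) {
            throw new IOException("range [" + start + ", " + end + ") is outside the characters read so far");
        }
        // end is exclusive, so exactly end - start characters are copied.
        System.arraycopy(buffer, start, array, offset, end - start);
    }

    public static void main(String[] args) throws IOException {
        RereadBuffer r = new RereadBuffer("<title>Hi</title>");
        char[] out = new char[2];
        r.getCharacters(out, 0, 7, 9);
        System.out.println(new String(out)); // Hi
    }
}
```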
-
getString
Retrieve a string.
- Specified by:
getString in class Source
- Parameters:
offset - The offset of the first character.
length - The number of characters to retrieve.
- Returns:
A string containing the length characters at offset.
- Throws:
IOException - If the offset or (offset + length) is beyond offset() or the source is closed.
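getString() uses an (offset, length) pair rather than the (start, end) pair of getCharacters(char[], int, int, int); getString(offset, length) would cover the same characters as the range from offset to offset + length, exclusive. A hypothetical sketch of the retrieval and its bounds check (StringRetrieval is an illustrative name, not the library class):

```java
import java.io.IOException;

/**
 * Hypothetical sketch of getString(offset, length) over the characters read
 * so far, with the documented "beyond offset()" check. Not the library code.
 */
class StringRetrieval {
    private final char[] buffer; // characters read so far
    private final int position;  // how many characters have been read, cf. offset()

    StringRetrieval(String alreadyRead) {
        buffer = alreadyRead.toCharArray();
        position = buffer.length;
    }

    String getString(int offset, int length) throws IOException {
        if (offset < 0 || length < 0 || offset + length > position) {
            throw new IOException("offset + length is beyond the characters read so far");
        }
        return new String(buffer, offset, length);
    }

    public static void main(String[] args) throws IOException {
        StringRetrieval s = new StringRetrieval("<b>bold</b>");
        System.out.println(s.getString(3, 4)); // bold
    }
}
```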
-
getCharacters
Append characters already read into a StringBuffer.
- Specified by:
getCharacters in class Source
- Parameters:
buffer - The buffer to append to.
offset - The offset of the first character.
length - The number of characters to retrieve.
- Throws:
IOException - If the offset or (offset + length) is beyond offset() or the source is closed.
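The StringBuffer variant follows the same (offset, length) convention and can rely on StringBuffer.append(char[], int, int). AppendRetrieval below is a hypothetical stand-in sketching that behaviour, not the library class.

```java
import java.io.IOException;

/**
 * Hypothetical sketch of getCharacters(StringBuffer buffer, int offset, int length):
 * append already-read characters instead of copying into an array.
 */
class AppendRetrieval {
    private final char[] chars; // characters read so far
    private final int position; // how many characters have been read, cf. offset()

    AppendRetrieval(String alreadyRead) {
        chars = alreadyRead.toCharArray();
        position = chars.length;
    }

    void getCharacters(StringBuffer out, int offset, int length) throws IOException {
        if (offset < 0 || length < 0 || offset + length > position) {
            throw new IOException("offset + length is beyond the characters read so far");
        }
        out.append(chars, offset, length); // appends length characters starting at offset
    }

    public static void main(String[] args) throws IOException {
        AppendRetrieval a = new AppendRetrieval("<b>bold</b>");
        StringBuffer sb = new StringBuffer("tag text: ");
        a.getCharacters(sb, 3, 4);
        System.out.println(sb); // tag text: bold
    }
}
```

Appending to a caller-supplied buffer lets a caller accumulate several ranges without creating an intermediate String for each one.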
-
destroy
Close the source. Once a source has been closed, further read, ready, mark, reset, skip, unread, getCharacter or getString invocations will throw an IOException. Closing a previously-closed source, however, has no effect.
- Specified by:
destroy in class Source
- Throws:
IOException - If an I/O error occurs.
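The split between close() and destroy() can be modelled with a small hypothetical class: close() does nothing, while destroy() closes the underlying stream once and sets it to null (mirroring the mStream description), after which reads fail with an IOException. DestroyableSource is an illustrative name, and its read() returns raw bytes rather than decoded characters to keep the sketch short.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch of the close()/destroy() split described above.
 * Not the htmlparser implementation.
 */
class DestroyableSource {
    private InputStream stream; // cf. mStream: set to null when the source is closed

    DestroyableSource(InputStream stream) {
        this.stream = stream;
    }

    /** Deliberately does nothing, mirroring the documented close(). */
    public void close() {
    }

    public void destroy() throws IOException {
        if (stream != null) { // closing an already-closed source has no effect
            InputStream s = stream;
            stream = null;
            s.close();
        }
    }

    public int read() throws IOException {
        if (stream == null) {
            throw new IOException("source is closed");
        }
        return stream.read(); // a raw byte, for brevity; the real source decodes characters
    }

    public static void main(String[] args) throws IOException {
        DestroyableSource source = new DestroyableSource(new ByteArrayInputStream(new byte[] {'h', 'i'}));
        try {
            System.out.println((char) source.read()); // h
        } finally {
            source.destroy(); // close() would do nothing, so release the stream explicitly
        }
    }
}
```

Because close() does nothing, a try-with-resources block around an InputStreamSource would not release its stream; calling destroy() in a finally block, as in main above, does.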
-
offset
public int offset()
Get the position (in characters).
-
available
public int available()
Get the number of available characters.
-