PHPXMLRPC 4.11.0

Charset
in package
uses DeprecationLogger

Tags
todo

implement an interface

Table of Contents

Properties

$charset_supersets  : mixed
$instance  : Charset
$xml_iso88591_Entities  : mixed

Methods

encodeEntities()  : string
Convert a string to the correct XML representation in a target charset.
getEntities()  : array<string|int, mixed>
Used only for backwards compatibility (the .inc shims).
instance()  : Charset
This class is singleton for performance reasons.
isValidCharset()  : bool
Checks if a given charset encoding is present in a list of encodings or if it is a valid subset of any encoding in the list.
knownCharsets()  : array<string|int, string>
__construct()  : mixed
Force usage as singleton.
buildConversionTable()  : void
logDeprecation()  : mixed
logDeprecationUnlessCalledBy()  : void

Properties

$charset_supersets

protected mixed $charset_supersets = array('US-ASCII' => array('ISO-8859-1', 'ISO-8859-2', 'ISO-8859-3', 'ISO-8859-4', 'ISO-8859-5', 'ISO-8859-6', 'ISO-8859-7', 'ISO-8859-8', 'ISO-8859-9', 'ISO-8859-10', 'ISO-8859-11', 'ISO-8859-12', 'ISO-8859-13', 'ISO-8859-14', 'ISO-8859-15', 'UTF-8', 'EUC-JP', 'EUC-', 'EUC-KR', 'EUC-CN'))

$xml_iso88591_Entities

protected mixed $xml_iso88591_Entities = array("in" => array(), "out" => array())
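The "in"/"out" shape of this default value suggests a pair of parallel search/replace arrays. The following is a minimal sketch of how such a table could be filled and used, assuming it maps ISO-8859-1 bytes 160 to 255 to decimal character references; the function name sketchBuildLatin1Table is hypothetical and the actual table contents are not documented on this page.

```php
<?php
// Hypothetical helper, not part of PHPXMLRPC: builds parallel search/replace arrays.
function sketchBuildLatin1Table(): array
{
    $table = array('in' => array(), 'out' => array());
    // Assumption: only the printable high range of ISO-8859-1 (160-255) is mapped.
    for ($i = 160; $i <= 255; $i++) {
        $table['in'][] = chr($i);            // the raw single-byte character
        $table['out'][] = '&#' . $i . ';';   // its decimal character reference
    }
    return $table;
}

$table = sketchBuildLatin1Table();
// Byte 0xE9 is "e acute" in ISO-8859-1; every "out" value is pure ASCII,
// so later replacements cannot re-match earlier output.
$encoded = str_replace($table['in'], $table['out'], "caf\xE9");
```

Keeping the two arrays in lockstep lets a single str_replace() call perform the whole conversion.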

Methods

encodeEntities()

Convert a string to the correct XML representation in a target charset.

public encodeEntities(string $data[, string $srcEncoding = '' ][, string $destEncoding = '' ]) : string

This involves:

  • character transformation for all characters which have a different representation in the source and destination charsets
  • using 'charset entity' representation for all characters which are outside the target charset

To ensure that non-ASCII characters inside strings are communicated correctly, regardless of the charset used when sending requests, parsing them, sending responses and parsing responses, one option is to convert all non-ASCII characters present in the message into their equivalent 'charset entity'. Charset entities enumerated this way are independent of the charset encoding used to transmit them, and all XML parsers are bound to understand them.

Note that when no charset is sent along with the MIME type in the HTTP headers, we are bound by RFC 3023 to emit strict US-ASCII for 'text/xml' payloads (but we should review RFC 7303, which seems to have changed the rules...).
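The conversion described above can be sketched as a standalone function for the UTF-8 to US-ASCII case. This is an illustration only, not the library's implementation: the name sketchUtf8ToAsciiXml is hypothetical, and the choice of decimal character references is an assumption.

```php
<?php
// Hypothetical helper, not part of PHPXMLRPC: escape a UTF-8 string into US-ASCII XML text.
function sketchUtf8ToAsciiXml(string $data): string
{
    // Escape the five XML-reserved characters first.
    $escaped = strtr($data, array(
        '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&apos;',
    ));
    // Replace every non-ASCII character with a decimal character reference.
    $out = preg_replace_callback('/[^\x00-\x7F]/u', function (array $m): string {
        $bytes = array_values(unpack('C*', $m[0]));
        $n = count($bytes);
        // Payload bits of the leading byte: 5 for 2-byte, 4 for 3-byte, 3 for 4-byte sequences.
        $code = $bytes[0] & (0xFF >> ($n + 1));
        for ($i = 1; $i < $n; $i++) {
            // Each continuation byte contributes its low 6 bits.
            $code = ($code << 6) | ($bytes[$i] & 0x3F);
        }
        return '&#' . $code . ';';
    }, $escaped);
    if ($out === null) {
        // preg_replace_callback() returns null when the input is not valid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $out;
}
```

Because the output contains only ASCII bytes, it reads the same whatever charset is declared (or omitted) in the HTTP headers.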

Parameters
$data : string
$srcEncoding : string = ''
$destEncoding : string = ''
Tags
todo

do a bit of basic benchmarking: strtr vs. str_replace, str_replace vs htmlspecialchars, hand-coded conversion vs mbstring when that is enabled

todo

make use of iconv when it is available and mbstring is not

todo

support aliases for charset names, e.g. ASCII, LATIN1, ISO-88591 (see e.g. polyfill-iconv for a list), but then take those into account in other methods as well, i.e. isValidCharset

todo

when converting to ASCII, allow choosing whether to escape the range 0-31,127 (non-printable chars) or not

todo

allow picking different strategies to deal with invalid chars? e.g. source in latin-1 and chars 128-159

todo

add support for escaping using CDATA sections? (add cdata start and end tokens, replace only ']]>' with ']]]]><![CDATA[>')
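The CDATA idea in the last todo is not implemented according to this page. A minimal sketch of the replacement it describes, with a hypothetical function name, could look like this:

```php
<?php
// Hypothetical helper, not part of PHPXMLRPC: wrap arbitrary text in a CDATA section.
function sketchWrapInCdata(string $data): string
{
    // The only sequence that cannot appear inside CDATA is ']]>'.
    // Split it across two sections: close after ']]', then reopen before '>'.
    $safe = str_replace(']]>', ']]]]><![CDATA[>', $data);
    return '<![CDATA[' . $safe . ']]>';
}
```

Note that CDATA only avoids escaping markup characters; it does not help with characters that are outside the target charset.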

Return values
string

getEntities()

Used only for backwards compatibility (the .inc shims).

public getEntities(string $charset) : array<string|int, mixed>
Parameters
$charset : string
Tags
throws
ValueErrorException

for unknown/unsupported charsets

Return values
array<string|int, mixed>

instance()

This class is singleton for performance reasons.

public static instance() : Charset
Tags
todo

should we just make $xml_iso88591_Entities a static variable instead?

Return values
Charset
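The singleton pattern named here, combined with the protected constructor listed for this class, can be sketched generically as follows. The class SketchTableHolder is a hypothetical stand-in, not the library's code; the build counter exists only to make the effect visible.

```php
<?php
// Hypothetical stand-in class, not the library's Charset implementation.
class SketchTableHolder
{
    /** @var SketchTableHolder|null the shared instance, created on first use */
    protected static $instance = null;

    /** @var int counts how many times the (potentially expensive) setup ran */
    public static $builds = 0;

    // Protected constructor: code outside the class cannot call "new" on it.
    protected function __construct()
    {
        self::$builds++;
    }

    public static function instance(): self
    {
        if (self::$instance === null) {
            self::$instance = new self();
        }
        return self::$instance;
    }
}

$a = SketchTableHolder::instance();
$b = SketchTableHolder::instance();
```

Both calls return the same object, so any state built during construction is computed once per request.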

isValidCharset()

Checks if a given charset encoding is present in a list of encodings or if it is a valid subset of any encoding in the list.

public isValidCharset(string $encoding, string|array<string|int, mixed> $validList) : bool

Kept for backward compatibility; the library itself does not use it.

Parameters
$encoding : string

charset to be tested

$validList : string|array<string|int, mixed>

comma separated list of valid charsets (or array of charsets)

Return values
bool
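The check described above can be sketched as a standalone function. This is an assumption-laden illustration: the name sketchIsValidCharset, the explicit $supersets parameter, and the case-insensitive comparison are all hypothetical; the real method presumably consults $charset_supersets internally.

```php
<?php
// Hypothetical helper, not part of PHPXMLRPC.
// $supersets maps a charset name to the list of charsets that contain it.
function sketchIsValidCharset(string $encoding, $validList, array $supersets): bool
{
    if (is_string($validList)) {
        // Accept a comma separated list as well as an array.
        $validList = array_map('trim', explode(',', $validList));
    }
    // Assumption: names are compared case-insensitively.
    $encoding = strtoupper($encoding);
    $validList = array_map('strtoupper', $validList);

    // Direct match: the charset itself is in the list.
    if (in_array($encoding, $validList, true)) {
        return true;
    }
    // Subset match: some listed charset is a known superset of $encoding.
    if (isset($supersets[$encoding])) {
        foreach ($validList as $candidate) {
            if (in_array($candidate, $supersets[$encoding], true)) {
                return true;
            }
        }
    }
    return false;
}

$supersets = array('US-ASCII' => array('ISO-8859-1', 'UTF-8'));
```

With this table, US-ASCII is accepted wherever UTF-8 or ISO-8859-1 is accepted, but not the other way round.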

knownCharsets()

public knownCharsets() : array<string|int, string>
Return values
array<string|int, string>

__construct()

Force usage as singleton.

protected __construct() : mixed

buildConversionTable()

protected buildConversionTable(string $tableName) : void
Parameters
$tableName : string
Tags
throws
ValueErrorException

for unsupported $tableName

todo

add support for cp1252 as well as latin-2 .. latin-10.
Optimization creep: instead of building all those tables on load, keep them as ready-made PHP files which are not even included until needed

todo

should we add to the latin-1 table the characters from the cp1252 range, i.e. 128 to 159? Those are NOT present in true ISO-8859-1, but including them would save the unwary Windows user from sending junk (though it would not help when receiving them).
Note also that, apparently, while 'ISO/IEC 8859-1' defines no characters for bytes 128 to 159, IANA ISO-8859-1 does have well-defined 'C1' control codes for them. Wikipedia's page on latin-1 says: "ISO-8859-1 is the IANA preferred name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429."
Check what mbstring/iconv do by default with those.

logDeprecation()

protected logDeprecation(mixed $message) : mixed
Parameters
$message : mixed

logDeprecationUnlessCalledBy()

protected logDeprecationUnlessCalledBy(string $expectedCaller) : void
Parameters
$expectedCaller : string

At the moment, only the method name is supported.


        