Last updated: Sat, 17 Jul 2004

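CXIX. XML Parser Functions

Introduction

XML (eXtensible Markup Language) is a data format for structured document interchange on the Web. It is a standard defined by the World Wide Web Consortium (W3C). Information about XML and related technologies can be found at http://www.w3.org/XML/.

This PHP extension uses James Clark's expat. This toolkit lets you parse XML documents, but it does not validate them. It supports three character encodings that PHP also supports: US-ASCII, ISO-8859-1 and UTF-8. UTF-16 is not supported.

This extension lets you create XML parsers and define handlers for different XML events. Each XML parser also has a few parameters you can adjust.

Requirements

This extension uses expat, which can be found at http://www.jclark.com/xml/expat.html. The Makefile that comes with expat does not build a library by default, so you can use the following make rule: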

libexpat.a: $(OBJS)
    ar -rc $@ $(OBJS)
    ranlib $@
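A source RPM package of expat can be found at http://sourceforge.net/projects/expat/.

Installation

These functions are enabled by default, using the bundled expat library. To disable XML support, use --disable-xml. If you compile PHP as a module for Apache 1.3.9 or later, PHP automatically uses the expat library bundled with Apache. If you do not want to use the bundled expat library, configure PHP with --with-expat-dir=DIR, where DIR points to the base installation directory of expat.

The Windows version of PHP has built-in support for this extension. You do not need to load any additional extension in order to use these functions.

Runtime Configuration

This extension has no configuration directives defined in php.ini.

Resource Types

xml

The xml resource returned by xml_parser_create() and xml_parser_create_ns() references an XML parser instance and is used by the functions this extension provides.

Predefined Constants

This extension defines the following constants. They are only available when the extension has either been compiled into PHP or dynamically loaded at runtime.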

XML_ERROR_NONE (integer)

XML_ERROR_NO_MEMORY (integer)

XML_ERROR_SYNTAX (integer)

XML_ERROR_NO_ELEMENTS (integer)

XML_ERROR_INVALID_TOKEN (integer)

XML_ERROR_UNCLOSED_TOKEN (integer)

XML_ERROR_PARTIAL_CHAR (integer)

XML_ERROR_TAG_MISMATCH (integer)

XML_ERROR_DUPLICATE_ATTRIBUTE (integer)

XML_ERROR_JUNK_AFTER_DOC_ELEMENT (integer)

XML_ERROR_PARAM_ENTITY_REF (integer)

XML_ERROR_UNDEFINED_ENTITY (integer)

XML_ERROR_RECURSIVE_ENTITY_REF (integer)

XML_ERROR_ASYNC_ENTITY (integer)

XML_ERROR_BAD_CHAR_REF (integer)

XML_ERROR_BINARY_ENTITY_REF (integer)

XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF (integer)

XML_ERROR_MISPLACED_XML_PI (integer)

XML_ERROR_UNKNOWN_ENCODING (integer)

XML_ERROR_INCORRECT_ENCODING (integer)

XML_ERROR_UNCLOSED_CDATA_SECTION (integer)

XML_ERROR_EXTERNAL_ENTITY_HANDLING (integer)

XML_OPTION_CASE_FOLDING (integer)

XML_OPTION_TARGET_ENCODING (integer)

XML_OPTION_SKIP_TAGSTART (integer)

XML_OPTION_SKIP_WHITE (integer)

Event Handlers

The XML event handlers defined are:

Table 1. Supported XML handlers

PHP function to set handler -- Event description
xml_set_element_handler() -- Element events are issued whenever the XML parser encounters start or end tags. There are separate handlers for start tags and end tags.
xml_set_character_data_handler() -- Character data is roughly all the non-markup contents of XML documents, including whitespace between tags. Note that the XML parser does not add or remove any whitespace; it is up to the application (you) to decide whether whitespace is significant.
xml_set_processing_instruction_handler() -- PHP programmers should already be familiar with processing instructions (PIs). <?php ?> is a processing instruction, where php is called the "PI target". The handling of these is application-specific, except that all PI targets starting with "XML" are reserved.
xml_set_default_handler() -- Whatever is not passed to another handler goes to the default handler. You will get things like the XML and document type declarations in the default handler.
xml_set_unparsed_entity_decl_handler() -- This handler is called for the declaration of an unparsed (NDATA) entity.
xml_set_notation_decl_handler() -- This handler is called for the declaration of a notation.
xml_set_external_entity_ref_handler() -- This handler is called when the XML parser finds a reference to an external parsed general entity. This can be a reference to a file or URL, for example. See the external entity example for a demonstration.
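
As a rough illustration of two of the handlers listed above, the following sketch registers a processing instruction handler and a default handler and records what each one receives. It assumes a PHP version with closure support (5.3 or later); the variable name $seen and the sample document are made up for this example.

```php
<?php
// Hypothetical container for whatever the two handlers receive.
$seen = array('pi' => array(), 'default' => '');

$parser = xml_parser_create();

// Called once per processing instruction with its target and data.
xml_set_processing_instruction_handler(
    $parser,
    function ($p, $target, $data) use (&$seen) {
        $seen['pi'][] = array($target, $data);
    }
);

// Receives anything no other handler claimed (here: elements and the comment,
// because no element handler is registered in this sketch).
xml_set_default_handler(
    $parser,
    function ($p, $data) use (&$seen) {
        $seen['default'] .= $data;
    }
);

$xml = '<?xml version="1.0"?><doc><?render mode="fast"?><!-- note --></doc>';
if (xml_parse($parser, $xml, true)) {
    print_r($seen);
} else {
    echo xml_error_string(xml_get_error_code($parser)), "\n";
}
```

Exactly which pieces of markup reach the default handler can vary with the underlying parser library, so treat the collected string as diagnostic output rather than a stable format.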

Case Folding

The element handler functions may get their element names case-folded. Case-folding is defined by the XML standard as "a process applied to a sequence of characters, in which those identified as non-uppercase are replaced by their uppercase equivalents". In other words, when it comes to XML, case-folding simply means uppercasing.

By default, all the element names that are passed to the handler functions are case-folded. This behaviour can be queried and controlled per XML parser with the xml_parser_get_option() and xml_parser_set_option() functions, respectively.
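
The effect of case folding can be seen by parsing the same document twice, once with the option enabled and once with it disabled. This is a minimal sketch assuming PHP 5.3 or later for closures; $collectNames is a hypothetical helper written for this example, not part of the extension.

```php
<?php
// Hypothetical helper: returns the element names passed to the start handler.
$collectNames = function ($xml, $fold) {
    $names = array();
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $fold);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$names) {
            $names[] = $name;
        },
        function ($p, $name) {
            // end tags are not needed for this sketch
        }
    );
    xml_parse($parser, $xml, true);
    return $names;
};

$xml = '<doc><Item/></doc>';
print_r($collectNames($xml, true));   // names arrive uppercased: DOC, ITEM
print_r($collectNames($xml, false));  // names arrive as written: doc, Item
```

Code that looks element names up in a table, as the tag mapping example does, has to use keys that match whichever setting is in effect.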

Error Codes

The following constants are defined for XML error codes (as returned by xml_get_error_code()):

XML_ERROR_NONE
XML_ERROR_NO_MEMORY
XML_ERROR_SYNTAX
XML_ERROR_NO_ELEMENTS
XML_ERROR_INVALID_TOKEN
XML_ERROR_UNCLOSED_TOKEN
XML_ERROR_PARTIAL_CHAR
XML_ERROR_TAG_MISMATCH
XML_ERROR_DUPLICATE_ATTRIBUTE
XML_ERROR_JUNK_AFTER_DOC_ELEMENT
XML_ERROR_PARAM_ENTITY_REF
XML_ERROR_UNDEFINED_ENTITY
XML_ERROR_RECURSIVE_ENTITY_REF
XML_ERROR_ASYNC_ENTITY
XML_ERROR_BAD_CHAR_REF
XML_ERROR_BINARY_ENTITY_REF
XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF
XML_ERROR_MISPLACED_XML_PI
XML_ERROR_UNKNOWN_ENCODING
XML_ERROR_INCORRECT_ENCODING
XML_ERROR_UNCLOSED_CDATA_SECTION
XML_ERROR_EXTERNAL_ENTITY_HANDLING
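
A short sketch of how an error code is typically retrieved after a failed parse. The malformed input string is made up for this example. The numeric code that comes back may depend on the XML library PHP was built against, so the sketch only compares it with XML_ERROR_NONE and relies on xml_error_string() for a readable message.

```php
<?php
$parser = xml_parser_create();

// Deliberately malformed input: <b> is never closed before </a>.
$ok = xml_parse($parser, '<a><b></a>', true);

$code = xml_get_error_code($parser);
if (!$ok && $code !== XML_ERROR_NONE) {
    printf(
        "XML error %d (%s) at line %d, column %d\n",
        $code,
        xml_error_string($code),
        xml_get_current_line_number($parser),
        xml_get_current_column_number($parser)
    );
}
```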

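Character Encoding

PHP's XML extension supports the Unicode character set through different character encodings. There are two types of character encodings: source encoding and target encoding. PHP's internal representation of the document is always encoded with UTF-8.

Source encoding is done when an XML document is parsed. When you create an XML parser, you can specify a source encoding (this encoding cannot be changed later in the XML parser's lifetime). The supported source encodings are ISO-8859-1, US-ASCII and UTF-8. The former two are single-byte encodings, which means that each character is represented by a single byte. UTF-8 can encode characters composed of a variable number of bits (up to 21) in one to four bytes. The default source encoding used by PHP is ISO-8859-1.

Target encoding is done when PHP passes data to XML handler functions. When an XML parser is created, the target encoding is set to the same as the source encoding, but this may be changed at any point. The target encoding affects character data as well as tag names and processing instruction targets.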

If the XML parser encounters characters outside the range that its source encoding is capable of representing, it will return an error.

If PHP encounters characters in the parsed XML document that can not be represented in the chosen target encoding, the problem characters will be "demoted". Currently, this means that such characters are replaced by a question mark.
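
The "demotion" described above can be observed by choosing a target encoding that cannot represent a character in the document. This is a minimal sketch assuming PHP 5.3 or later for the closure; the sample string and the variable $text are made up for the example.

```php
<?php
$text = '';

// Source encoding UTF-8, target encoding US-ASCII.
$parser = xml_parser_create('UTF-8');
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'US-ASCII');

// Character data may arrive in several pieces, so append rather than assign.
xml_set_character_data_handler(
    $parser,
    function ($p, $data) use (&$text) {
        $text .= $data;
    }
);

// "\xC3\xA9" is the UTF-8 encoding of e-acute, which US-ASCII cannot represent.
xml_parse($parser, "<a>caf\xC3\xA9</a>", true);

echo $text, "\n"; // caf?
```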

Examples

Here are some example PHP scripts that parse XML documents.

XML Element Structure Example

This first example displays the structure of the start elements in a document with indentation.

Example 1. Show XML Element Structure

<?php
$file = "data.xml";
$depth = array();

function startElement($parser, $name, $attrs)
{
    global $depth;
    // start counting at zero the first time this parser is seen
    if (!isset($depth[$parser])) {
        $depth[$parser] = 0;
    }
    for ($i = 0; $i < $depth[$parser]; $i++) {
        echo "  ";
    }
    echo "$name\n";
    $depth[$parser]++;
}

function endElement($parser, $name)
{
    global $depth;
    $depth[$parser]--;
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
                    xml_error_string(xml_get_error_code($xml_parser)),
                    xml_get_current_line_number($xml_parser)));
    }
}
xml_parser_free($xml_parser);
?>

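XML Tag Mapping Example

Example 2. Map XML to HTML

This example maps tags in an XML document directly to HTML tags. Elements not found in the "map array" are ignored. Of course, this example will only work with a specific XML document type.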

<?php
$file = "data.xml";
$map_array = array(
    "BOLD"     => "B",
    "EMPHASIS" => "I",
    "LITERAL"  => "TT"
);

function startElement($parser, $name, $attrs)
{
    global $map_array;
    if (isset($map_array[$name])) {
        echo "<$map_array[$name]>";
    }
}

function endElement($parser, $name)
{
    global $map_array;
    if (isset($map_array[$name])) {
        echo "</$map_array[$name]>";
    }
}

function characterData($parser, $data)
{
    echo $data;
}

$xml_parser = xml_parser_create();
// use case-folding so we are sure to find the tag in $map_array
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
                    xml_error_string(xml_get_error_code($xml_parser)),
                    xml_get_current_line_number($xml_parser)));
    }
}
xml_parser_free($xml_parser);
?>

XML External Entity Example

This example highlights XML code. It illustrates how to use an external entity reference handler to include and parse other documents, as well as how PIs can be processed, and a way of determining "trust" for PIs containing code.

XML documents that can be used for this example are found below the example (xmltest.xml and xmltest2.xml).

Example 3. External Entity Example

<?php
$file = "xmltest.xml";

function trustedFile($file)
{
    // only trust local files owned by ourselves
    if (!eregi("^([a-z]+)://", $file)
        && fileowner($file) == getmyuid()) {
            return true;
    }
    return false;
}

function startElement($parser, $name, $attribs)
{
    echo "&lt;<font color=\"#0000cc\">$name</font>";
    if (sizeof($attribs)) {
        while (list($k, $v) = each($attribs)) {
            echo " <font color=\"#009900\">$k</font>=\"<font
                   color=\"#990000\">$v</font>\"";
        }
    }
    echo "&gt;";
}

function endElement($parser, $name)
{
    echo "&lt;/<font color=\"#0000cc\">$name</font>&gt;";
}

function characterData($parser, $data)
{
    echo "<b>$data</b>";
}

function PIHandler($parser, $target, $data)
{
    switch (strtolower($target)) {
        case "php":
            global $parser_file;
            // If the parsed document is "trusted", we say it is safe
            // to execute PHP code inside it.  If not, display the code
            // instead.
            if (trustedFile($parser_file[$parser])) {
                eval($data);
            } else {
                printf("Untrusted PHP code: <i>%s</i>",
                       htmlspecialchars($data));
            }
            break;
    }
}

function defaultHandler($parser, $data)
{
    if (substr($data, 0, 1) == "&" && substr($data, -1, 1) == ";") {
        printf('<font color="#aa00aa">%s</font>',
               htmlspecialchars($data));
    } else {
        printf('<font size="-1">%s</font>',
               htmlspecialchars($data));
    }
}

function externalEntityRefHandler($parser, $openEntityNames, $base, $systemId,
                                  $publicId) {
    if ($systemId) {
        if (!list($parser, $fp) = new_xml_parser($systemId)) {
            printf("Could not open entity %s at %s\n", $openEntityNames,
                   $systemId);
            return false;
        }
        while ($data = fread($fp, 4096)) {
            if (!xml_parse($parser, $data, feof($fp))) {
                printf("XML error: %s at line %d while parsing entity %s\n",
                       xml_error_string(xml_get_error_code($parser)),
                       xml_get_current_line_number($parser), $openEntityNames);
                xml_parser_free($parser);
                return false;
            }
        }
        xml_parser_free($parser);
        return true;
    }
    return false;
}

function new_xml_parser($file)
{
    global $parser_file;

    $xml_parser = xml_parser_create();
    xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, 1);
    xml_set_element_handler($xml_parser, "startElement", "endElement");
    xml_set_character_data_handler($xml_parser, "characterData");
    xml_set_processing_instruction_handler($xml_parser, "PIHandler");
    xml_set_default_handler($xml_parser, "defaultHandler");
    xml_set_external_entity_ref_handler($xml_parser, "externalEntityRefHandler");

    if (!($fp = @fopen($file, "r"))) {
        return false;
    }
    if (!is_array($parser_file)) {
        settype($parser_file, "array");
    }
    $parser_file[$xml_parser] = $file;
    return array($xml_parser, $fp);
}

if (!(list($xml_parser, $fp) = new_xml_parser($file))) {
    die("could not open XML input");
}

echo "<pre>";
while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d\n",
                    xml_error_string(xml_get_error_code($xml_parser)),
                    xml_get_current_line_number($xml_parser)));
    }
}
echo "</pre>";
echo "parse complete\n";
xml_parser_free($xml_parser);

?>

Example 4. xmltest.xml

<?xml version='1.0'?>
<!DOCTYPE chapter SYSTEM "/just/a/test.dtd" [
<!ENTITY plainEntity "FOO entity">
<!ENTITY systemEntity SYSTEM "xmltest2.xml">
]>
<chapter>
 <TITLE>Title &plainEntity;</TITLE>
 <para>
  <informaltable>
   <tgroup cols="3">
    <tbody>
     <row><entry>a1</entry><entry morerows="1">b1</entry><entry>c1</entry></row>
     <row><entry>a2</entry><entry>c2</entry></row>
     <row><entry>a3</entry><entry>b3</entry><entry>c3</entry></row>
    </tbody>
   </tgroup>
  </informaltable>
 </para>
 &systemEntity;
 <section id="about">
  <title>About this Document</title>
  <para>
   <!-- this is a comment -->
   <?php echo 'Hi!  This is PHP version '.phpversion(); ?>
  </para>
 </section>
</chapter>

This file is included from xmltest.xml:

Example 5. xmltest2.xml

<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY testEnt "test entity">
]>
<foo>
   <element attrib="value"/>
   &testEnt;
   <?php echo "This is some more PHP code being executed."; ?>
</foo>

Table of Contents
utf8_decode --  Converts a string with ISO-8859-1 characters encoded with UTF-8 to single-byte ISO-8859-1.
utf8_encode -- Encodes an ISO-8859-1 string to UTF-8
xml_error_string -- Get XML parser error string
xml_get_current_byte_index -- Get current byte index for an XML parser
xml_get_current_column_number --  Get current column number for an XML parser
xml_get_current_line_number -- Get current line number for an XML parser
xml_get_error_code -- Get XML parser error code
xml_parse_into_struct -- Parse XML data into an array structure
xml_parse -- Start parsing an XML document
xml_parser_create_ns --  Create an XML parser with namespace support
xml_parser_create -- Create an XML parser
xml_parser_free -- Free an XML parser
xml_parser_get_option -- Get options from an XML parser
xml_parser_set_option -- Set options in an XML parser
xml_set_character_data_handler -- Set up character data handler
xml_set_default_handler -- Set up default handler
xml_set_element_handler -- Set up start and end element handlers
xml_set_end_namespace_decl_handler --  Set up end namespace declaration handler
xml_set_external_entity_ref_handler -- Set up external entity reference handler
xml_set_notation_decl_handler -- Set up notation declaration handler
xml_set_object -- Use XML Parser within an object
xml_set_processing_instruction_handler --  Set up processing instruction (PI) handler
xml_set_start_namespace_decl_handler --  Set up start namespace declaration handler
xml_set_unparsed_entity_decl_handler --  Set up unparsed entity declaration handler


User Contributed Notes
XML Parser Functions
v9 at fakehalo dot us
13-Jul-2007 08:04
I needed this for work/personal use.  Sometimes you'll have an XML string generated as one long string with no line breaks (nusoap, in today's case at work, but any number of other things will generate these).  Anyway, this simply takes a long XML string and returns an indented version of the string, with line breaks, for display/readability.

<?php
function xmlIndent($str){
    $ret = "";
    $indent = 0;
    $indentInc = 3;
    $noIndent = false;
    $i = 0; // search offset into $str
    $r = 0; // offset just past the previous tag
    while(($l = strpos($str,"<",$i))!==false){
        if($l!=$r && $indent>0){ $ret .= "\n" . str_repeat(" ",$indent) . substr($str,$r,($l-$r)); }
        $i = $l+1;
        $r = strpos($str,">",$i)+1;
        $t = substr($str,$l,($r-$l));
        if(strpos($t,"/")==1){
            $indent -= $indentInc;
            $noIndent = true;
        }
        else if(($r-$l-strpos($t,"/"))==2 || substr($t,0,2)=="<?"){ $noIndent = true; }
        if($indent<0){ $indent = 0; }
        if($ret){ $ret .= "\n"; }
        $ret .= str_repeat(" ",$indent);
        $ret .= $t;
        if(!$noIndent){ $indent += $indentInc; }
        $noIndent = false;
    }
    $ret .= "\n";
    return($ret);
}
?>

(...this was only tested for what i needed at work, could POSSIBLY need additions)
ricardo at sismeiro dot com
08-Jun-2007 04:29
<?php

/**
 * correction of the previous code
 */

/**
 * Converts XML into Array
 *
 * @param array $result
 * @param object  $root
 * @param string $rootname
 */
function convert_xml2array(&$result,$root,$rootname='root'){

    $n=count($root->children());

    if ($n>0){

        /**
         * start of the correction
         */
        if (!isset($result[$rootname]['@attributes'])){
            $result[$rootname]['@attributes']=array();
            foreach ($root->attributes() as $atr=>$value){
                $result[$rootname]['@attributes'][$atr]=(string)$value;
            }
        }
        /**
         *  end of the correction
         */

        foreach ($root->children() as $child){
            $name=$child->getName();
            convert_xml2array($result[$rootname][],$child,$name);
        }
    } else {
        $result[$rootname]= (array) $root;
        if (!isset($result[$rootname]['@attributes'])){
            $result[$rootname]['@attributes']=array();
        }
    }
}

/**
 * Example how to use the function convert_xml2array
 */

/**
 * Return  Array from a xml string
 *
 * @param string $xml
 * @return array
 */
function get_array_fromXML($xml){

    $result=array();

    $doc=simplexml_load_string($xml);

    convert_xml2array($result,$doc);

    return $result['root'];
}

?>
adamaflynn at criticaldevelopment dot net
14-Apr-2007 11:50
Here is an example of another XML parsing script that parses the document into an array/object structure instead of relying on startElement, endElement, etc handlers.

You can find the documentation at:
http://www.criticaldevelopment.net/xml/doc.php

And the code (both PHP4 and PHP5 versions):
http://www.criticaldevelopment.net/xml/parser_php4.phps
http://www.criticaldevelopment.net/xml/parser_php5.phps

If you have any questions about it, just drop me an e-mail.
phpzmurf[at]yahoo.com
12-Apr-2007 04:19
/*
 * Parse rss news, quotes etc.
 *
 * author : phpZmurf <phpzmurf[at]yahoo.com>
 * created: 12.04.2007
 * ver    : 1.0
 *
*/

$data = implode("", file("http://feeds.feedburner.com/quotationspage/qotd/"));
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
xml_parse_into_struct($parser, $data, $values, $tags);
xml_parser_free($parser);

# data saved here
$arrQuotes = array();
# at the beginning, the tag is set to closed
$tagOpen = false;

foreach($values as $key => $item) {
    if(!$tagOpen and $item['tag'] == 'item' and $item['type'] == 'open') {
        # item tag opens
        $tagOpen = true;
        # empty temporary variables
        $temp_title = '';
        $temp_description = '';
        $temp_guid = '';
        $temp_link = '';
    } elseif($item['tag'] == 'item' and $item['type'] == 'close') {
        # item tag ends
        $tagOpen = false;
        # if all 4 tags contain data... add them to output array
        if($temp_title != '' and $temp_description != '' and $temp_guid != '' and $temp_link != '') {
            $arrQuotes[] = array(
                'title' => $temp_title,
                'description' => $temp_description,
                'guid' => $temp_guid,
                'link' => $temp_link
            );
        }
    } else {
        # save data into temporary variables
        switch($item['tag']) {
            case 'title':
                $temp_title = $item['value'];
            break;
            case 'description':
                # needed because there was a stray <p> at the end of the description
                #$temp_description = $item['value'];
                $temp_description = substr($item['value'], 0, strpos($item['value'], '<'));
            break;
            case 'guid':
                $temp_guid = $item['value'];
            break;
            case 'link':
                $temp_link = $item['value'];
            break;
            default: break;
        }
    }
}

foreach($arrQuotes as $key => $item) {
    print_r($item);
}
Sheer Pullen
14-Mar-2007 07:27
I took the code posted by forquan and modified it to be somewhat more readable (by me), somewhat more friendly to the idea of parsing multiple files with the same object, and to be compatible with an HTTP POST of XML data. Anyone who's interested in my version of associative array output can check it out at http://www.sheer.us/code/php/xml-parse-to-associative-array.phpsrc

Be nice to me, this is my first published php code
geoffers [at] gmail [dot] com
30-Dec-2006 03:27
Time to add my attempt at a very simple script that parses XML into a structure:

<?php

class Simple_Parser
{
    var $parser;
    var $error_code;
    var $error_string;
    var $current_line;
    var $current_column;
    var $data = array();
    var $datas = array();

    function parse($data)
    {
        $this->parser = xml_parser_create('UTF-8');
        xml_set_object($this->parser, $this);
        xml_parser_set_option($this->parser, XML_OPTION_SKIP_WHITE, 1);
        xml_set_element_handler($this->parser, 'tag_open', 'tag_close');
        xml_set_character_data_handler($this->parser, 'cdata');
        if (!xml_parse($this->parser, $data))
        {
            $this->data = array();
            $this->error_code = xml_get_error_code($this->parser);
            $this->error_string = xml_error_string($this->error_code);
            $this->current_line = xml_get_current_line_number($this->parser);
            $this->current_column = xml_get_current_column_number($this->parser);
        }
        else
        {
            $this->data = $this->data['child'];
        }
        xml_parser_free($this->parser);
    }

    function tag_open($parser, $tag, $attribs)
    {
        $this->data['child'][$tag][] = array('data' => '', 'attribs' => $attribs, 'child' => array());
        $this->datas[] =& $this->data;
        $this->data =& $this->data['child'][$tag][count($this->data['child'][$tag])-1];
    }

    function cdata($parser, $cdata)
    {
        $this->data['data'] .= $cdata;
    }

    function tag_close($parser, $tag)
    {
        $this->data =& $this->datas[count($this->datas)-1];
        array_pop($this->datas);
    }
}

$xml_parser = new Simple_Parser;
$xml_parser->parse('<foo><bar>test</bar></foo>');

?>
Didier: dlvb ** free * fr
24-Dec-2006 09:53
Hi !

After parsing the XML and modifying it, I just added a method to rebuild the XML from the internal structure (xmlp->document).
The method xmlp->toXML writes into xmlp->XML attributes. Then, you just have to output it.
I hope it helps.

class XMLParser {

var $parser;
var $filePath;
var $document;
var $currTag;
var $tagStack;
var $XML;
var $_tag_to_close = false;
var $TAG_ATTRIBUT = 'attr';
var $TAG_DATA = 'data';

function XMLParser($path) {
    $this->parser = xml_parser_create();
    $this->filePath = $path;
    $this->document = array();
    $this->currTag =& $this->document;
    $this->tagStack = array();
    $this->XML = "";
}

function parse() {
    xml_set_object($this->parser, $this);
    xml_set_character_data_handler($this->parser, 'dataHandler');
    xml_set_element_handler($this->parser, 'startHandler', 'endHandler');

   if(!($fp = fopen($this->filePath, "r"))) {
       die("Cannot open XML data file: $this->filePath");
       return false;
     }

    while($data = fread($fp, 4096)) {
        if(!xml_parse($this->parser, $data, feof($fp))) {
            die(sprintf("XML error: %s at line %d",
                xml_error_string(xml_get_error_code($this->parser)),
             xml_get_current_line_number($this->parser)));
      }
    }

    fclose($fp);
    xml_parser_free($this->parser);

    return true;
}

function startHandler($parser, $name, $attribs) {
     if(!isset($this->currTag[$name]))
          $this->currTag[$name] = array();

     $newTag = array();
     if(!empty($attribs))
          $newTag[$this->TAG_ATTRIBUT] = $attribs;
     array_push($this->currTag[$name], $newTag);

     $t =& $this->currTag[$name];
     $this->currTag =& $t[count($t)-1];
     array_push($this->tagStack, $name);
}

function dataHandler($parser, $data) {
    $data = trim($data);

    if(!empty($data)) {
      if(isset($this->currTag[$this->TAG_DATA]))
            $this->currTag[$this->TAG_DATA] .= $data;
      else
            $this->currTag[$this->TAG_DATA] = $data;
    }
}

function endHandler($parser, $name) {
     $this->currTag =& $this->document;
     array_pop($this->tagStack);

     for($i = 0; $i < count($this->tagStack); $i++) {
          $t =& $this->currTag[$this->tagStack[$i]];
          $this->currTag =& $t[count($t)-1];
     }
}

function clearOutput () {
    $this->XML = "";
}

function openTag ($tag) {
    $this->XML.="<".strtolower ($tag);
    $this->_tag_to_close = true;
}

function closeTag () {
    if ($this->_tag_to_close) {
        $this->XML.=">";
        $this->_tag_to_close = false;
    }
}

function closingTag ($tag) {
    $this->XML.="</".strtolower ($tag).">";
}

function output_attributes ($contenu_fils) {
    foreach ($contenu_fils[$this->TAG_ATTRIBUT] as $nomAttribut => $valeur) {
        $this->XML.= " ".strtolower($nomAttribut)."=\"".$valeur."\"";
    }
}

function addData ($texte) {
// to be completed
    $ca  = array ("é", "è", "ê", "à");
    $par = array ("&eacute;", "&egrave;", "&ecirc;", "&agrave;");
    return htmlspecialchars(str_replace ($ca, $par, $texte), ENT_NOQUOTES);
}

function toXML ($tags="") {
    if ($tags=="") {
        $tags = $this->document;
      $this->clearOutput ();
    }

    foreach ($tags as $tag => $contenu) {
        $this->process ($tag, $contenu);
    }
}

function process ($tag, $contenu) {
     // Pour tous les TAGs
    foreach ($contenu as $indice => $contenu_fils) {
         $this->openTag ($tag);

         // Pour tous les fils (non attribut et non data)
         foreach ($contenu_fils as $tagFils => $fils) {
             switch ($tagFils) {
                 case $this->TAG_ATTRIBUT:
                         $this->output_attributes ($contenu_fils);
                        $this->closeTag ();
                         break;
                 case $this->TAG_DATA:
                        $this->closeTag ();
                         $this->XML.= $this->addData ($contenu_fils [$this->TAG_DATA]);
                         break;
                 default:
                         $this->closeTag ();
                         $this->process ($tagFils, $fils);
                         break;
             }
         }

        $this->closingTag ($tag);
    }
}

}
dmeekins att gmail doot com
20-Dec-2006 07:02
I reworked some of the code I found posted previously here, mainly so I could access the structure of the parsed xml file by the tags' names. So if I was parsing html that's also valid xml, I could access the page title by $xmlp->document['HTML'][0]['HEAD'][0]['TITLE'][0]['data']. The index after the tag name corresponds to the occurrence of that tag. If there were two <head></head> in the same depth, then the second one could get accessed by ['HEAD'][1].

<?php
class XMLParser
{
    var $parser;
    var $filePath;
    var $document;
    var $currTag;
    var $tagStack;

    function XMLParser($path)
    {
        $this->parser = xml_parser_create();
        $this->filePath = $path;
        $this->document = array();
        $this->currTag =& $this->document;
        $this->tagStack = array();
    }

    function parse()
    {
        xml_set_object($this->parser, $this);
        xml_set_character_data_handler($this->parser, 'dataHandler');
        xml_set_element_handler($this->parser, 'startHandler', 'endHandler');

        if(!($fp = fopen($this->filePath, "r")))
        {
            die("Cannot open XML data file: $this->filePath");
            return false;
        }

        while($data = fread($fp, 4096))
        {
            if(!xml_parse($this->parser, $data, feof($fp)))
            {
                die(sprintf("XML error: %s at line %d",
                            xml_error_string(xml_get_error_code($this->parser)),
                            xml_get_current_line_number($this->parser)));
            }
        }

        fclose($fp);
        xml_parser_free($this->parser);

        return true;
    }

    function startHandler($parser, $name, $attribs)
    {
        if(!isset($this->currTag[$name]))
            $this->currTag[$name] = array();

        $newTag = array();
        if(!empty($attribs))
            $newTag['attr'] = $attribs;
        array_push($this->currTag[$name], $newTag);

        $t =& $this->currTag[$name];
        $this->currTag =& $t[count($t)-1];
        array_push($this->tagStack, $name);
    }

    function dataHandler($parser, $data)
    {
        $data = trim($data);

        if(!empty($data))
        {
            if(isset($this->currTag['data']))
                $this->currTag['data'] .= $data;
            else
                $this->currTag['data'] = $data;
        }
    }

    function endHandler($parser, $name)
    {
        $this->currTag =& $this->document;
        array_pop($this->tagStack);

        for($i = 0; $i < count($this->tagStack); $i++)
        {
            $t =& $this->currTag[$this->tagStack[$i]];
            $this->currTag =& $t[count($t)-1];
        }
    }
}
?>
vavricek at volny dot cz
18-Dec-2006 11:53
RE: forquan (29-Jan-2006 12:45)

Thanks for your code (it was what I needed), but ... it didn't work with my XML file. I think that you tested it on simple XML. Never mind.
I changed a few lines (the problem was in the endHandler function), and now it WORKS :-)

<?php
$p =& new xmlParser();
$p->parse("/* XML file*/");
echo "<pre>";
print_r($p->output);
echo "</pre>";

class xmlParser{
   var $xml_obj = null;
   var $output = array();
   var $attrs;

   function xmlParser(){
       $this->xml_obj = xml_parser_create();
       xml_set_object($this->xml_obj,$this);
       xml_set_character_data_handler($this->xml_obj, 'dataHandler');
       xml_set_element_handler($this->xml_obj, "startHandler", "endHandler");
   }

   function parse($path){
       if (!($fp = fopen($path, "r"))) {
           die("Cannot open XML data file: $path");
           return false;
       }

       while ($data = fread($fp, 4096)) {
           if (!xml_parse($this->xml_obj, $data, feof($fp))) {
               die(sprintf("XML error: %s at line %d",
                   xml_error_string(xml_get_error_code($this->xml_obj)),
                   xml_get_current_line_number($this->xml_obj)));
               xml_parser_free($this->xml_obj);
           }
       }

       return true;
   }

   function startHandler($parser, $name, $attribs){
       $_content = array();
       $_content['name'] = $name;
       if(!empty($attribs))
           $_content['attrs'] = $attribs;
       array_push($this->output, $_content);
   }

   function dataHandler($parser, $data){
       if(!empty($data) && $data!="\n") {
           $_output_idx = count($this->output) - 1;
           $this->output[$_output_idx]['content'] .= $data;
       }
   }

   function endHandler($parser, $name){
       if(count($this->output) > 1) {
           $_data = array_pop($this->output);
           $_output_idx = count($this->output) - 1;
           $add = array();
           if(!$this->output[$_output_idx]['child'])
               $this->output[$_output_idx]['child'] = array();
           array_push($this->output[$_output_idx]['child'], $_data);
       }
   }
}
?>
sasha at goldnet dot ca
15-Dec-2006 03:55
Re: hutch at midwales dot com

That function looks like major overkill.

To remove all white space between tags you could simply do:
$string = preg_replace ("/>\s+</" , "><" , $string);
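
A quick sketch of that idea in action, using the pattern /&gt;\s+&lt;/ written out as a regular PCRE pattern with slash delimiters. The sample string is made up. Note that this only touches whitespace that sits directly between a closing ">" and an opening "<", so text inside elements is left alone, but it would also alter CDATA or comment content that happens to contain "> <".

```php
<?php
// Made-up sample: an indented document with whitespace-only gaps between tags.
$string = "<root>\n  <item>a b</item>\n  <item>c</item>\n</root>";

// Collapse any run of whitespace between one tag's ">" and the next tag's "<".
$compact = preg_replace('/>\s+</', '><', $string);

echo $compact, "\n"; // <root><item>a b</item><item>c</item></root>
```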
hutch at midwales dot com
01-Oct-2006 09:26
First off, I'd like thank all and sundry for providing this excellent resource, it has been very helpful in getting my head around xml parsing.

I was recently handed the task of collecting a variety of xml streams, from many different sources and of widely varying quality.

I have found that the following function helps with parsing the input by cleaning it up. It removes all leading and trailing whitespace and removes carriage returns and linefeeds.

Using this function before using xml_parser_create() has helped reduce a number of otherwise unexplainable anomalies, such as arbitrary cutoff of data or the data being divided into two, requiring concatenation. Data longer than 1024 characters still has to be concatenated, but I can live with that.

<?php
// Removes leading/trailing whitespace, carriage returns and linefeeds,
// and returns the name of a temporary file holding the cleaned XML.
// Takes the name of an existing file as a parameter.
function cleanxmlfile($file, $tmpdir="/tmp", $prefix="xxx_") {
    $tmp = file_get_contents($file);
    $tmp = preg_replace("/^\s+/m", "", $tmp);
    $tmp = preg_replace("/\s+$/m", "", $tmp);
    $tmp = preg_replace("/\r/", "", $tmp);
    $tmp = preg_replace("/\n/", "", $tmp);
    $tmpfname = tempnam($tmpdir, $prefix);
    $handle = fopen($tmpfname, "w");
    fwrite($handle, $tmp);
    fclose($handle);
    return $tmpfname;
}
?>

HTH
forquan
28-Jan-2006 03:45
Here's code that will create an associative array from an XML file. Keys are the tag data, and subarrays are formed from attributes and child tags.

<?php
$p =& new xmlParser();
$p->parse('/*xml file*/');
print_r($p->output);
?>

<?php
class xmlParser {
   var $xml_obj = null;
   var $output = array();
   var $attrs;

   function xmlParser() {
       $this->xml_obj = xml_parser_create();
       xml_set_object($this->xml_obj, $this);
       xml_set_character_data_handler($this->xml_obj, 'dataHandler');
       xml_set_element_handler($this->xml_obj, "startHandler", "endHandler");
   }

   function parse($path) {
       if (!($fp = fopen($path, "r"))) {
           die("Cannot open XML data file: $path");
       }

       while ($data = fread($fp, 4096)) {
           if (!xml_parse($this->xml_obj, $data, feof($fp))) {
               // Build the message and free the parser first; nothing after die() runs
               $error = sprintf("XML error: %s at line %d",
                   xml_error_string(xml_get_error_code($this->xml_obj)),
                   xml_get_current_line_number($this->xml_obj));
               xml_parser_free($this->xml_obj);
               die($error);
           }
       }

       return true;
   }

   function startHandler($parser, $name, $attribs) {
       $_content = array();
       if (!empty($attribs))
           $_content['attrs'] = $attribs;
       array_push($this->output, $_content);
   }

   function dataHandler($parser, $data) {
       if (!empty($data) && $data != "\n") {
           $_output_idx = count($this->output) - 1;
           $this->output[$_output_idx]['content'] .= $data;
       }
   }

   function endHandler($parser, $name) {
       if (count($this->output) > 1) {
           $_data = array_pop($this->output);
           $_output_idx = count($this->output) - 1;
           $add = array();
           if ($_data['attrs'])
               $add['attrs'] = $_data['attrs'];
           if ($_data['child'])
               $add['child'] = $_data['child'];
           $this->output[$_output_idx]['child'][$_data['content']] = $add;
       }
   }
}
?>
Greg S
17-Nov-2005 08:56
If you need utf8_encode support and configure PHP with --disable-all you will have some trouble. Unfortunately the configure options aren't completely documented. If you need utf8 functions and have everything disabled just recompile PHP with --enable-xml and you should be good to go.
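
A quick way to tell whether a given build has these functions is to ask PHP at run time. The sketch below is only illustrative: xml_support_report() is a hypothetical helper name, and which configure flags put utf8_encode() in or out varies by PHP version, so treat the output as a diagnostic rather than a guarantee.

```php
<?php
// Hypothetical helper (not part of PHP): reports which XML-related
// features this particular PHP build exposes.
function xml_support_report() {
    return array(
        'xml extension loaded' => extension_loaded('xml'),
        'xml_parser_create()'  => function_exists('xml_parser_create'),
        'utf8_encode()'        => function_exists('utf8_encode'),
    );
}

foreach (xml_support_report() as $feature => $available) {
    echo $feature, ': ', ($available ? 'yes' : 'no'), "\n";
}
```

If any line prints "no", the functions this page documents will be undefined in that build.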
simonguada at yahoo dot fr
06-Apr-2005 02:31
To import XML into MySQL:

$file = "article_2_3032005467.xml";
$feed = array();
$key = "";
$info = "";

function startElement($xml_parser, $name, $attrs) {
  global $feed;
}

function endElement($xml_parser, $name) {
  global $feed,  $info;
   $key = $name;
  $feed[$key] = $info;
  $info = ""; }

function charData($xml_parser, $data ) {
  global $info;
  $info .= $data; }

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "charData" );
$fp = fopen($file, "r");
while ($data = fread($fp, 8192))
  xml_parse($xml_parser, $data, feof($fp));
xml_parser_free($xml_parser);

$sql= "INSERT INTO `article` ( `";
$j=0;
$i=count($feed);
foreach( $feed as $assoc_index => $value )
  {
  $j++;
  $sql.= strtolower($assoc_index);
  if($i>$j) $sql.= "` , `";
  if($i<=$j) {$sql.= "` ) VALUES ('";}
  }
 $h=0;
foreach( $feed as $assoc_index => $value )
  {
  $h++;
  $sql.= utf8_decode(trim(addslashes($value)));
  if($i-1>$h) $sql.= "', '";
  if($i<=$h) $sql.= "','')";
  }
  $sql=trim($sql);
  echo $sql;
compu_global_hyper_mega_net_2 at yahoo dot com
19-Sep-2004 01:35
The documentation regarding white space was never complete I think.

The XML_OPTION_SKIP_WHITE doesn't appear to do anything.  I want to preserve the newlines in a cdata section.  Setting XML_OPTION_SKIP_WHITE to 0 or false doesn't appear to help.  My character_data_handler is getting called once for each line.  This obviously should be reflected in the documentation as well.  When/how often does the handler get called exactly?  Having to build separate test cases is very time consuming.

Inserting newlines myself in my cdata handler is no good either.  For non actual CDATA sections that cause my handler to get called, long lines are split up in multiple calls.  My handler would not be able to tell the difference whether or not the subsequent calls would be due to the fact that the data is coming from the next line or the fact that some internal buffer is long enough for it to 'flush' out and call the handler.
This behaviour also needs to be properly documented.
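
The behaviour described above can be worked with rather than against: since the parser may call the character data handler any number of times for one element, a handler can append each piece to a buffer and only use the result when the end tag arrives. A minimal sketch, assuming PHP 5.3 or later for the closures; collect_leaf_text() and the sample document are made up for illustration, and the sketch only handles elements that contain text and no child elements.

```php
<?php
// Hypothetical helper: returns array(element name => full text) for leaf
// elements, plus the number of times the character data handler was called.
function collect_leaf_text($xml) {
    $buffer = '';
    $texts  = array();
    $calls  = 0;

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        // Start tag: begin a fresh buffer.
        function ($p, $name, $attrs) use (&$buffer) {
            $buffer = '';
        },
        // End tag: for a leaf element the buffer now holds its whole text.
        function ($p, $name) use (&$buffer, &$texts) {
            $texts[$name] = $buffer;
            $buffer = '';
        }
    );
    xml_set_character_data_handler(
        $parser,
        // May run once or many times per element, so always append.
        function ($p, $data) use (&$buffer, &$calls) {
            $calls++;
            $buffer .= $data;
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }
    return array($texts, $calls);
}

$xml = "<note><body><![CDATA[line one\nline two\nline three]]></body></note>";
list($texts, $calls) = collect_leaf_text($xml);
echo $texts['body'], "\n";
echo "handler calls: $calls\n";
```

Because the pieces are concatenated unchanged, the newlines inside the CDATA section survive however the parser chooses to split the text. The number of handler calls is printed rather than assumed, since it depends on the underlying parser library.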
odders
18-Mar-2004 10:36
I wrote a simple XML parser, mainly to deal with RSS version 2. I found lots of examples on the net, but they were all massive, bloated and hard to manipulate.

Output is sent to an array, which holds arrays containing the data for each item.

Obviously, you will have to modify the code to suit your needs, but there isn't a lot of code there, so that shouldn't be a problem.

<?php

   $currentElements = array();
   $newsArray = array();
   $itemCount = 0;

   readXML("./news.xml");

   echo("<pre>");
   print_r($newsArray);
   echo("</pre>");

   // Reads the XML file and fills $newsArray through the handlers below
   function readXML($xmlFile)
   {
      $xmlParser = xml_parser_create();

      xml_parser_set_option($xmlParser, XML_OPTION_CASE_FOLDING, false);
      // Handler names are passed as strings, not bare words
      xml_set_element_handler($xmlParser, "startElement", "endElement");
      xml_set_character_data_handler($xmlParser, "characterData");

      $fp = fopen($xmlFile, "r");

      while ($data = fread($fp, filesize($xmlFile))) {
         xml_parse($xmlParser, $data, feof($fp));
      }

      xml_parser_free($xmlParser);
   }

   // Pushes the current XML element onto the element hierarchy
   function startElement($parser, $name, $attrs)
   {
      global $currentElements, $itemCount;

      array_push($currentElements, $name);

      if ($name == "item") { $itemCount += 1; }
   }

   // Stores the character data of each child of <item> in $newsArray
   function characterData($parser, $data)
   {
      global $currentElements, $newsArray, $itemCount;

      $currentCount = count($currentElements);
      $parentElement = $currentElements[$currentCount-2];
      $thisElement = $currentElements[$currentCount-1];

      if ($parentElement == "item") {
         $newsArray[$itemCount-1][$thisElement] = $data;
      } else {
         // Placeholders for channel-level elements; the original switched on
         // an undefined $name, the current element is $thisElement
         switch ($thisElement) {
            case "title":
               break;
            case "link":
               break;
            case "description":
               break;
            case "language":
               break;
            case "item":
               break;
         }
      }
   }

   // When an XML element ends, it is popped off the hierarchy
   function endElement($parser, $name)
   {
      global $currentElements;

      $currentCount = count($currentElements);
      if ($currentElements[$currentCount-1] == $name) {
         array_pop($currentElements);
      }
   }

?>
talraith at withouthonor dot com
03-Feb-2004 02:27
I have created a class set that both parses XML into an object structure and from that structure creates XML code.  It is mostly finished but I thought I would post here as it may help someone out or if someone wants to use it as a base for their own parser.  The method for creating the object is original compared to the posts before this one.

The object tree is built by creating a separate tag object for each tag inside the main document object and associating them by way of object references.  An index table is created so that each tag is assigned an ID number (in numerical order from 0) and can be accessed directly using that ID number.  Each tag has object references to its children.  There are no uses of eval() in this code.

The code is too long to post here, so I have made an HTML page that has it:  http://www.withouthonor.com/obj_xml.html

Sample code would look something like this:

<?

$xml = new xml_doc($my_xml_code);
$xml->parse();

$root_tag =& $xml->xml_index[0];
$children =& $root_tag->children;

// and so forth

// To create XML code using the object, would be similar to this:

$my_xml = new xml_doc();

$root_tag = $my_xml->CreateTag('ROOTTAG');
$my_xml->CreateTag('CHILDTAG',array(),'',$root_tag);

// The following is used for the CreateTag() method
// string Name (The name of the child tag)
// array Attributes (associative array of attributes for tag)
// string Content (textual data for the child tag)
// int ParentID (Index number for parent tag)

// To generate the XML, use the following method

$out_xml = $my_xml->generate();

?>
bradparks at bradparks dot com
17-Dec-2003 02:38
Hey;

If you need to parse XML on an older version of PHP (e.g. 4.0) or if you can't get the expat extension enabled on your server, you might want to check out the Saxy and DOMIT! xml parsers from Engage Interactive. They're opensource and pure php, so no extensions or changes to your server are required. I've been using them for over a month on some projects with no problems whatsoever!

Check em out at:

DOMIT!, a DOM based xml parser, uses Saxy (included)
http://www.engageinteractive.com/redir.php?resource=1&target=domit

or

Saxy, a sax based xml parser
http://www.engageinteractive.com/redir.php?resource=2&target=saxy

Brad
chris at hitcatcher dot com
07-Nov-2003 02:48
Regarding jon at gettys dot org's XML object: the data should be trim()ed to remove any whitespace that could appear in CDATA entered as:

<xml_tag>
    cdata here. cdata here. cdata here. cdata here.
</xml_tag>

So, after applying fred at barron dot com's suggested change to the characterData function, the function should appear as:

function characterData($parser, $data)
{
    global $obj;
    $data = addslashes($data);
    eval($obj->tree."->data.='".trim($data)."';");
}

SIDE NOTE: I'm fairly new to XML, so perhaps it is considered bad form to enter CDATA as I did in my example. Is this true, or is the extra whitespace for the sake of readability acceptable?
ml at csite dot com
02-Jul-2003 08:29
A fix for the fread breaking thing:

$cache = '';
while ($data = fread($fp, 4096)) {

    $data = $cache . $data;

    if (!feof($fp)) {
        $split = false;
        if (preg_match_all("(</?[a-z0-9A-Z]+>)", $data, $regs)) {
            $lastTagname = $regs[0][count($regs[0])-1];
            for ($i=strlen($data)-strlen($lastTagname); $i>=strlen($lastTagname); $i--) {
                if ($lastTagname == substr($data, $i, strlen($lastTagname))) {
                    $cache = substr($data, $i, strlen($data));
                    $data = substr($data, 0, $i);
                    $split = true;
                    break;
                }
            }
        }
        if (!$split) {
            // No split point found: hold the whole chunk back so it is not parsed twice
            $cache = $data;
            $data = '';
        }
    }

    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d", xml_error_string(xml_get_error_code($xml_parser)), xml_get_current_line_number($xml_parser)));
    }
}
panania at 3ringwebs dot com
20-May-2003 03:12
The above example doesn't work when you're parsing a string being returned from a curl operation (why I don't know!) I kept getting undefined offsets at the highest element number in both the start and end element functions. It wasn't the string itself I know, because I substringed it to death with the same results. But I fixed the problem by adding these lines of code...

function defaultHandler($parser, $name) {
    global $depth;
    @$depth[$parser]--;
}

xml_set_default_handler($xml_parser, "defaultHandler");

Hope this helps 8-}
fred at barron dot com
22-Apr-2003 05:28
regarding jon at gettys dot org's nice XML to Object code, I've made some useful changes (IMHO) to the characterData function... my minor modifications allow multiple lines of data and it escapes quotes so errors don't occur in the eval...

function characterData($parser, $data)
{
    global $obj;
    $data = addslashes($data);
    eval($obj->tree."->data.='".$data."';");
}
software at serv-a-com dot com
17-Feb-2003 09:10
2. Pre Parser Strings and New Line Delimited Data
One important thing to note at this point is that the xml_parse function requires a string variable. You can manipulate the content of any string variable easily as we all know.

A better approach to removing newlines than:
while ($data = fread($fp, 4096)) {
$data = preg_replace("/\n|\r/","",$data); //flarp
if (!xml_parse($xml_parser, $data, feof($fp))) {...

The above works across all three line-ending conventions (\n, \r, \r\n). But it could (and most likely will) damage or scramble data contained in, for example, CDATA areas. As far as I am concerned, end-of-line characters should not be used _within_ XML tags. What seems to be the ultimate solution is to pre-parse the loaded data. This would require checking the position within the XML document and adding or subtracting data (using a temporary variable between freads) based on conditions like "Is within tag", "Is within CDATA" etc. before feeding it to the parser. This of course opens up a new can of worms (as in: parse data for the parser...). (The above procedure would take place between the fread and xml_parse calls; this method would be compatible with the general usage examples at the top of the page.)

3. The Answer to parsing arbitrary XML and Preprocessor Revisited
You can't just feed any XML document to the parser you constructed and assume that it will work! You have to know what methods for storing data are used: for example, is there end-of-line delimited data in the file? Are there any carriage returns in the tags? XML files come formatted in different ways: some are just one long string of characters without any end-of-line markers, others have newlines, carriage returns or both (Microsloth Windows). They may or may not contain spaces and other whitespace between tags. For this reason it is important to do what I call normalizing the data before feeding it to the parser. You can do this with regular expressions or plain old str_replace and concatenation. In many cases this can be done to the file itself, sometimes to string data on the fly (as shown in the example above). But I feel it is important to normalize the data before even calling the function that calls xml_parse. If you can access all the data before that call, you can convert it to what you feel the data should have been in the first place and avoid many surprises and expensive regular expression substitutions (in a tight spot) while fread'ing the data.
software at serv-a-com dot com
17-Feb-2003 09:09
My previous XML post (software at serv-a-com dot com/22-Jan-2003 03:08) resulted in some visitors e-mailing me with questions on the carriage return stripping issue. I'll try to make the following mumble as brief and easy to understand as possible.

1. Overview of the 4096 fragmentation issue
As you know, the following freads the file 4096 bytes at a time (that is, 4 KB). This is perhaps OK for testing expat and figuring out how things work, but it is rather dangerous in a production environment. Data may not be fully understandable due to fread fragmentation, and may be improperly formatted due to the numerous sources (formats) of data contained within (e.g. end-of-line delimited CDATA).

while ($data = fread($fp, 4096)) {
if (!xml_parse($xml_parser, $data, feof($fp))) {

Sometimes, to save time, one may want to load it all into one big variable and leave all the worries to expat. I think anything under 500 KB is OK (as long as nobody knows about it). Some may argue that larger variables are acceptable or even necessary because of the magic that takes place while parsing with xml_parse. Our XML parser (expat) works and can be successfully implemented only when we know what type of XML data we are dealing with: its average size, the structure of its general layout, and the data contained within tags. For example, if the tags are followed by a line delimiter such as a newline, we can read the file with fgets and, with minimal effort, make sure that no data is sent to the function that does not end with an end tag. But this requires a fair knowledge of the file's conventions for storing XML data and tags (and a bit of code between reading the data and xml_parse'ing it).
software at serv-a-com dot com
22-Jan-2003 02:08
use:
while ($data = str_replace("\n","",fread($fp, 4096))){

instead of:
while ($data = fread($fp, 4096)) {
It will save you a headache.

and in response to (simen at bleed dot no 11-Jan-2003 04:27) "If the 4096 byte buffer fills up..."
Please take better care of your data: don't just shove it into xml_parse(). Check and make sure that the tags are not sliced in the middle; use a temporary variable between fread and xml_parse.
simen at bleed dot no
11-Jan-2003 03:27
I was experiencing really weird behaviour loading a large XML document (91k), since reading the file through a 4096-byte buffer doesn't take the following into consideration:

<node>this is my value</node>

If the 4096 byte buffer fills up at "my", the handler registered with xml_set_character_data_handler() will receive the string in two pieces.

The only solution I've found so far is to read the whole document into a variable and then parse.
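
Reading the whole document first, as suggested above, might look like the following sketch. parse_file_at_once() is a hypothetical name, the closure assumes PHP 5.3 or later, and the temporary file only exists to make the example self-contained. The handler still appends with .=, because a single xml_parse() call does not promise a single handler call.

```php
<?php
// Hypothetical helper: loads the entire file, parses it in one call and
// returns all character data found in the document.
function parse_file_at_once($path) {
    $xml = file_get_contents($path);
    if ($xml === false) {
        throw new RuntimeException("Cannot open XML data file: $path");
    }

    $text = '';
    $parser = xml_parser_create();
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$text) {
            $text .= $data; // append: the handler may still fire more than once
        }
    );

    // Third argument true: this is the only (and therefore final) piece of data.
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }
    return $text;
}

// Demo with a throwaway file so the sketch runs as-is.
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, '<node>this is my value</node>');
$value = parse_file_at_once($path);
unlink($path);
echo $value, "\n";
```

The trade-off is memory: the whole file is held in a string, which is fine for a 91k document but worth reconsidering for much larger ones.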
sfaulkner at hoovers dot com
04-Nov-2002 12:29
Building on... This allows you to return the value of an element using an XPath reference.  This code would of course need error handling added :-)

 function GetElementByName ($xml, $start, $end) {
   $startpos = strpos($xml, $start);
   if ($startpos === false) {
     return false;
   }
   $endpos = strpos($xml, $end);
   $endpos = $endpos+strlen($end);   
   $endpos = $endpos-$startpos;
   $endpos = $endpos - strlen($end);
   $tag = substr ($xml, $startpos, $endpos);
   $tag = substr ($tag, strlen($start));
   return $tag;
 }
 
 function XPathValue($XPath,$XML) {
   $XPathArray = explode("/",$XPath);
   $node = $XML;
   foreach ($XPathArray as $value) {
     $node = GetElementByName($node, "<$value>", "</$value>");
   }
  
   return $node;
 }
 
  print XPathValue("Response/Shipment/TotalCharges/Value",$xml);
guy at bhaktiandvedanta dot com
27-Sep-2002 12:01
For a simple XML parser you can use this function. It doesn't require any extensions to run.

<?
// Extracts content from XML tag

function GetElementByName ($xml, $start, $end) {

    global $pos;
    $startpos = strpos($xml, $start);
    if ($startpos === false) {
        return false;
    }
    $endpos = strpos($xml, $end);
    $endpos = $endpos+strlen($end);   
    $pos = $endpos;
    $endpos = $endpos-$startpos;
    $endpos = $endpos - strlen($end);
    $tag = substr ($xml, $startpos, $endpos);
    $tag = substr ($tag, strlen($start));

    return $tag;

}

// Open and read xml file. You can replace this with your xml data.

$file = "data.xml";
$pos = 0;
$Nodes = array();

if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}
$data = "";
while ($getline = fread($fp, 4096)) {
    $data = $data . $getline;
}

$count = 0;
$pos = 0;

// Goes through the XML file and creates an array of all <XML_TAG> tags.
while ($node = GetElementByName($data, "<XML_TAG>", "</XML_TAG>")) {
    $Nodes[$count] = $node;
    $count++;
    $data = substr($data, $pos);
}

// Gets information from tag siblings.
for ($i=0; $i<$count; $i++) {
$code = GetElementByName($Nodes[$i], "<Code>", "</Code>");
$desc = GetElementByName($Nodes[$i], "<Description>", "</Description>");
$price = GetElementByName($Nodes[$i], "<BasePrice>", "</BasePrice>");
}
?>

Hope this helps! :)
Guy Laor
dmarsh dot NO dot SPAM dot PLEASE at spscc dot ctc dot edu
18-Sep-2002 12:27
Some reference code I am working on as an "XML Library", which I am folding into an object. Notice the use of the DEFINE:

Mainly Example 1 and parts of 2 & 3 re-written as an object:
--- MyXMLWalk.lib.php ---
<?php

if (!defined("PHPXMLWalk")) {
define("PHPXMLWalk",TRUE);

class XMLWalk {
  var $p; // short for xml parser
  var $e; // short for element stack/array

  function prl($x, $i=0) {
    ob_start();
    print_r($x);
    $buf = ob_get_contents();
    ob_end_clean();
    // explode() instead of the original split(), which was removed in PHP 7
    return join("\n".str_repeat(" ",$i), explode("\n",$buf));
  }

  function XMLWalk() {
    $this->p = xml_parser_create();
    $this->e = array();
    xml_parser_set_option($this->p, XML_OPTION_CASE_FOLDING, true);
    xml_set_element_handler($this->p, array(&$this, "startElement"), array(&$this, "endElement"));
    xml_set_character_data_handler($this->p, array(&$this, "dataElement"));
    register_shutdown_function(array(&$this, "free")); // make a destructor
  }

  function startElement($parser, $name, $attrs) {
    if (count($attrs)>=1) {
      $x = $this->prl($attrs, $this->e[$parser]+6);
    } else {
      $x = "";
    }

    print str_repeat(" ",$this->e[$parser]+0). "$name $x\n";
    $this->e[$parser]++;
    $this->e[$parser]++;
  }

  function dataElement($parser, $data) {
    print str_repeat(" ",$this->e[$parser]+0). htmlspecialchars($data, ENT_QUOTES) ."\n";
  }

  function endElement($parser, $name) {
    $this->e[$parser]--;
    $this->e[$parser]--;
  }

  function parse($data, $fp) {
    if (!xml_parse($this->p, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
                    xml_error_string(xml_get_error_code($this->p)),
                    xml_get_current_line_number($this->p)));
    }
  }

  function free() {
    xml_parser_free($this->p);
  }

} // end of class

} // end of define

?>

--- end of file ---

Calling code:
<?php

...

require("MyXMLWalk.lib.php");

$file = "x.xml";

$xme = new XMLWalk;

if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}

while ($data = fread($fp, 4096)) {
    $xme->parse($data, $fp);
}

...
?>
jon at gettys dot org
14-Aug-2002 01:59
[Editor's note: see also xml_parse_into_struct().]

Very simple routine to convert an XML file into a PHP structure. $obj->xml contains the resulting PHP structure. I would be interested if someone could suggest a cleaner method than the evals I am using.

<?
$filename = 'sample.xml';
$obj->tree = '$obj->xml';
$obj->xml = '';

function startElement($parser, $name, $attrs) {
    global $obj;
   
    // If var already defined, make array
    eval('$test=isset('.$obj->tree.'->'.$name.');');
    if ($test) {
      eval('$tmp='.$obj->tree.'->'.$name.';');
      eval('$arr=is_array('.$obj->tree.'->'.$name.');');
      if (!$arr) {
        eval('unset('.$obj->tree.'->'.$name.');');
        eval($obj->tree.'->'.$name.'[0]=$tmp;');
        $cnt = 1;
      }
      else {
        eval('$cnt=count('.$obj->tree.'->'.$name.');');
      }
     
      $obj->tree .= '->'.$name."[$cnt]";
    }
    else {
      $obj->tree .= '->'.$name;
    }
    if (count($attrs)) {
        eval($obj->tree.'->attr=$attrs;');
    }
}

function endElement($parser, $name) {
    global $obj;
    // Strip off last ->
    for($a=strlen($obj->tree);$a>0;$a--) {
        if (substr($obj->tree, $a, 2) == '->') {
            $obj->tree = substr($obj->tree, 0, $a);
            break;
        }
    }
}

function characterData($parser, $data) {
    global $obj;

    eval($obj->tree.'->data=\''.$data.'\';');
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
if (!($fp = fopen($filename, "r"))) {
    die("could not open XML input");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
                    xml_error_string(xml_get_error_code($xml_parser)),
                    xml_get_current_line_number($xml_parser)));
    }
}
xml_parser_free($xml_parser);
print_r($obj->xml);
return 0;

?>
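
One eval-free alternative, following the editor's note, is to let xml_parse_into_struct() flatten the document and then rebuild the nesting with a stack. This is a sketch rather than a drop-in replacement for the code above: xml_to_tree() and the node layout (name, attr, data, children) are invented for illustration, and it returns nested arrays instead of the $obj->xml object structure.

```php
<?php
// Hypothetical helper: converts an XML string into nested arrays.
function xml_to_tree($xml) {
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);

    $values = array();
    if (!xml_parse_into_struct($parser, $xml, $values)) {
        throw new RuntimeException(sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }

    $stack = array(); // elements that are currently open
    $root  = null;

    foreach ($values as $v) {
        $type = $v['type'];

        if ($type === 'cdata') {
            // Text found between the child elements of the open element.
            if ($stack && isset($v['value'])) {
                $stack[count($stack) - 1]['data'] .= $v['value'];
            }
            continue;
        }

        if ($type === 'open' || $type === 'complete') {
            $node = array(
                'name'     => $v['tag'],
                'attr'     => isset($v['attributes']) ? $v['attributes'] : array(),
                'data'     => isset($v['value']) ? $v['value'] : '',
                'children' => array(),
            );
            if ($type === 'open') {
                $stack[] = $node; // wait for the matching 'close' entry
                continue;
            }
        } else { // 'close'
            $node = array_pop($stack);
        }

        // A finished node is attached to its parent, or becomes the root.
        if ($stack) {
            $stack[count($stack) - 1]['children'][] = $node;
        } else {
            $root = $node;
        }
    }
    return $root;
}

$tree = xml_to_tree(
    '<feed><item id="1"><title>First</title></item>'
    . '<item id="2"><title>Second</title></item></feed>'
);
print_r($tree);
```

Repeated tags simply become successive entries in the parent's children list, so no special case is needed for turning a single value into an array.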
danielc at analysisandsolutions dot com
15-Apr-2002 02:23
I put up a good, simple, real world example of how to parse XML documents. While the sample grabs stock quotes off of the web, you can tweak it to do whatever you need.

http://www.analysisandsolutions.com/code/phpxml.htm
jason at N0SPAM dot projectexpanse dot com
22-Mar-2002 01:16
In reference to the note made by sam@cwa.co.nz about parsing entities:

I could be wrong, but since it is possible to define your own entities within an XML DTD, the cdata handler function parses these individually to allow for your own implementation of those entities within your cdata handler.
jason at NOSPAM_projectexpanse_NOSPAM dot com
26-Feb-2002 04:11
For newbies wanting a good tutorial on how to actually get started and where to go from this listing of functions, then visit:
http://www.wirelessdevnet.com/channels/wap/features/xmlcast_php.html

It shows an excellent example of how to read the XML data into a class file so you can actually process it, not just display it all pretty-like, like many tutorials on PHP/XML seem to be doing.
hans dot schneider at bbdo-interone dot de
24-Jan-2002 08:43
I had to TRIM the data when I passed one large String containing a wellformed XML-File to xml_parse. The String was read by CURL, which apparently put a BLANK at the end of the String. This BLANK produced a "XML not wellformed"-Error in xml_parse!
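
The effect of stray whitespace is easy to reproduce. The note above reports a trailing blank; the sketch below instead uses leading blanks in front of the XML declaration, a case the parser is known to reject, and shows that trim() removes the problem. is_well_formed() is a hypothetical helper, not part of the extension.

```php
<?php
// Hypothetical helper: true if the string parses as a complete document.
function is_well_formed($xml) {
    $parser = xml_parser_create();
    // xml_parse() returns 1 on success and 0 on failure.
    return xml_parse($parser, $xml, true) === 1;
}

// Simulated download with blanks before the declaration and after the root.
$raw = "  <?xml version=\"1.0\"?><doc>ok</doc> ";

var_dump(is_well_formed($raw));       // whitespace before the declaration
var_dump(is_well_formed(trim($raw))); // same data after trim()
```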
sam at cwa dot co dot nz
28-Sep-2000 07:39
I've discovered some unusual behaviour in this API when ampersand entities are parsed in cdata; for some reason the parser breaks up the section around the entities, and calls the handler repeated times for each of the sections. If you don't allow for this oddity and you are trying to put the cdata into a variable, only the last part will be stored.

You can get around this with a line like:

$foo .= $cdata;

If the handler is called several times from the same tag, it will append them, rather than rewriting the variable each time. If the entire cdata section is returned, it doesn't matter.

May happen for other entities, but I haven't investigated.

Took me a while to figure out what was happening; hope this saves someone else the trouble.
Daniel dot Rendall at btinternet dot com
07-Jul-1999 10:21
When using the XML parser, make sure you're not using the magic quotes option (e.g. use set_magic_quotes_runtime(0) if it's not the compiled default), otherwise you'll get 'not well-formed' errors when dealing with tags with attributes set in them.
