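Ruby/XML, XSLT and XPath

What is XML?

The Extensible Markup Language (XML) is a markup language much like HTML or SGML. It is recommended by the World Wide Web Consortium (W3C) and is available as an open standard.

XML is a portable, open format that lets programmers build applications whose data can be read by other applications, regardless of operating system or development language.

XML is well suited to keeping track of small to medium amounts of data without requiring a SQL-based backend.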

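XML Parser Architectures and APIs:

There are two different flavors of XML parsers:

SAX-like (Stream interfaces) : Here you register callbacks for the events you care about and let the parser proceed through the document. This is useful when documents are large or memory is limited: the parser processes the file as it reads it from disk, and the entire file is never held in memory.
DOM-like (Object tree interfaces) : This is the World Wide Web Consortium recommendation, in which the entire file is read into memory and stored in a hierarchical (tree) form that represents all the features of the XML document.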

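SAX cannot give random access to the document the way DOM can, so tasks that need to revisit nodes are more awkward with SAX. On the other hand, using DOM exclusively can consume a lot of resources, because a full tree is built in memory for every file parsed, whether that is one large file or many small ones.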

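SAX is read-only, while DOM allows changes to the XML document. Because the two APIs complement each other, there is no reason not to use both in a large project.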
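To make the "DOM allows changes" point concrete, here is a minimal sketch that edits a parsed tree in memory and serializes it again. It uses the REXML library described in the next section; the one-movie input string and the "watched" attribute are assumptions made up for this illustration.

```ruby
require 'rexml/document'

# Hypothetical one-movie document, inlined so the sketch is self-contained.
source = '<collection shelf="New Arrivals">' \
         '<movie title="Ishtar"><format>VHS</format></movie>' \
         '</collection>'

doc = REXML::Document.new(source)
movie = doc.root.elements["movie"]

movie.elements["format"].text = "DVD"   # replace the text of an existing child
movie.add_attribute("watched", "no")    # hypothetical attribute, for illustration only
movie.add_element("stars").text = "2"   # append a new child element

# Serialize the modified tree back to a string.
updated = doc.to_s
puts updated
```

A stream listener has no tree to modify, so the equivalent with a SAX-like API would mean re-emitting the whole document by hand while reading it.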

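Parsing and Creating XML using Ruby:

The most common way to manipulate XML in Ruby is the REXML library by Sean Russell. Since 2002, REXML has been part of the standard Ruby distribution.

REXML is a pure-Ruby XML processor conforming to the XML 1.0 standard. It is a non-validating processor that passes all of the OASIS non-validating conformance tests.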
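As a rough illustration of what "non-validating" means in practice, the sketch below feeds REXML two inputs: one that is well-formed but does not follow its own DTD, and one that is not well-formed at all. Both input strings are assumptions invented for this example.

```ruby
require 'rexml/document'

# The DTD says <note> may contain only <to>, but the body uses <from>.
# A non-validating parser checks well-formedness only, so this still parses.
well_formed_only = <<~'XML'
  <!DOCTYPE note [
    <!ELEMENT note (to)>
    <!ELEMENT to (#PCDATA)>
  ]>
  <note><from>Alice</from></note>
XML

doc = REXML::Document.new(well_formed_only)
parsed_root = doc.root.name
puts "Parsed root element: #{parsed_root}"

# A mismatched end tag is a well-formedness error, which REXML does reject.
malformed = "<note><to>Bob</note>"
error_class =
  begin
    REXML::Document.new(malformed)
    nil
  rescue REXML::ParseException => e
    e.class
  end
puts "Malformed input raised: #{error_class}"
```

If the DTD must actually be enforced, that check has to come from a separate validating tool; REXML alone will not report it.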

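REXML has the following advantages over other available parsers:

It is written 100 percent in Ruby.
It can be used for both SAX and DOM parsing.
It is lightweight, less than 2000 lines of code.
Its methods and classes are easy to understand.
It offers a SAX2-based API and full XPath support.
It ships with Ruby, so no separate installation is needed (since Ruby 3.0 it is distributed as a bundled gem).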
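The examples that follow only read XML, so here is a short sketch of the "creating" side: building a document from scratch with REXML's tree API. The element and attribute values are borrowed from the sample data used in this tutorial; the variable names are arbitrary.

```ruby
require 'rexml/document'

# Start with an empty document and add an XML declaration.
doc = REXML::Document.new
doc << REXML::XMLDecl.new("1.0", "UTF-8")

# add_element takes a tag name and an optional hash of attributes.
root = doc.add_element("collection", "shelf" => "New Arrivals")
movie = root.add_element("movie", "title" => "Trigun")
movie.add_element("type").text = "Anime, Action"
movie.add_element("format").text = "DVD"

# Serialize the tree; to_s gives a compact, unindented string.
created = doc.to_s
puts created
```

Calling doc.write with an indent argument instead of to_s produces indented output, at the cost of extra whitespace inside text elements.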

For all of our XML code examples, let's use a simple XML file, movies.xml, as input:

<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <format>DVD</format>
   <year>2003</year>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
   <type>Anime, Science Fiction</type>
   <format>DVD</format>
   <year>1989</year>
   <rating>R</rating>
   <stars>8</stars>
   <description>A schientific fiction</description>
</movie>
   <movie title="Trigun">
   <type>Anime, Action</type>
   <format>DVD</format>
   <episodes>4</episodes>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Vash the Stampede!</description>
</movie>
<movie title="Ishtar">
   <type>Comedy</type>
   <format>VHS</format>
   <rating>PG</rating>
   <stars>2</stars>
   <description>Viewable boredom</description>
</movie>
</collection>

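DOM-like Parsing:

Let's first parse our XML data in tree fashion. We begin by requiring the rexml/document library; for convenience we often also do an include REXML to import its names into the top-level namespace.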

#!/usr/bin/ruby -w

require 'rexml/document'
include REXML

xmlfile = File.new("movies.xml")
xmldoc = Document.new(xmlfile)

# Now get the root element
root = xmldoc.root
puts "Root element : " + root.attributes["shelf"]

# This will output all the movie titles.
xmldoc.elements.each("collection/movie") { |e|
   puts "Movie Title : " + e.attributes["title"]
}

# This will output all the movie types.
xmldoc.elements.each("collection/movie/type") { |e|
   puts "Movie Type : " + e.text
}

# This will output all the movie descriptions.
xmldoc.elements.each("collection/movie/description") { |e|
   puts "Movie Description : " + e.text
}

This will produce the following result:

Root element : New Arrivals
Movie Title : Enemy Behind
Movie Title : Transformers
Movie Title : Trigun
Movie Title : Ishtar
Movie Type : War, Thriller
Movie Type : Anime, Science Fiction
Movie Type : Anime, Action
Movie Type : Comedy
Movie Description : Talk about a US-Japan war
Movie Description : A schientific fiction
Movie Description : Vash the Stampede!
Movie Description : Viewable boredom
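Not every movie in the sample file has the same children (Trigun has episodes instead of year, and Ishtar has neither), so tree-walking code usually needs to cope with missing elements. The following sketch, using a trimmed inline copy of the data, shows one way to turn each movie into a Ruby hash; the hash keys are arbitrary names chosen for this illustration.

```ruby
require 'rexml/document'

# Trimmed inline copy of the sample data, so the sketch runs on its own.
xml = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind"><year>2003</year><stars>10</stars></movie>
    <movie title="Trigun"><episodes>4</episodes><stars>10</stars></movie>
    <movie title="Ishtar"><stars>2</stars></movie>
  </collection>
XML

doc = REXML::Document.new(xml)

movies = doc.root.elements.to_a("movie").map do |movie|
  year = movie.elements["year"]   # nil when the child element is absent
  {
    title: movie.attributes["title"],
    year:  year ? year.text.to_i : nil,
    stars: movie.elements["stars"].text.to_i
  }
end

movies.each do |m|
  puts "#{m[:title]}: year=#{m[:year].inspect}, stars=#{m[:stars]}"
end
```

Looking up a child with elements["name"] returns nil rather than raising, so the nil check is what keeps a later call to text from failing.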

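SAX-like Parsing:

To process the same data, movies.xml, in a stream-oriented way, we define a listener class whose methods are the targets of callbacks from the parser.

NOTE: Using SAX-like parsing on a file this small is not recommended; this is only a demonstration.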

#!/usr/bin/ruby -w

require 'rexml/document'
require 'rexml/streamlistener'
include REXML


class MyListener
  include REXML::StreamListener
  def tag_start(*args)
    puts "tag_start: #{args.map {|x| x.inspect}.join(', ')}"
  end

  def text(data)
    return if data =~ /^\w*$/     # skip whitespace runs and single-word text
    abbrev = data[0..40] + (data.length > 40 ? "..." : "")
    puts "  text   :   #{abbrev.inspect}"
  end
end

list = MyListener.new
xmlfile = File.new("movies.xml")
Document.parse_stream(xmlfile, list)

This will produce the following result:

tag_start: "collection", {"shelf"=>"New Arrivals"}
tag_start: "movie", {"title"=>"Enemy Behind"}
tag_start: "type", {}
  text   :   "War, Thriller"
tag_start: "format", {}
tag_start: "year", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
  text   :   "Talk about a US-Japan war"
tag_start: "movie", {"title"=>"Transformers"}
tag_start: "type", {}
  text   :   "Anime, Science Fiction"
tag_start: "format", {}
tag_start: "year", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
  text   :   "A schientific fiction"
tag_start: "movie", {"title"=>"Trigun"}
tag_start: "type", {}
  text   :   "Anime, Action"
tag_start: "format", {}
tag_start: "episodes", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
  text   :   "Vash the Stampede!"
tag_start: "movie", {"title"=>"Ishtar"}
tag_start: "type", {}
tag_start: "format", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
  text   :   "Viewable boredom"
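The listener above only prints events. Because a stream parser keeps no tree, any state you need has to be tracked inside the listener itself. The sketch below is one way to do that: a hypothetical MovieSummary listener (not part of REXML) that collects movie titles and tallies formats from a trimmed inline copy of the data.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Trimmed inline copy of the sample data, so the sketch runs on its own.
xml = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind"><format>DVD</format></movie>
    <movie title="Trigun"><format>DVD</format></movie>
    <movie title="Ishtar"><format>VHS</format></movie>
  </collection>
XML

# Hypothetical listener: collects titles and counts formats as events arrive.
class MovieSummary
  include REXML::StreamListener
  attr_reader :titles, :formats

  def initialize
    @titles = []
    @formats = Hash.new(0)
    @in_format = false
  end

  # Called for every opening tag with the tag name and a hash of attributes.
  def tag_start(name, attrs)
    @titles << attrs["title"] if name == "movie"
    @in_format = (name == "format")
  end

  # Called for character data; only count it while inside <format>.
  def text(data)
    @formats[data.strip] += 1 if @in_format
  end

  # Called for every closing tag.
  def tag_end(_name)
    @in_format = false
  end
end

summary = MovieSummary.new
REXML::Document.parse_stream(xml, summary)

puts "Titles: #{summary.titles.join(', ')}"
summary.formats.each { |format, count| puts "#{format}: #{count}" }
```

The @in_format flag stands in for the context a DOM tree would provide for free; that bookkeeping is the usual price of the lower memory footprint.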

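XPath and Ruby:

An alternative way to view XML is XPath. This is a kind of pseudo-language that describes how to locate specific elements and attributes in an XML document, treating that document as a logical ordered tree.

REXML supports XPath through the XPath class. It assumes tree-based parsing (the document object model), as we have seen above.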

#!/usr/bin/ruby -w

require 'rexml/document'
include REXML

xmlfile = File.new("movies.xml")
xmldoc = Document.new(xmlfile)

# Info for the first movie found
movie = XPath.first(xmldoc, "//movie")
p movie

# Print out all the movie types
XPath.each(xmldoc, "//type") { |e| puts e.text }

# Get an array of all of the movie formats.
names = XPath.match(xmldoc, "//format").map {|x| x.text }
p names

This will produce the following result:

<movie title='Enemy Behind'> ... </>
War, Thriller
Anime, Science Fiction
Anime, Action
Comedy
["DVD", "DVD", "DVD", "VHS"]
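The queries above select every node with a given name. XPath predicates (the bracketed conditions) narrow a selection by attribute or child value. The following is a small sketch on a trimmed inline copy of the data; the variable names are arbitrary.

```ruby
require 'rexml/document'

# Trimmed inline copy of the sample data, so the sketch runs on its own.
xml = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind"><type>War, Thriller</type><format>DVD</format></movie>
    <movie title="Trigun"><type>Anime, Action</type><format>DVD</format></movie>
    <movie title="Ishtar"><type>Comedy</type><format>VHS</format></movie>
  </collection>
XML

doc = REXML::Document.new(xml)

# Attribute predicate: the <type> of the movie whose title attribute is "Trigun".
trigun_type = REXML::XPath.first(doc, "//movie[@title='Trigun']/type").text
puts trigun_type

# Child-value predicate: every movie whose <format> child is "VHS".
vhs_titles = REXML::XPath.match(doc, "//movie[format='VHS']")
                         .map { |movie| movie.attributes["title"] }
puts vhs_titles.join(", ")
```

XPath.first returns nil when nothing matches, so in real code it is worth checking the result before calling text on it.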

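XSLT and Ruby:

There are two XSLT parsers that Ruby can use. A brief description of each is given here.

Ruby-Sablotron:

This parser is written and maintained by Masayoshi Takahashi. It is written primarily for Linux and requires the following libraries: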

Sablot
Iconv
Expat

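You can find this module at Ruby-Sablotron.

XSLT4R:

XSLT4R is written by Michael Neumann and can be found at the RAA (Ruby Application Archive) in the Library section under XML. XSLT4R uses a simple command-line interface, though it can alternatively be used within a third-party application to transform an XML document.

XSLT4R needs XMLScan to operate, which is included in the XSLT4R archive and is also a 100 percent Ruby module. These modules can be installed using the standard Ruby installation method (i.e., ruby install.rb).

XSLT4R has the following syntax: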

ruby xslt.rb stylesheet.xsl document.xml [arguments]

If you want to use XSLT4R from within an application, you can require xslt and pass in the parameters you need. Here is an example:

require "xslt"

# File.read returns the whole file as one string
# (Array#to_s on the result of readlines would not, on modern Ruby).
stylesheet = File.read("stylesheet.xsl")
xml_doc = File.read("document.xml")
arguments = { 'image_dir' => '/....' }

sheet = XSLT::Stylesheet.new( stylesheet, arguments )

# output to StdOut
sheet.apply( xml_doc )

# output to 'str'
str = ""
sheet.output = [ str ]
sheet.apply( xml_doc )
