XSLT that removes arbitrary duplicate sibling elements

The answer here does exactly what I want, except that I don't want to remove duplicate siblings of just one specific element; I want to remove duplicate siblings of all elements.

Also, for my purposes a "duplicate" element is one that has the same attributes, descendant elements, and text as its sibling.

How can that answer be modified to achieve my goal?

Here is my current stylesheet:

XSL:

<!-- 
    When a file is transformed using this stylesheet the output will be 
    formatted as follows: 

    1.) Elements named "info" will be removed 
    2.) Duplicate sibling elements will be removed 
    3.) Attributes named "file_line_nr" or "file_name" will be removed 
    4.) Comments will be removed 
    5.) Processing instructions will be removed 
    6.) XML declaration will be removed 
    7.) Extra whitespace will be removed 
    8.) Empty attributes will be removed 
    9.) Elements which have no attributes, child elements, or text will be removed 
    10.) All elements will be sorted by name recursively 
    11.) All attributes will be sorted by name 
--> 
<xsl:stylesheet 
    version="1.0" 
    xmlns:xalan="http://xml.apache.org/xalan" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 

    <xsl:output indent="yes" method="xml" omit-xml-declaration="yes"/> 
    <xsl:strip-space elements="*"/> 

    <!-- 
     Elements/attributes to remove. Note that comments are not elements or 
     attributes. Since there is no template to match comments they are 
     automatically ignored. 
    --> 
    <xsl:template match="@*[normalize-space()='']|info|@file_line_nr|@file_name"/> 

    <!-- Match any attribute --> 
    <xsl:template match="@*"> 
     <xsl:copy> 
      <xsl:apply-templates select="@*"/> 
     </xsl:copy> 
    </xsl:template> 

    <!-- Match any element --> 
    <xsl:template match="*"> 
     <xsl:variable name="elementFragment"> 
      <xsl:copy> 
       <xsl:apply-templates select="@*"> 
        <xsl:sort select="name()"/> 
       </xsl:apply-templates> 
       <xsl:apply-templates> 
        <xsl:sort select="name()"/> 
       </xsl:apply-templates> 
      </xsl:copy> 
     </xsl:variable> 
     <xsl:variable name="element" select="xalan:nodeset($elementFragment)/*"/> 
     <xsl:if test="$element/@* or $element/* or normalize-space($element)"> 
      <xsl:copy-of select="$element"/> 
     </xsl:if> 
    </xsl:template> 

</xsl:stylesheet>

Input XML:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?><!-- XML declaration should be removed --> 
<z b="b" a="a" c="c"> 
    <?some-app inst="some instruction"?><!-- Processing instructions should be removed --> 
    <a><!-- Keep elements like this because it has child elements --> 
     <x c="c" b="b" a="a"/><!-- Keep elements like this because it has attributes --> 
     <c>some text</c><!-- Keep elements like this because it has text --> 
     <info a="a"/><!-- Elements named "info" are to be removed --> 
     <w file_line_nr="42" file_name="somefile.txt"/><!-- Attributes named "file_line_nr" and "file_name" are to be removed which will leave this element empty, so it should be removed too --> 
     <d/><!-- Remove elements like this because it has no attributes, no child elements, and no text -->

     <v a="a"><!-- Keep this element because it and its sibling "v" element are unique. It does not have exactly the same descendants as its sibling "v" element -->
      some text 
      <i a="a">some text</i> 
      <q a="a">some text</q> 
     </v> 
     <v a="a"> 
      some text 
      <i a="a">some different text</i><!-- text is different --> 
      <q a="a">some text</q> 
     </v> 

     <e a="a"><!-- Keep this element because it and its sibling "e" element are unique. It does not have exactly the same descendants as its sibling "e" element -->
      some text 
      <j a="a"> 
       <p>some text</p> 
      </j> 
     </e> 
     <e a="a"> 
      some text 
      <j a="a"> 
       <p>some different text</p><!-- text is different --> 
      </j> 
     </e> 

     <u a="a"><!-- Keep this element because it and its sibling "u" element are unique. It does not have exactly the same descendants as its sibling "u" element -->
      some text 
      <k a="a">some text</k> 
      <n a="a">some text</n> 
     </u> 
     <u a="a"> 
      some text 
      <k b="b">some text</k><!-- attribute is different --> 
      <n a="a">some text</n> 
     </u> 

     <f a="a"><!-- Keep this element because it and its sibling "f" element are unique. It does not have exactly the same attributes as its sibling "f" element -->
      some text 
      <l a="a">some text</l> 
      <m a="a">some text</m> 
     </f> 
     <f b="b"><!-- attribute is different --> 
      some text 
      <l a="a">some text</l> 
      <m a="a">some text</m> 
     </f> 

     <t a="a"><!-- Keep this element because it and its sibling "t" element are unique. It does not have exactly the same text as its sibling "t" element -->
      some text 
      <az a="a">some text</az> 
      <aa a="a">some text</aa> 
     </t> 
     <t a="a"> 
      some different text<!-- text is different --> 
      <az a="a">some text</az> 
      <aa a="a">some text</aa> 
     </t> 

     <g a="a"><!-- Remove this element because it is NOT unique. Its attributes, descendants, and text are exactly the same as its sibling "g" element --> 
      some text 
      <ay a="a">some text</ay> 
      <ab a="a">some text</ab> 
     </g> 
     <g a="a"> 
      some text 
      <ay a="a">some text</ay> 
      <ab a="a">some text</ab> 
     </g> 

     <s a="a"/> 
    </a> 
    <y a="a"/> 
    <b> 
     <h a="a" /> 
     <r a="a"/> 
    </b> 
</z>

Desired output XML (elements and attributes are sorted; comments and indentation/whitespace would also be removed, but I have added them back here for readability):

<z a="a" b="b" c="c"> 
    <a> 
     <c>some text</c> 
     <e a="a"> 
      some text 
      <j a="a"> 
       <p>some text</p> 
      </j> 
     </e> 
     <e a="a"> 
      some text 
      <j a="a"> 
       <p>some different text</p> 
      </j> 
     </e> 
     <f a="a"> 
      some text 
      <l a="a">some text</l> 
      <m a="a">some text</m> 
     </f> 
     <f b="b"> 
      some text 
      <l a="a">some text</l> 
      <m a="a">some text</m> 
     </f> 
     <g a="a"><!-- The sibling "g" element of this element was removed because it was an exact duplicate --> 
      some text 
      <ab a="a">some text</ab> 
      <ay a="a">some text</ay> 
     </g> 
     <s a="a"/> 
     <t a="a"> 
      some text 
      <aa a="a">some text</aa> 
      <az a="a">some text</az> 
     </t> 
     <t a="a"> 
      some different text 
      <aa a="a">some text</aa> 
      <az a="a">some text</az> 
     </t> 
     <u a="a"> 
      some text 
      <k a="a">some text</k> 
      <n a="a">some text</n> 
     </u> 
     <u a="a"> 
      some text 
      <k b="b">some text</k> 
      <n a="a">some text</n> 
     </u> 
     <v a="a"> 
      some text 
      <i a="a">some text</i> 
      <q a="a">some text</q> 
     </v> 
     <v a="a"> 
      some text 
      <i a="a">some different text</i> 
      <q a="a">some text</q> 
     </v> 
     <x a="a" b="b" c="c"/> 
    </a> 
    <b> 
     <h a="a"/> 
     <r a="a"/> 
    </b> 
    <y a="a"/> 
</z>

2013-09-12 ubiquibacon

An example of your input XML would be nice for testing. –

@BenL Updated the question with example input XML and the desired output XML. – ubiquibacon

Can't you use Saxon 9 and XSLT 2.0 with the 'deep-equal' function instead of Xalan and XSLT 1.0? –

Here is my suggestion to show how 'deep-equal' and XSLT 2.0 can help:

<xsl:stylesheet 
    version="2.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 

    <xsl:output indent="yes" method="xml" omit-xml-declaration="yes"/> 
    <xsl:strip-space elements="*"/> 

    <!-- identity for most attributes --> 
    <xsl:template match="@*"> 
     <xsl:copy/> 
    </xsl:template> 

    <xsl:template match="*"> 
     <xsl:copy> 
      <xsl:apply-templates select="@*">
       <xsl:sort select="local-name()"/>
      </xsl:apply-templates>
      <xsl:for-each-group select="node() except (processing-instruction(), comment())" group-adjacent="boolean(self::*)">
       <xsl:choose>
        <xsl:when test="current-grouping-key()">
         <xsl:apply-templates select="current-group()">
          <xsl:sort select="local-name()"/>
         </xsl:apply-templates>
        </xsl:when>
        <xsl:otherwise>
         <xsl:apply-templates select="current-group()"/>
        </xsl:otherwise>
       </xsl:choose>
      </xsl:for-each-group>
     </xsl:copy> 
    </xsl:template> 

    <!-- 
     Elements/attributes to remove. 
    --> 
    <xsl:template match="@*[normalize-space()='']|info|@file_line_nr|@file_name 
         | *[not(@* | node())]"/> 


    <!-- remove (well, don't copy) element nodes which are deep-equal to 
     a preceding sibling element 
    --> 
    <xsl:template match="*[some $ps in preceding-sibling::* satisfies deep-equal(., $ps)]"/> 


</xsl:stylesheet>
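
To make it easier to see what 'deep-equal' counts as a duplicate, here is a minimal sketch that only prints the result of two comparisons. The sample elements, the variable name 'doc', and the template name 'main' are assumptions made for illustration; they are not part of the question or the answer. It assumes an XSLT 2.0 processor such as Saxon 9 that can start from a named template.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="text"/>

    <!-- Hypothetical entry point: run it as the initial named template -->
    <xsl:template name="main">
        <!-- Illustrative data, not taken from the question -->
        <xsl:variable name="doc">
            <r>
                <g a="a" b="b">some text<x/></g>
                <g b="b" a="a">some text<x/></g>
                <g a="a" b="b">some different text<x/></g>
            </r>
        </xsl:variable>

        <!-- Same name, attributes, children, and text; attribute order is ignored: true -->
        <xsl:value-of select="deep-equal($doc/r/g[1], $doc/r/g[2])"/>
        <xsl:text>&#10;</xsl:text>

        <!-- The text content differs: false -->
        <xsl:value-of select="deep-equal($doc/r/g[1], $doc/r/g[3])"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>

</xsl:stylesheet>
```

Note that 'deep-equal' compares child elements in order, so two siblings whose children appear in a different order in the input are not treated as duplicates by the pattern above.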

2013-09-13 09:18:00

This is very close to what I want, but the duplicate element could be ANY preceding sibling, not just the immediately preceding sibling. I tried simply using 'preceding', but that didn't work. – ubiquibacon

You can change '' to ''. –
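
The difference between checking only the nearest preceding sibling and checking every preceding sibling can be sketched as follows. This is not a reconstruction of the expressions missing from the comment above; both patterns are written for illustration, and the first one is left disabled inside a comment.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Identity transform: copy everything not matched below -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Nearest preceding sibling only (illustrative, disabled).
         With the sample input in the note below, the second g is compared
         with h only, so it would be kept:
    <xsl:template match="*[deep-equal(., preceding-sibling::*[1])]"/>
    -->

    <!-- Any preceding sibling: the second g is compared with h and with
         the first g, so it is dropped -->
    <xsl:template match="*[some $ps in preceding-sibling::* satisfies deep-equal(., $ps)]"/>

</xsl:stylesheet>
```

With the hypothetical input '<r><g a="1"/><h a="1"/><g a="1"/></r>', the active rule removes the second 'g'. The 'preceding' axis is a different thing again: it selects every earlier element in document order that is not an ancestor, so it is not limited to siblings.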

Thanks, that did it! Thank you also for fixing my warning with ''. A few other things I noticed: 1.) You moved the "remove" template to the bottom. Does the order of these templates matter? 2.) You changed the comment on the same template that had the warning. The comment now reads "identity for most elements and attributes". My XSL terminology isn't great, so why is your comment more accurate? I thought that template only matched attributes; is that not the case? – ubiquibacon
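
On the question of template order, a hedged sketch: XSLT picks a template by import precedence and priority, and declaration order only comes into play when two matching rules tie on both. The default priorities noted in the comments come from the XSLT conflict resolution rules; the explicit 'priority="1"' is an addition for illustration and does not appear in the answer.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Default priority -0.5: the generic element rule -->
    <xsl:template match="*">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Default priority -0.5: the generic attribute rule -->
    <xsl:template match="@*">
        <xsl:copy/>
    </xsl:template>

    <!-- Default priority 0 (plain name tests): these win over the generic
         rules above wherever they are placed in the stylesheet -->
    <xsl:template match="info | @file_line_nr | @file_name"/>

    <!-- Default priority 0.5 (patterns with a predicate) -->
    <xsl:template match="@*[normalize-space() = ''] | *[not(@* | node())]"/>

    <!-- Explicit priority (illustrative addition): the rule is chosen
         regardless of declaration order, even when another rule of
         priority 0.5 matches the same element -->
    <xsl:template priority="1"
        match="*[some $ps in preceding-sibling::* satisfies deep-equal(., $ps)]"/>

</xsl:stylesheet>
```

Each alternative of a union pattern is treated as its own rule with its own default priority. When two rules of equal precedence and priority match the same node, the specification treats that as a recoverable error, and a processor that recovers uses the rule that comes last in the stylesheet. In the answer above the rules that can tie are all empty templates, so the result is the same either way.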
