Parsing big XML files efficiently


I'm dealing with XML files that are 4 GB+ in size and am wondering how best to parse them. Right now I run into memory issues, so I'm looking for a way to avoid loading the whole file into memory, perhaps by going through it in batches.

My current code uses lxml and iterates over the repeating elements. Namespaces are stripped up front:

from lxml import etree, objectify
import pandas as pd

file = 'some_huge_file.xml'

if file.lower().endswith('.xml'):
    # Parses the entire document into memory at once
    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.parse(file, parser)
    root = tree.getroot()

    # Strip the namespace part from every tag
    for elem in root.iter():
        if not hasattr(elem.tag, 'find'):
            continue  # skip comments and processing instructions
        i = elem.tag.find('}')
        if i >= 0:
            elem.tag = elem.tag[i + 1:]
    objectify.deannotate(root, cleanup_namespaces=True)

    # One dict per <subelement>, holding the raw text of its children
    data = [{
        'Element1': tp.findtext('element1'),
        'Element2': tp.findtext('element2'),
        'Element3': tp.findtext('element3'),
    } for tp in tree.xpath('//mainelement/subelement')]

    df = pd.DataFrame(data)
    print(df)

Furthermore, the element values are space-separated and I only need specific ones. Can I pick those out while parsing, instead of splitting the columns on spaces afterwards?

XML example:

<mainelement>
    <subelement tc="00:00:00:000" ms="0">
        <element1>0.861668 0.496888 0.000000 0.000000 0.867815</element1>
        <element2>0.043423 0.509801 -0.111990 -0.070212 0.126711</element2>
        <element3>-0.001501 0.008416 0.000098 0.005241 0.005301</element3>
    </subelement>
    <subelement tc="00:00:00:001" ms="1">
        <element1>-0.503814 0.005664 -0.070326 -0.860926 -0.503720</element1>
        <element2>-0.044658 0.046381 0.909291 -0.033390 0.049348</element2>
        <element3>-0.000000 -0.000000 -0.000000 -0.005217 0.007849</element3>
    </subelement>
    <subelement tc="00:00:00:002" ms="2">
        <element1> -0.861173 0.503578 -0.007163 0.056031 0.862101</element1>
        <element2>0.371398 1.325794 -0.030966 0.059466 1.388910</element2>
        <element3>-0.010139 0.001048 0.026847 -0.010139 0.001048</element3>
    </subelement>
    <subelement tc="00:00:00:003" ms="3">
        <element1>0.856813 0.494664 0.003921 0.023356 0.868762</element1>
        <element2>-0.030966 0.059466 1.388910 -0.152636 -0.008650</element2>
        <element3>0.001048 0.026847 -0.010139 0.001048 0.035846</element3>
    </subelement>
</mainelement>

asked Nov 22 '18 at 12:13

Chrisvdberge


I don't know about Python XML handling, but in the Java world there are two approaches: SAX and DOM. DOM stands for "Document Object Model" and means that the whole XML file is loaded into memory so that it can be queried afterwards. Once loaded, queries are very quick, but it may consume huge amounts of memory. SAX, however, streams through your XML and fires an event whenever a certain tag, attribute, or piece of content is reached. This might take quite a while, but it takes almost no memory. So, when you say "efficient", do you mean speed or memory efficiency?

– Dominique
Nov 22 '18 at 12:29
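For readers who want to see the SAX style from the comment above in Python, here is a minimal sketch using the standard library's xml.sax module. The handler class name SubelementHandler and the inline sample are made up for illustration, and only element1 is collected. For a multi-gigabyte file the rows would presumably be written out as they are produced rather than accumulated in a list.

```python
import io
import xml.sax


class SubelementHandler(xml.sax.ContentHandler):
    """Collect the values of <element1> for each <subelement>, event by event."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows: (tc, [floats])
        self._tc = None       # tc attribute of the current <subelement>
        self._buffer = None   # text chunks while inside <element1>

    def startElement(self, name, attrs):
        if name == "subelement":
            self._tc = attrs.get("tc")
        elif name == "element1":
            self._buffer = []

    def characters(self, content):
        # The parser may deliver text in several chunks, so accumulate them.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "element1" and self._buffer is not None:
            values = [float(v) for v in "".join(self._buffer).split()]
            self.rows.append((self._tc, values))
            self._buffer = None


# Hypothetical two-record sample in the shape shown in the question.
sample = b"""<mainelement>
    <subelement tc="00:00:00:000" ms="0">
        <element1>0.861668 0.496888 0.000000 0.000000 0.867815</element1>
    </subelement>
    <subelement tc="00:00:00:001" ms="1">
        <element1>-0.503814 0.005664 -0.070326 -0.860926 -0.503720</element1>
    </subelement>
</mainelement>"""

handler = SubelementHandler()
xml.sax.parse(io.BytesIO(sample), handler)
for tc, values in handler.rows:
    print(tc, values[:2])
```

The parser never builds a tree here: memory use depends only on what the handler chooses to keep.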

Speed is not important at this point, I'd say. I just need to get the data into a database for now, and if I try to read all the elements and values I need, Python just crashes or errors out (and macOS starts force quitting applications ;) ).

– Chrisvdberge
Nov 22 '18 at 12:37


Did you check this?

– Andersson
Nov 22 '18 at 12:41


Possible duplicate of "xml parsing in python for big data"

– stovfl
Nov 22 '18 at 13:31

Those links were quite helpful. The possible duplicate pointed in the right direction but didn't provide a clear, concrete answer, so I added the code I came up with as an answer to this question for clarity.

– Chrisvdberge
Nov 23 '18 at 12:05

python lxml


1 Answer

Based on the links posted in the comments, I came up with the following, which iterates over the file and splits the values more efficiently, and works fine:

from lxml import etree
import pandas as pd

file = 'some_huge_file.xml'

# One list per output column
time = []
data1_element1_x = []
data1_element1_y = []
data1_element2_x = []
data1_element2_y = []
data2_element1_x = []
data2_element1_y = []
data2_element2_x = []
data2_element2_y = []

if file.lower().endswith('.xml'):
    # iterparse yields each element once its end tag has been read,
    # so the file is processed incrementally
    for event, elem in etree.iterparse(file):
        if elem.tag == "subelement":
            time.append(elem.get('tc'))
            for child in elem:
                if child.tag == "element1":
                    # split() without arguments also copes with leading or repeated whitespace
                    split_data = child.text.split()
                    data1_element1_x.append(float(split_data[0]))
                    data1_element1_y.append(float(split_data[1]))
                    data2_element1_x.append(float(split_data[2]))
                    data2_element1_y.append(float(split_data[3]))
                elif child.tag == "element2":
                    split_data = child.text.split()
                    data1_element2_x.append(float(split_data[0]))
                    data1_element2_y.append(float(split_data[1]))
                    data2_element2_x.append(float(split_data[2]))
                    data2_element2_y.append(float(split_data[3]))
            # release the children and text of the processed subelement
            elem.clear()

df = pd.DataFrame({
    'Time': time,
    'Data1_element1_x': data1_element1_x,
    'Data1_element1_y': data1_element1_y,
    'Data1_element2_x': data1_element2_x,
    'Data1_element2_y': data1_element2_y,
    'Data2_element1_x': data2_element1_x,
    'Data2_element1_y': data2_element1_y,
    'Data2_element2_x': data2_element2_x,
    'Data2_element2_y': data2_element2_y
})

print(df)

answered Nov 23 '18 at 12:04

Chrisvdberge
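Since the stated goal in the comments is to get the data into a database, a possible variation is to write rows out in batches instead of collecting every column in memory first. The following is only a sketch under several assumptions: it uses the standard library's xml.etree.ElementTree and sqlite3 rather than lxml and a real database, so that it runs without extra dependencies. The table name samples, the column names, and the helper names iter_rows, batched and load are invented for illustration. It also assumes every subelement has an element1 with at least two values.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def iter_rows(source):
    """Yield (tc, e1_x, e1_y) per <subelement> without keeping the whole tree."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "subelement":
            # split() with no argument tolerates leading and repeated whitespace
            values = (elem.findtext("element1") or "").split()
            row = (elem.get("tc"), float(values[0]), float(values[1]))
            root.clear()  # drop already-processed children from the root
            yield row


def batched(rows, size):
    """Group an iterable of rows into lists of at most `size` items."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def load(source, conn, batch_size=1000):
    """Insert all rows from `source` into the (hypothetical) samples table."""
    conn.execute("CREATE TABLE IF NOT EXISTS samples (tc TEXT, e1_x REAL, e1_y REAL)")
    total = 0
    for batch in batched(iter_rows(source), batch_size):
        conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", batch)
        conn.commit()
        total += len(batch)
    return total


# Small inline sample in the shape shown in the question.
sample = b"""<mainelement>
    <subelement tc="00:00:00:000" ms="0">
        <element1>0.861668 0.496888 0.000000 0.000000 0.867815</element1>
    </subelement>
    <subelement tc="00:00:00:001" ms="1">
        <element1>-0.503814 0.005664 -0.070326 -0.860926 -0.503720</element1>
    </subelement>
    <subelement tc="00:00:00:002" ms="2">
        <element1> -0.861173 0.503578 -0.007163 0.056031 0.862101</element1>
    </subelement>
</mainelement>"""

conn = sqlite3.connect(":memory:")
inserted = load(io.BytesIO(sample), conn, batch_size=2)
print(inserted, conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0])
```

Only one batch of rows is held in Python at a time, and clearing the root after each record keeps the partially built tree from growing with the file. lxml's etree.iterparse offers a similar interface, so the same structure should carry over with minor changes.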
