How do I arrange single cardinality for vertex properties imported via CSV into AWS Neptune?



























The Neptune documentation says that only "Set" property cardinality is supported for property data imported via CSV, which means a newly arrived property value can never overwrite the old value of the same property on the same vertex.



For example, if the first CSV imports



~id,~label,age
Marko,person,29


then Marko has a birthday and a second CSV imports



~id,~label,age
Marko,person,30


the 'Marko' vertex's 'age' property will contain both age values, which doesn't seem useful.
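The Set-cardinality merge behaviour can be sketched in plain Python (a simulation for illustration, not the Neptune API): each load adds to the value set instead of replacing it.

```python
# Simulate Neptune's Set-cardinality behaviour across two CSV loads.
props = {}

def load_row(props, vertex_id, key, value):
    # Set cardinality: a new value is added alongside the existing ones.
    props.setdefault((vertex_id, key), set()).add(value)

load_row(props, "Marko", "age", 29)  # first CSV load
load_row(props, "Marko", "age", 30)  # second CSV load, after the birthday
print(sorted(props[("Marko", "age")]))  # -> [29, 30]: both values survive
```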



AWS says that collapsing Set-cardinality properties to Single cardinality (keeping only the last-arrived value) needs to be done with post-processing, via Gremlin traversals.



Does this mean there should be a traversal that continuously scans vertices with multiple (Set) property values and re-sets each property with Single cardinality, keeping the last value? If so, what is the optimal Gremlin query to do that?



In pseudo-Gremlin I'd imagine something like:



g.V().property(single, properties(*), _.tail())


Is there any guarantee that Set-cardinality property values are always listed in order of arrival?



Or am I completely on the wrong track here?



Any help would be appreciated.



Update:
The best thing I was able to come up with so far is still far from a perfect solution, but it might be useful for someone in my shoes.



Plan A: if we happen to know the property names, and the order of arrival does not matter at all (we just want single cardinality on these properties), the traversal for all vertices could be something like:



g.V().has(${propname}).where(property(single, ${propname}, properties(${propname}).value().order().tail() ) )


Plan B is to collect new property values under temporary property names on the same vertex (e.g. starting with _), then traverse the vertices having such temporary properties and set the original properties to their tailed values with single cardinality:



g.V().has(${temp_propname}).where(property(single, ${propname}, properties(${temp_propname}).value().order().tail() ) ).properties(${temp_propname}).drop()


Plan C, which would be the coolest but unfortunately does not work, is to keep collecting property values on a dedicated vertex, with epoch timestamps as property names and property values as their values:



g.V(${vertexid}).out('has_propnames').properties()
==>vp[1542827843->value1]
==>vp[1542827798->value2]
==>vp[1542887080->latestvalue]


and then sort the property names (keys), take the last one, and use its value to keep the main vertex's property up to date with the latest value:



g.V().has(${propname}).where(out(${has_these_properties}).count().is(gt(0))).where(property(single, ${propname}, out(${has_these_properties}).properties().value(  out(${has_these_properties}).properties().keys().order().tail()  ) ) )


It looks like the parameter of the value() step must be a constant; it can't take the outcome of another traversal as a parameter, so I could not get this working. Perhaps someone with more Gremlin experience knows a workaround for this.
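Since value() can't take a dynamic key, one possible workaround for Plan C is to do the key selection client-side: read the timestamp->value pairs from the side vertex, pick the latest key in application code, and then send a single property(single, ...) update. A minimal Python sketch of just the selection step (plain Python, not a Gremlin call):

```python
def latest_value(ts_to_value):
    """Pick the value stored under the largest (latest) epoch-timestamp key."""
    latest_ts = max(ts_to_value, key=int)  # keys are epoch seconds as strings
    return ts_to_value[latest_ts]

# Timestamp->value pairs as they would be read from the side vertex:
props = {"1542827843": "value1", "1542827798": "value2", "1542887080": "latestvalue"}
print(latest_value(props))  # -> latestvalue
```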










      amazon-web-services csv gremlin cardinality amazon-neptune






      edited Dec 18 '18 at 0:40







      user10796762

















      asked Nov 16 '18 at 15:29









Balazs David Molnar

1 Answer






It would probably be more performant to read the file from which you are bulk loading and set the property using the vertex id, rather than scanning for vertices with multiple values for that property.



So your Gremlin update query would be as follows:



          g.V(${id})
          .property(single,${key},${value})


As for whether Set-cardinality values come back in a guaranteed order, I do not know. :(
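The suggestion above can be sketched in Python (an illustrative sketch; the final Gremlin submission is left as a comment since it needs a live Neptune endpoint): collapse the bulk-load CSV to the last value per vertex id, then issue one property(single, ...) update per vertex.

```python
import csv
from io import StringIO

def last_values(csv_text, prop):
    """Map each ~id to the last value seen for `prop`, in file order."""
    latest = {}
    for row in csv.DictReader(StringIO(csv_text)):
        latest[row["~id"]] = row[prop]  # later rows overwrite earlier ones
    return latest

data = "~id,~label,age\nMarko,person,29\nMarko,person,30\n"
updates = last_values(data, "age")
print(updates)  # -> {'Marko': '30'}

# One update per vertex (needs a live connection; shown as pseudocode):
# for vid, value in updates.items():
#     g.V(vid).property(single, "age", value).iterate()
```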






• Thank you for your answer! The problem is that vertices in my setup arrive very fast: CSVs containing over 100,000 vertices arrive every minute (and get processed in 2-3 seconds, so that part works amazingly fast), and that's only the beginning. On the other hand, I see Gremlin queries complete in the 10-1000 ms range, so I'm afraid that if I started to send a property-update Gremlin query for each vertex by id, one by one, at that volume, I'd probably have a massive backlog in no time.

            – Balazs David Molnar
            Nov 20 '18 at 22:34











          • Yes, it might not keep up without some further optimization. You would think that since they allow a distinction between single and array types in the bulk load headers that it would factor into Single vs Set. Maybe in a newer version if enough people request it.

            – Dave Zabriskie
            Nov 21 '18 at 16:17











answered Nov 20 '18 at 18:54

Dave Zabriskie
