How do I arrange Single cardinality for Vertex properties imported via CSV into AWS Neptune?
The Neptune documentation says that only "Set" property cardinality is supported for property data imported via CSV, which means a newly arrived property value cannot overwrite the old value of the same property on the same vertex.
For example, if the first CSV imports
~id,~label,age
Marko,person,29
then Marko has a birthday and a second CSV imports
~id,~label,age
Marko,person,30
the 'age' property of the 'Marko' vertex will contain both age values, which doesn't seem useful.
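To make the difference concrete, here is a minimal Python sketch of the two cardinalities applied to the two CSV imports above. It is only an in-memory model, not Neptune code: the function names and the dictionary-based store are assumptions made for illustration.

```python
from collections import defaultdict


def load_with_set_cardinality(batches):
    """Model of Set cardinality: every distinct value ever loaded is kept."""
    store = defaultdict(lambda: defaultdict(set))
    for rows in batches:
        for row in rows:
            for key, value in row.items():
                if not key.startswith("~"):  # skip ~id and ~label columns
                    store[row["~id"]][key].add(value)
    return store


def load_with_single_cardinality(batches):
    """Model of Single cardinality: a later value replaces the earlier one."""
    store = defaultdict(dict)
    for rows in batches:
        for row in rows:
            for key, value in row.items():
                if not key.startswith("~"):
                    store[row["~id"]][key] = value
    return store


# The two imports from the example, as already-parsed rows (hypothetical form).
first = [{"~id": "Marko", "~label": "person", "age": "29"}]
second = [{"~id": "Marko", "~label": "person", "age": "30"}]

set_result = load_with_set_cardinality([first, second])
single_result = load_with_single_cardinality([first, second])

print(sorted(set_result["Marko"]["age"]))  # both values survive
print(single_result["Marko"]["age"])       # only the last value survives
```

Note that the set in this model carries no arrival order at all, which is exactly why "keep the last arrived value" cannot be recovered from it afterwards.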
AWS says that this (collapsing Set-cardinality properties to Single cardinality, keeping only the last arrived value) needs to be done in post-processing, via Gremlin traversals.
Does this mean there should be a traversal that continuously scans vertices with multiple (Set) property values and sets the property again with Single cardinality, using the last value? If so, what is the optimal Gremlin query to do that?
In pseudo-Gremlin I'd imagine something like:
g.V().property(single, properties(*), _.tail())
Is there a guarantee at all that Set-cardinality properties are always listed in order of arrival?
Or am I completely on the wrong track here?
Any help would be appreciated.
Update:
So the best thing I was able to come up with so far is still far from a perfect solution, but it might still be useful for someone in my shoes.
Plan A: if we happen to know the property names and the order of arrival does not matter at all (we just want single cardinality on these properties), the traversal for all vertices could be something like:
g.V().has(${propname}).where(property(single, ${propname}, properties(${propname}).value().order().tail() ) )
Plan B is to collect new property values under temporary property names on the same vertex (e.g. starting with _), then traverse the vertices that have such temporary property names, set the original properties from their tailed values with single cardinality, and drop the temporary properties:
g.V().has(${temp_propname}).where(property(single, ${propname}, properties(${temp_propname}).value().order().tail() ) ).properties(${temp_propname}).drop()
Plan C, which would be the coolest but unfortunately does not work, is to keep collecting property values on a dedicated vertex, with epoch timestamps as property names and the property values as their values:
g.V(${vertexid}).out('has_propnames').properties()
==>vp[1542827843->value1]
==>vp[1542827798->value2]
==>vp[1542887080->latestvalue]
and then sort the property names (keys), take the last one, and use its value to keep the main vertex property up to date with the latest value:
g.V().has(${propname}).where(out(${has_these_properties}).count().is(gt(0))).where(property(single, ${propname}, out(${has_these_properties}).properties().value( out(${has_these_properties}).properties().keys().order().tail() ) ) )
It looks like the parameter of the value() step must be a constant; it can't take the outcome of another traversal as its parameter, so I could not get this working. Perhaps someone with more Gremlin experience knows a workaround for this.
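For reference, the selection that Plan C is trying to express is easy to state outside Gremlin. The Python sketch below is not a Gremlin workaround; it only shows the intended logic on the client side, assuming the properties of the dedicated vertex have already been fetched into a plain dictionary (the name `latest_value` and that dictionary shape are assumptions).

```python
def latest_value(timestamped_props):
    """Return the value stored under the numerically largest epoch-timestamp key."""
    if not timestamped_props:
        raise ValueError("no timestamped properties to choose from")
    # Compare keys as integers so ordering does not depend on string length.
    newest_key = max(timestamped_props, key=int)
    return timestamped_props[newest_key]


# The three properties from the sample output above, keyed by epoch timestamp.
props = {
    "1542827843": "value1",
    "1542827798": "value2",
    "1542887080": "latestvalue",
}

print(latest_value(props))
```

The chosen value could then be written back to the main vertex with a single-cardinality property() call, at the cost of one extra round trip per vertex.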
amazon-web-services csv gremlin cardinality amazon-neptune
edited Dec 18 '18 at 0:40
user10796762
asked Nov 16 '18 at 15:29
Balazs David Molnar
1 Answer
It would probably be more performant to read in the file from which you are bulk loading and set that property using the vertex id, rather than scanning for a vertex with multiple values for that property.
So your Gremlin update query would be as follows.
g.V(${id})
.property(single,${key},${value})
As for whether Set cardinality guarantees any ordering, I do not know. :(
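A rough sketch of what reading the bulk-load file and building those updates might look like, in Python. It only generates query strings in the form shown above and does not send them anywhere. The helper names are made up for illustration, and it assumes the column headers carry no type suffix, so every value is written as a string literal.

```python
import csv
import io


def gremlin_literal(value):
    """Quote a value as a Gremlin string literal (assumes string-typed columns)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def single_cardinality_updates(csv_text):
    """Build one g.V(id).property(single, key, value) query per vertex property.

    If the same vertex and property appear on several rows, only the value
    from the last row is kept, so each property is written once.
    """
    latest = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        vertex_id = row["~id"]
        for key, value in row.items():
            # Skip system columns (~id, ~label), overflow fields and empty cells.
            if key is None or key.startswith("~") or not value:
                continue
            latest[(vertex_id, key)] = value
    return [
        "g.V({}).property(single,{},{})".format(
            gremlin_literal(vid), gremlin_literal(key), gremlin_literal(value)
        )
        for (vid, key), value in latest.items()
    ]


# Both imports from the question, concatenated into one file for the example.
csv_text = "~id,~label,age\nMarko,person,29\nMarko,person,30\n"
queries = single_cardinality_updates(csv_text)
for query in queries:
    print(query)
```

Collapsing repeated rows before generating queries at least keeps the number of updates down to one per vertex property, though it does not by itself solve the throughput concern of sending them one by one.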
Thank you for your answer! The problem is that vertices in my setup arrive very fast: CSVs containing over 100,000 vertices arrive every minute (and get processed in 2-3 seconds, so that works amazingly fast), and that's only the beginning. On the other hand, I see Gremlin queries complete in the 10-1000 ms range, so I'm afraid that if I started sending a property-update Gremlin query for each vertex by ID, one by one, at that volume, I'd have a massive backlog in no time.
– Balazs David Molnar
Nov 20 '18 at 22:34
Yes, it might not keep up without some further optimization. You would think that since they allow a distinction between single and array types in the bulk load headers that it would factor into Single vs Set. Maybe in a newer version if enough people request it.
– Dave Zabriskie
Nov 21 '18 at 16:17
answered Nov 20 '18 at 18:54
Dave Zabriskie