How to combine strings from JSON values, keeping only part of the string?
I have this sample:
"name": "The title of website",
"sync_transaction_version": "1",
"type": "url",
"url": "https://url_of_website"
I want to get the following output:
"The title of website" url_of_website
I need to remove the protocol prefix from the URL, so that only url_of_website is left (with no http in front).
The problem is that I'm not very familiar with using sed to read multiple lines. Some research led me to https://unix.stackexchange.com/a/337399/256195, but I still can't produce the result.
The valid JSON object that I'm trying to parse is the Bookmarks file of Google Chrome. Sample:
{
   "checksum": "9e44bb7b76d8c39c45420dd2158a4521",
   "roots": {
      "bookmark_bar": {
         "children": [ {
            "children": [ {
               "date_added": "13161269379464568",
               "id": "2046",
               "name": "The title is here",
               "sync_transaction_version": "1",
               "type": "url",
               "url": "https://the_url_is_here"
            }, {
               "date_added": "13161324436994183",
               "id": "2047",
               "meta_info": {
                  "last_visited_desktop": "13176472235950821"
               },
               "name": "The title here",
               "sync_transaction_version": "1",
               "type": "url",
               "url": "https://url_here"
            } ]
         } ]
      }
   }
}
text-processing sed json filter
3
Can you post a valid json object? Also jq or json is the proper tool for this, not sed.
– Jesse_b
Nov 29 at 14:49
4
You don't parse JSON with sed. JSON is a structured document format unsuitable for parsing by anything other than a JSON parser. Doing it with sed would require you to implement a JSON parser in sed that could handle the different entity encoding etc. that could be present in the data (especially in URLs).
– Kusalananda
Nov 29 at 14:51
@Jesse_b: Thanks, I've just added the JSON object. If possible, jq or json would also work for me if they can solve the issue.
– Tuyen Pham
Nov 29 at 14:52
@Kusalananda: Thanks, I'll edit the title and change content to suit the context.
– Tuyen Pham
Nov 29 at 14:53
edited Nov 29 at 20:20
MatthewRock
asked Nov 29 at 14:48
Tuyen Pham
1 Answer
accepted
This works on the JSON document given in the question:
$ jq -r '.roots.bookmark_bar.children[]|.children[]|["\"\(.name)\"",.url]|@tsv' file.json
"The title is here" https://the_url_is_here
"The title here" https://url_here
This accesses the .children array of each .roots.bookmark_bar.children array entry and creates a string that is formatted according to what you showed in the question (with a tab character in-between the two pieces of data).
If the double quotes are not necessary, you could change the cumbersome ["\"\(.name)\"",.url] to just [.name,.url].
To trim the https:// off from the URLs, use .url|ltrimstr("https://") instead of just .url.
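As a rough sketch of how the pieces above fit together (the quoted name, the trimmed URL, and @tsv), something like the following may help. The file name sample.json and the helper name print_bookmarks are invented for illustration, and the data is a trimmed copy of the document from the question. The parentheses around .url | ltrimstr("https://") are there because | binds more loosely than , in jq, so without them the trim would be applied to the name as well.

```shell
#!/bin/sh
# Hypothetical sample file: a trimmed copy of the JSON document from the question.
cat > sample.json <<'EOF'
{
  "roots": {
    "bookmark_bar": {
      "children": [ {
        "children": [
          { "name": "The title is here", "type": "url", "url": "https://the_url_is_here" },
          { "name": "The title here", "type": "url", "url": "https://url_here" }
        ]
      } ]
    }
  }
}
EOF

# print_bookmarks is an illustrative helper name, not part of jq.
# It prints the quoted name, a tab, and the URL without a leading https://.
print_bookmarks() {
    jq -r '
        .roots.bookmark_bar.children[]
        | .children[]
        | ["\"\(.name)\"", (.url | ltrimstr("https://"))]
        | @tsv
    ' "$1"
}

print_bookmarks sample.json
```

On this sample, each output line should hold the quoted title and the bare host name, separated by a tab.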
Thanks, at the end of the file I get this error: jq: error (at Bookmarks:23397): Cannot iterate over null (null). 23397 is the last line of the file.
– Tuyen Pham
Nov 29 at 15:08
So I've just modified your command; the correct one should be: jq -r '.roots.bookmark_bar.children[]|.children[]?|["\"\(.name)\"",.url]|@tsv', which eliminates the above error. One more question: is that a space or a tab between the title and the URL? What if I need to insert a tab between them?
– Tuyen Pham
Nov 29 at 15:17
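The error and its fix can be reproduced on a small scale. This is only a sketch: the file name mixed_bar.json and its contents are invented, on the assumption that the real Bookmarks file has entries directly on the bookmark bar that are bookmarks rather than folders and so have no .children key. For such an entry .children is null, .children[] fails with "Cannot iterate over null", and .children[]? suppresses that error.

```shell
#!/bin/sh
# Hypothetical input: one folder with a bookmark inside, plus one bookmark
# that sits directly on the bookmark bar and has no "children" key.
cat > mixed_bar.json <<'EOF'
{
  "roots": {
    "bookmark_bar": {
      "children": [
        { "type": "folder", "name": "A folder",
          "children": [
            { "type": "url", "name": "Inside folder", "url": "https://inside.example" }
          ] },
        { "type": "url", "name": "Top level", "url": "https://top.example" }
      ]
    }
  }
}
EOF

# list_nested is an illustrative helper name, not part of jq.
# The trailing ? makes jq skip entries whose .children cannot be iterated.
list_nested() {
    jq -r '.roots.bookmark_bar.children[] | .children[]? | [.name, .url] | @tsv' "$1"
}

list_nested mixed_bar.json
```

Note that in this sketch the top-level bookmark is skipped silently rather than printed, so the ? hides the error without listing every bookmark.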
1
@TuyenPham, it's a tab. "@tsv" is a jq formatter for tab-separated values. You could also use @csv to get output like "The title here","https://url_here"
– glenn jackman
Nov 29 at 15:20
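For comparison, a minimal sketch of the @csv variant, assuming a flat JSON array of bookmark objects on standard input (the helper name as_csv and the one-element input are made up for illustration):

```shell
#!/bin/sh
# as_csv is an illustrative helper name, not part of jq.
# It reads a JSON array of objects on stdin and prints name,url as CSV.
as_csv() {
    jq -r '.[] | [.name, .url] | @csv'
}

printf '%s\n' '[{"name": "The title here", "url": "https://url_here"}]' | as_csv
```

With @csv, jq adds the double quotes around string fields itself, so the name does not need to be quoted by hand.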
@TuyenPham I only had the partial document that you provided to look at, so no wonder there were errors. Good work sorting them out! The @tsv command formats the array that it gets as a tab-delimited string.
– Kusalananda
Nov 29 at 15:20
How to trim both http:// and https://?
– Tuyen Pham
Nov 29 at 15:26
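One possible way to handle both prefixes is to chain two ltrimstr calls, since ltrimstr leaves its input unchanged when the prefix is absent. This is a sketch, not the answer author's method; the file name mixed.json, its contents, and the helper name strip_scheme are invented for illustration.

```shell
#!/bin/sh
# Hypothetical input with one http:// and one https:// bookmark.
cat > mixed.json <<'EOF'
[
  { "name": "Plain", "url": "http://plain.example" },
  { "name": "Secure", "url": "https://secure.example" }
]
EOF

# strip_scheme is an illustrative helper name, not part of jq.
# Each ltrimstr removes its prefix only if the string starts with it.
strip_scheme() {
    jq -r '
        .[]
        | [.name, (.url | ltrimstr("https://") | ltrimstr("http://"))]
        | @tsv
    ' "$1"
}

strip_scheme mixed.json
```

In jq builds that include regular-expression support, .url | sub("^https?://"; "") should be an equivalent alternative.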
edited Nov 29 at 15:22
answered Nov 29 at 15:03
Kusalananda