How to combine strings from JSON values, keeping only part of the string?

I have this sample:

           "name": "The title of website",
           "sync_transaction_version": "1",
           "type": "url",
           "url": "https://url_of_website"

I want to get the following output:

"The title of website"    url_of_website

I need to remove the protocol prefix from the URL, so that only url_of_website is left (with no http at the front).
The problem is that I'm not familiar with making sed read multiple lines. Some research led me to https://unix.stackexchange.com/a/337399/256195, but I still can't produce the result.

The valid JSON object that I'm trying to parse is Google Chrome's Bookmarks file. A sample:

{
   "checksum": "9e44bb7b76d8c39c45420dd2158a4521",
   "roots": {
      "bookmark_bar": {
         "children": [ {
            "children": [ {
               "date_added": "13161269379464568",
               "id": "2046",
               "name": "The title is here",
               "sync_transaction_version": "1",
               "type": "url",
               "url": "https://the_url_is_here"
            }, {
               "date_added": "13161324436994183",
               "id": "2047",
               "meta_info": {
                  "last_visited_desktop": "13176472235950821"
               },
               "name": "The title here",
               "sync_transaction_version": "1",
               "type": "url",
               "url": "https://url_here"
            } ]
         } ]
      }
   }
}

edited Nov 29 at 20:20 by MatthewRock

asked Nov 29 at 14:48 by Tuyen Pham

Can you post a valid JSON object? Also, jq or json is the proper tool for this, not sed.
– Jesse_b
Nov 29 at 14:49

You don't parse JSON with sed. JSON is a structured document format unsuitable for parsing by anything other than a JSON parser. Doing it with sed would require you to implement a JSON parser in sed that could handle the different entity encodings etc. that could be present in the data (especially in URLs).
– Kusalananda
Nov 29 at 14:51

@Jesse_b: Thanks, I've just added the JSON object. A solution using jq or json is also fine if it solves the issue.
– Tuyen Pham
Nov 29 at 14:52

@Kusalananda: Thanks, I'll edit the title and change the content to suit the context.
– Tuyen Pham
Nov 29 at 14:53

text-processing sed json filter


1 Answer

accepted

This works on the JSON document given in the question:

$ jq -r '.roots.bookmark_bar.children[]|.children[]|["\"\(.name)\"",.url]|@tsv' file.json
"The title is here"     https://the_url_is_here
"The title here"        https://url_here

This accesses the .children array of each .roots.bookmark_bar.children array entry and, for each bookmark found there, creates a string formatted as you showed in the question (with a tab character between the two pieces of data).

If the double quotes are not necessary, you could change the cumbersome ["\"\(.name)\"",.url] to just [.name,.url].

To trim the https:// off from the URLs, use

.url|ltrimstr("https://")

instead of just .url.
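
Putting the pieces together, the full command with the prefix trimmed might look like the sketch below. The temporary file and its contents are stand-ins modelled on the sample in the question, not the real Bookmarks file, and the variable names are made up for illustration.

```shell
# Hypothetical sample file modelled on the JSON in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{
  "roots": {
    "bookmark_bar": {
      "children": [ {
        "children": [
          { "name": "The title is here", "type": "url", "url": "https://the_url_is_here" },
          { "name": "The title here", "type": "url", "url": "https://url_here" }
        ]
      } ]
    }
  }
}
EOF

# Quote the name, strip a leading "https://" from the URL, join with a tab.
result=$(jq -r '.roots.bookmark_bar.children[] | .children[]
                | ["\"\(.name)\"", (.url | ltrimstr("https://"))] | @tsv' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

The parentheses around .url | ltrimstr("https://") keep the trimming local to the second array element. ltrimstr leaves a string unchanged when it does not start with the given prefix, so URLs using another scheme pass through as they are.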

edited Nov 29 at 15:22

answered Nov 29 at 15:03 by Kusalananda

Thanks. At the end of the file I get this error: jq: error (at Bookmarks:23397): Cannot iterate over null (null). Line 23397 is the last line of the file.
– Tuyen Pham
Nov 29 at 15:08

So I've just modified your command. The correct one should be: jq -r '.roots.bookmark_bar.children[]|.children[]?|["\"\(.name)\"",.url]|@tsv', which eliminates the above error. One more question: is that a space or a tab between the title and the URL? What if I need to insert a tab between them?
– Tuyen Pham
Nov 29 at 15:17
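
A likely cause of the "Cannot iterate over null" error is an entry directly under bookmark_bar that is a plain bookmark rather than a folder, so that it has no .children to iterate over. The sketch below illustrates this with a made-up sample (the file contents and variable names are assumptions, not taken from the real Bookmarks file). It also shows a recursive alternative that picks up bookmarks at any depth, which the [ ]? variant silently skips when they are not exactly one folder deep.

```shell
# Hypothetical sample: one folder and one bookmark directly on the bar.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{
  "roots": {
    "bookmark_bar": {
      "children": [
        { "name": "A folder", "type": "folder",
          "children": [ { "name": "Inside", "type": "url", "url": "https://inside" } ] },
        { "name": "Top level", "type": "url", "url": "https://top" }
      ]
    }
  }
}
EOF

# ".children[]?" yields nothing (instead of an error) for entries without children,
# so the top-level bookmark is skipped.
folders_only=$(jq -r '.roots.bookmark_bar.children[] | .children[]?
                      | [.name, .url] | @tsv' "$sample")

# Recursive alternative: visit every object at any depth and keep those of type "url".
all_urls=$(jq -r '.. | objects | select(.type == "url")
                  | [.name, .url] | @tsv' "$sample")

printf 'folders only:\n%s\n\nrecursive:\n%s\n' "$folders_only" "$all_urls"
rm -f "$sample"
```

Note that the recursive form walks the whole document, so on a real file it would also report bookmarks stored under roots other than bookmark_bar. Start from .roots.bookmark_bar instead of the document root if that is not wanted.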

@TuyenPham, it's a tab. @tsv is a jq formatter for tab-separated values. You could also use @csv to get output like "The title here","https://url_here"
– glenn jackman
Nov 29 at 15:20

@TuyenPham I only had the partial document that you provided to look at, so no wonder there were errors. Good work sorting them out! The @tsv command formats the array that it gets as a tab-delimited string.
– Kusalananda
Nov 29 at 15:20

How to trim both http:// and https://?
– Tuyen Pham
Nov 29 at 15:26
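
One way to handle both prefixes, sketched here under the assumption that only http:// and https:// need stripping, is either to chain two ltrimstr calls or to use a single regular-expression substitution with sub. The input list and variable names below are invented for illustration.

```shell
# Hypothetical input: a JSON array of URLs with different schemes.
urls='["http://plain_example", "https://secure_example", "ftp://left_alone"]'

# Option 1: chain ltrimstr; each call is a no-op when its prefix is absent.
chained=$(printf '%s' "$urls" | jq -r '.[] | ltrimstr("https://") | ltrimstr("http://")')

# Option 2: one regex substitution (requires a jq build with regex support).
regex=$(printf '%s' "$urls" | jq -r '.[] | sub("^https?://"; "")')

printf '%s\n' "$chained"
```

In the command from the answer, either expression would take the place of .url, for example (.url | sub("^https?://"; "")).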
