Using preg_match with HTML comments


I want to extract the HTML contained between these comments into a string:

<!--content-start-->

 desired html

<!--content-end-->

So I use preg_match, right?

preg_match("/<!--content-start-->(.*)<!--content-end-->/i", $rss, $content);

But it won't work. Maybe a problem with the regex?

Thank you.

asked Nov 23 '18 at 3:58 by Cain Nuke

edited Nov 23 '18 at 4:47 by miken32

No, you don't use regular expressions to parse HTML. You use an HTML parser!

– miken32
Nov 23 '18 at 4:09

An example, please?

– Cain Nuke
Nov 23 '18 at 4:11

@miken32 Although, arguably, they aren't "parsing HTML". They are simply extracting one block of text between two unique tokens (regardless of content type). Using an HTML parser in this particular example (a simple one-off pattern-matching exercise) is overkill IMO. (Only a small tweak to the OP's regex is required.)

– MrWhite
Dec 1 '18 at 1:28

php preg-match


2 Answers

Perhaps a /s modifier will help. Check the documentation:

s (PCRE_DOTALL)

If this modifier is set, a dot metacharacter in the pattern matches all characters,
including newlines. Without it, newlines are excluded. This modifier is equivalent to
Perl's /s modifier. A negative class such as [^a] always matches a newline character,
independent of the setting of this modifier.
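
As a minimal sketch of that suggestion: adding `s` to the modifiers lets the dot cross line breaks, which the original pattern could not do. The sample `$rss` string and the `$inner` variable are made up for illustration, and the lazy `.*?` is an extra precaution that is not part of the original pattern; it makes the match stop at the first end marker rather than the last.

```php
<?php
// Hypothetical sample input; in the question this would be the fetched $rss string.
$rss = "<p>before</p>\n<!--content-start-->\n<p>desired html</p>\n<!--content-end-->\n<p>after</p>";

// s (PCRE_DOTALL): "." also matches newlines.
// .*? (lazy): stop at the first <!--content-end--> instead of the last one.
if (preg_match('/<!--content-start-->(.*?)<!--content-end-->/is', $rss, $content)) {
    // $content[0] is the whole match; $content[1] is the captured group.
    $inner = trim($content[1]);
    echo $inner, "\n"; // prints: <p>desired html</p>
} else {
    $inner = null;
    echo "No match\n";
}
```

Without the `s` modifier the same pattern finds nothing in this input, because the text between the two markers contains newlines.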

answered Nov 23 '18 at 4:08 by drmad

Yes, this is all that's required in the OP's example. Parsing the HTML (as mentioned in the other answer) is overkill IMO for what is really just a simple pattern-matching exercise.

– MrWhite
Dec 1 '18 at 1:19


Something like this should work. The XPath query finds the comment whose text is exactly "content-start" and returns the sibling nodes that follow it. We loop through them until we reach the closing comment.

$html = <<<HTML
<!--content-start-->
<p>Here is my <i>desired html</i></p>
<!-- a comment -->
<div class="foo">Here is more</div>
<!--content-end-->
<p>Not returning this</p>
HTML;

$return = "";

// Parse the fragment without adding a doctype or implied <html>/<body> wrappers
// (the options argument requires PHP 5.4+).
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
$xpath = new DOMXPath($dom);

// Select every node that follows the start comment at the same level.
$siblings = $xpath->query("//comment()[.='content-start']/following-sibling::node()");

foreach ($siblings as $node) {
    // Stop once the closing comment is reached.
    if ($node instanceof DOMComment && $node->textContent === "content-end") {
        break;
    }
    $return .= $dom->saveHTML($node) . "\n";
}

echo $return;

Output:

<p>Here is my <i>desired html</i></p>

<!-- a comment -->

<div class="foo">Here is more</div>

answered Nov 23 '18 at 4:37 by miken32 (edited Nov 23 '18 at 4:50)

Will this work if the HTML is from another website?

– Cain Nuke
Nov 23 '18 at 17:17

It's HTML, it doesn't matter where it's from.

– miken32
Nov 23 '18 at 17:18

Great, I will try it. Thanks.

– Cain Nuke
Nov 23 '18 at 17:20

Sorry, but I got these warnings: "DOMDocument::loadHTML() expects exactly 1 parameter, 2 given", "Warning: DOMXPath::query() [domxpath.query]: Invalid or inclomplete context", and "Warning: Invalid argument supplied for foreach()".

– Cain Nuke
Nov 23 '18 at 17:30

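Are you kidding me? That first warning means you're on PHP 5.3 or older (the options argument to loadHTML() was added in 5.4), and 5.3 has been EOL for over four years now. You gotta upgrade!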

– miken32
Nov 23 '18 at 17:31
