Using preg_match with HTML comments
I want to extract the HTML contained between these comments into a string:
<!--content-start-->
desired html
<!--content-end-->
So I use preg_match, right?
preg_match("/<!--content-start-->(.*)<!--content-end-->/i", $rss, $content);
But it won't work. Maybe a problem with the regex?
Thank you.
php preg-match
asked Nov 23 '18 at 3:58 by Cain Nuke
edited Nov 23 '18 at 4:47 by miken32
No, you don't use regular expressions to parse HTML. You use an HTML parser!
– miken32
Nov 23 '18 at 4:09
For example, please?
– Cain Nuke
Nov 23 '18 at 4:11
@miken32 Although, arguably, they aren't "parsing HTML". They are simply extracting one block of text between two unique tokens (regardless of content type). Using an HTML parser in this particular example (a simple one-off pattern-matching exercise) is overkill IMO. (Only a small tweak to the OP's regex is required.)
– MrWhite
Dec 1 '18 at 1:28
2 Answers
Perhaps a /s modifier will help. Check the documentation:
s (PCRE_DOTALL)
If this modifier is set, a dot metacharacter in the pattern matches all characters,
including newlines. Without it, newlines are excluded. This modifier is equivalent to
Perl's /s modifier. A negative class such as [^a] always matches a newline character,
independent of the setting of this modifier.
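As a minimal sketch of that suggestion (the sample input is made up, and `$rss` simply stands in for the variable from the question), the following compares the original pattern with one that adds the `s` modifier. It also switches `.*` to the lazy `.*?`, which is an extra assumption on my part so that the match stops at the first end marker if the markers appear more than once.

```php
<?php
// Hypothetical sample input; $rss stands in for the asker's variable.
$rss = "<p>before</p>\n<!--content-start-->\n<p>desired html</p>\n<!--content-end-->\n<p>after</p>";

// Without /s the dot stops at newlines, so a multi-line block never matches.
$withoutS = preg_match('/<!--content-start-->(.*)<!--content-end-->/i', $rss, $m1);

// With /s the dot also matches newlines; .*? stops at the first end marker.
$withS = preg_match('/<!--content-start-->(.*?)<!--content-end-->/is', $rss, $m2);

// Group 1 holds the text between the markers (empty string if nothing matched).
$content = $withS === 1 ? trim($m2[1]) : '';
echo $content, "\n";
```

Note that the captured text is in index 1 of the matches array; index 0 holds the whole match, including both marker comments.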
answered Nov 23 '18 at 4:08 by drmad
Yes, this is all that's required in the OP's example. Parsing the HTML (as mentioned in the other answer) is overkill IMO for what is really just a simple pattern-matching exercise.
– MrWhite
Dec 1 '18 at 1:19
Something like this should work. The XPath query looks for a comment whose text is exactly "content-start" and returns the sibling nodes that follow it. We loop through them until we reach the closing comment.
$html = <<<HTML
<!--content-start-->
<p>Here is my <i>desired html</i></p>
<!-- a comment -->
<div class="foo">Here is more</div>
<!--content-end-->
<p>Not returning this</p>
HTML;
$return = "";
$dom = new DOMDocument;
// The options argument requires PHP 5.4 or later.
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
$xpath = new DOMXPath($dom);
// Select every node that follows the opening marker comment.
$siblings = $xpath->query("//comment()[.='content-start']/following-sibling::node()");
foreach ($siblings as $node) {
    // Stop at the closing marker comment.
    if ($node instanceof DOMComment && $node->textContent === "content-end") {
        break;
    }
    $return .= $dom->saveHTML($node) . "\n";
}
echo $return;
Output:
<p>Here is my <i>desired html</i></p>
<!-- a comment -->
<div class="foo">Here is more</div>
answered Nov 23 '18 at 4:37 by miken32
edited Nov 23 '18 at 4:50
will this work if the html is from another website?
– Cain Nuke
Nov 23 '18 at 17:17
It's HTML, it doesn't matter where it's from.
– miken32
Nov 23 '18 at 17:18
great, I will try it. Thanks
– Cain Nuke
Nov 23 '18 at 17:20
Sorry, but I got these warnings: DOMDocument::loadHTML() expects exactly 1 parameter, 2 given. Warning: DOMXPath::query() [domxpath.query]: Invalid or inclomplete context. Warning: Invalid argument supplied for foreach()
– Cain Nuke
Nov 23 '18 at 17:30
Are you kidding me? PHP 5.3 has been EOL for 5 years now. You gotta upgrade!
– miken32
Nov 23 '18 at 17:31