Scale vector in scaled pivoting (numerical methods)
In the scaled pivoting version of Gaussian elimination, you exchange rows/columns based not simply on the largest element available, but on the element that is largest relative to the entries in its own row.
You first compute the scale vector $S$ (each entry of $S$ is the largest absolute value in its row), and then choose the pivot in every iteration against this same original $S$. I understand that $S$ isn't recalculated in each iteration because "it's not worth it" - that is, the proportions generally stay roughly the same throughout the elimination.
From Numerical Mathematics and Computing:
Notice that the scale factors are not changed after each pivot step. Intuitively, one might think that after each step in the Gaussian algorithm, the remaining (modified) coefficients should be used to recompute the scale factors instead of using the original scale vector. Of course, this could be done, but it is generally believed that the extra computations involved in this procedure are not worthwhile in the majority of linear systems.
I'm looking for a more formal explanation of this: how can I actually see that the computation required to recalculate the remaining entries of $S$ in each iteration is large compared to the accuracy that would be gained?
(Perhaps some upper bound on the accuracy, set against the cost of the recalculations of $S$?)
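For reference, here is a minimal sketch of the algorithm as I understand it (Python/NumPy, my own names, not from the book); note that $S$ is computed once before the loop and afterwards only permuted together with the rows:

```python
import numpy as np

def solve_scaled_pivoting(A, b):
    """Gaussian elimination with scaled partial pivoting; the scale
    vector S is built once and never recomputed during elimination."""
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    n = len(b)
    S = np.max(np.abs(A), axis=1)             # scale vector: row-wise maxima of |A|
    for k in range(n - 1):
        # pivot row = largest entry of column k *relative to its row's scale*
        p = k + np.argmax(np.abs(A[k:, k]) / S[k:])
        if p != k:                             # swap rows of A, b and the scale vector
            A[[k, p]], b[[k, p]], S[[k, p]] = A[[p, k]], b[[p, k]], S[[p, k]]
        for i in range(k + 1, n):              # eliminate below the pivot
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    x = np.zeros(n)                            # back substitution
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x
```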
numerical-methods gaussian-elimination
asked May 27 '15 at 15:11 – Cauthon
2 Answers
If Numerical Analysis says that something is not worth it and that it is generally believed that the extra computations involved in this procedure are not worthwhile in the majority of linear systems, then all of your alarm bells should go off. How much is a "belief" worth in mathematics in the first place? And how would such a "majority" of linear systems be defined? So I wouldn't be surprised if the real reason for scaled pivoting is ease of calculation, and that's it.
That being said, instead of heading for pivoting strategies, what I'm going to advertise here is avoiding any form of pivoting altogether.
The method employed for that purpose is Least Squares. Instead of solving the original system $A x = b$, we first square the equations and then solve the system $S x = A^T A x = A^T b$ instead. This method has a number of advantages and disadvantages. A disadvantage is that the equations must first be squared, at a cost in computation time; but there are dumb and there are smart ways to do the squaring. Another disadvantage is that the conditioning of the equations becomes worse, namely proportional to the square of the original condition number. The latter phenomenon can be mitigated by first optimizing the condition of the original equations, which is done by norming them row by row, meaning that the coefficients $a_{ki}$ of each row $k$ are divided by the square root of the sum of the squared coefficients of that row:
$$
a_{ki} := \frac{a_{ki}}{\sqrt{\sum_i a_{ki}^2}}
$$
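As an illustration, here is a minimal sketch of these two preparatory steps, norming and the straightforward ("dumb") way of squaring, in Python/NumPy; the function names are my own:

```python
import numpy as np

def norm_rows(A, b):
    """Divide each equation (row of A and entry of b) by the 2-norm of its row."""
    r = np.sqrt((A * A).sum(axis=1))    # sqrt(sum_i a_ki^2) for every row k
    return A / r[:, None], b / r

def square_system(A, b):
    """Form the normal equations S x = A^T b with S = A^T A."""
    return A.T @ A, A.T @ b
```

A smarter squaring would exploit the symmetry of $S$ and compute only its upper triangle, but that is beside the point here.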
Let's talk about the advantages now. It is easy to see that the equations have become symmetric, which definitely is an advantage. (Note that if the symmetry of the original system is destroyed by norming, it is restored again by squaring.) No pivoting is needed, because:
$$
S_{ij} = S_{ji} = \sum_k a_{ki} a_{kj} \quad \Longrightarrow \quad S_{ii} = \sum_k a_{ki}^2
$$
And this sum is only zero if all coefficients in a column are zero, which is certainly not the case with a
non-singular matrix. Thus all pivots are on the main diagonal. Furthermore:
$$
\vec{a}_i = \begin{bmatrix} a_{1i} \\ a_{2i} \\ \vdots \\ a_{Ni} \end{bmatrix}
\quad , \quad
\vec{a}_j = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{Nj} \end{bmatrix}
\qquad \Longrightarrow \qquad \left\{ \begin{matrix}
\left(\vec{a}_i\cdot\vec{a}_j\right) = \sum_k a_{ki} a_{kj} = S_{ij} \\
\left(\vec{a}_i\cdot\vec{a}_i\right) = \sum_k a_{ki} a_{ki} = S_{ii} \\
\left(\vec{a}_j\cdot\vec{a}_j\right) = \sum_k a_{kj} a_{kj} = S_{jj} \end{matrix} \right.
$$
Schwarz inequality:
$$
\left(\vec{a}_i\cdot\vec{a}_j\right)^2 \le
\left(\vec{a}_i\cdot\vec{a}_i\right)\left(\vec{a}_j\cdot\vec{a}_j\right)
\qquad \Longleftrightarrow \qquad
S_{ij}^2 \le S_{ii} S_{jj}
$$
With the AM-GM inequality we find from here:
$$
\frac{S_{ii}+S_{jj}}{2} \ge \sqrt{S_{ii}S_{jj}} \ge \left| S_{ij} \right|
$$
In other words: the (arithmetic or geometric) mean of two main-diagonal elements is positive and always at least as large as the absolute value of the corresponding off-diagonal element. This is consistent with the fact that all pivots are on the main diagonal.
Further properties: for an orthogonal matrix $A$ the squared equations become the identity, $S = A^T A = I$; and after norming, the sum of the main-diagonal elements always equals the order of the system: $\sum_k S_{kk} = N$.
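These properties are easy to verify numerically. A small sanity check of my own (Python/NumPy), assuming the rows of $A$ have been normed as above:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
A = rng.standard_normal((N, N))
A /= np.linalg.norm(A, axis=1, keepdims=True)   # norm the rows first

S = A.T @ A
assert np.allclose(S, S.T)                      # the squared system is symmetric
assert np.isclose(np.trace(S), N)               # diagonal entries sum to the order N
i, j = 0, 1
assert abs(S[i, j]) <= np.sqrt(S[i, i] * S[j, j]) <= (S[i, i] + S[j, j]) / 2
```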
Simple $2 \times 2$ example:
$$
\left\{ \begin{matrix} x_2 = 1 \\ 2 x_1 - x_2 = 1 \end{matrix} \right.
\qquad \Longleftrightarrow \qquad
\begin{bmatrix} 0 & 1 \\ 2 & -1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} =
\begin{bmatrix} 1 \\ 1 \end{bmatrix}
$$
Standard Gaussian elimination will fail because the entry in the first row and first column - the first pivot - is zero. So let the equations be normed in the first place:
$$
\begin{bmatrix} 0 & 1 \\ 2 & -1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} =
\begin{bmatrix} 1 \\ 1 \end{bmatrix}
\qquad \Longleftrightarrow \qquad
\begin{bmatrix} 0 & 1 \\ 2/\sqrt{5} & -1/\sqrt{5} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} =
\begin{bmatrix} 1 \\ 1/\sqrt{5} \end{bmatrix}
$$
Least squares equivalent (calculating the square roots can always be avoided):
$$
\begin{bmatrix} 0 & 2/\sqrt{5} \\ 1 & -1/\sqrt{5} \end{bmatrix}
\begin{bmatrix} 0 & 1 \\ 2/\sqrt{5} & -1/\sqrt{5} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} =
\begin{bmatrix} 0 & 2/\sqrt{5} \\ 1 & -1/\sqrt{5} \end{bmatrix}
\begin{bmatrix} 1 \\ 1/\sqrt{5} \end{bmatrix}
\qquad \Longleftrightarrow \\
\begin{bmatrix} 4/5 & -2/5 \\ -2/5 & 6/5 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} =
\begin{bmatrix} 2/5 \\ 4/5 \end{bmatrix}
$$
Gaussian elimination with the first row as a pivot:
$$
\begin{bmatrix} 4/5 & -2/5 \\ 0 & 5/5 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} =
\begin{bmatrix} 2/5 \\ 5/5 \end{bmatrix}
$$
Followed by backsubstitution, as usual.
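For completeness, here is a short numerical reproduction of this $2 \times 2$ example (Python/NumPy, my own code; the square roots are kept for readability even though they can be avoided):

```python
import numpy as np

A = np.array([[0.0, 1.0], [2.0, -1.0]])
b = np.array([1.0, 1.0])

r = np.linalg.norm(A, axis=1, keepdims=True)    # row norms: 1 and sqrt(5)
An, bn = A / r, b / r.ravel()                   # normed equations

S = An.T @ An                                   # [[4/5, -2/5], [-2/5, 6/5]]
rhs = An.T @ bn                                 # [ 2/5,  4/5]

m = S[1, 0] / S[0, 0]                           # eliminate below the first pivot S[0, 0]
S[1] -= m * S[0]
rhs[1] -= m * rhs[0]                            # -> [[4/5, -2/5], [0, 1]], rhs = [2/5, 1]

x2 = rhs[1] / S[1, 1]                           # back substitution
x1 = (rhs[0] - S[0, 1] * x2) / S[0, 0]
print(x1, x2)                                   # 1.0 1.0
```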
EDIT. More information about Condition Numbers is available at the personal website (MSE link).
answered Jun 4 '15 at 12:59 (edited Jun 10 '15 at 13:29) – Han de Bruijn
Thanks for your suggestion of an alternative method. I have learned a little about Least Squares, but this is a great explanation. However, I am still interested in my original question. I am assuming that there is some kind of analysis that demonstrates why $S$ isn't recalculated, aside from ease of calculation, but I still cannot find any such thing. – Cauthon, Jun 5 '15 at 11:28
@Cauthon: Nobody has been able to claim the bounty, right? Alas, sometimes I only have to wait in order to be sure that my judgment has not been wrong. – Han de Bruijn, Jun 10 '15 at 13:36
Nope, no answer out there :( – Cauthon, Jun 10 '15 at 19:51
The cited passage does not say that you should skip applying the same (row) permutation to the vector of scale factors; you do need to permute it along with the rows.
However, recall that eliminating with a pivot changes the rest of the matrix (which rows and columns are affected depends on the variant of the algorithm). So while you may have collected the per-row maxima at the start, they are no longer the true maxima after such a step. To be strictly correct, you would therefore have to recompute the scale vector.
The recomputed scale vector would need to hold the maxima over all elements of all remaining rows (not only over the unreduced portions of those rows). That amounts to $O(n^2)$ work in the first step, $O(n \cdot (n-1))$ in the second, down to $O(n)$ in the second-to-last step. This sums to roughly $O(n^3/2)$, which is comparable to the $O(n^3)$ cost of the whole elimination, so it is deemed too costly to be worthwhile. Numerical experiments also mostly show that the resulting reduction in error is not worth it.
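A quick way to see the $O(n^3/2)$ figure is simply to count the entries that would have to be re-scanned (a back-of-the-envelope check of my own, in Python, following the counting convention of the paragraph above):

```python
n = 1000

# step k re-scans all n entries of each of the n - k + 1 remaining rows
rescan = sum(n * (n - k + 1) for k in range(1, n))     # O(n^2) + O(n(n-1)) + ... + O(2n)
elimination_flops = 2 * n**3 / 3                       # classical flop count for Gaussian elimination

print(rescan)                       # ~ n**3 / 2, i.e. about 5.0e8 for n = 1000
print(rescan / elimination_flops)   # ~ 0.75: the same order of magnitude as the elimination
```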
The upper bound on accuracy depends a lot on the matrix at hand. In any case, for some problems you want to turn to singular value decomposition or some other more robust method. Some problems simply cannot be solved with Gaussian elimination, no matter how hard you pivot them.
answered Jan 14 '17 at 1:12 (edited Jan 14 '17 at 13:56) – the swine