Quadratic n term equation using multiindex
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}
I have two DFs which I would like to use to calculate the following:
w(ti,ti)*a(ti)^2 + w(tj,tj)*b(sj,tj)^2 + 2*w(si,tj)*a(ti)*b(tj)
The above uses two terms (a,b).
w is the weight df where i and j are index and column spaces pertaining to the Tn index of a and b.
Set Up - Edit dynamic W
import pandas as pd
import numpy as np
I = ['i'+ str(i) for i in range(4)]
Q = ['q' + str(i) for i in range(5)]
T = ['t' + str(i) for i in range(3)]
n = 100
df1 = pd.DataFrame({'I': [I[np.random.randint(len(I))] for i in range(n)],
'Q': [Q[np.random.randint(len(Q))] for i in range(n)],
'Tn': [T[np.random.randint(len(T))] for i in range(n)],
'V': np.random.rand(n)}).groupby(['I','Q','Tn']).sum()
df1.head(5)
I Q Tn V
i0 q0 t0 1.626799
t2 1.725374
q1 t0 2.155340
t1 0.479741
t2 1.039178
w = np.random.randn(len(T),len(T))
w = (w*w.T)/2
np.fill_diagonal(w,1)
W = pd.DataFrame(w, columns = T, index = T)
W
t0 t1 t2
t0 1.000000 0.029174 -0.045754
t1 0.029174 1.000000 0.233330
t2 -0.045754 0.233330 1.000000
Effectively I would like to use the index Tn in df1 to use the above equation for every I and Q.
The end result for df1.loc['i0','q0']
in the example above should be:
W(t0,t0) * V(t0)^2
+ W(t2,t2) * V(t2)^2
+ 2 * W(t0,t2) * V(t0) * V(t2)
=
1.0 * 1.626799**2
+ 1.0 * 1.725374**2
+ (-0.045754) * 1.626799 * 1.725374
The end result for df1.loc['i0','q1']
in the example above should be:
W(t0,t0) * V(t0)^2
+ W(t1,t1) * V(t1)^2
+ W(t2,t2) * V(t2)^2
+ 2 * W(t0,t1) * V(t0) * V(t1)
+ 2 * W(t0,t2) * V(t0) * V(t2)
+ 2 * W(t2,t1) * V(t1) * V(t2)
=
1.0 * 2.155340**2
+ 1.0 * 0.479741**2
+ 1.0 * 1.039178**2
+ 0.029174 * 2.155340 * 0.479741 * 1
+ (-0.045754) * 2.155340 * 1.039178 * 1
+ 0.233330 * 0.479741 * 1.039178 * 1
This pattern will repeat depending on the number of tn terms in each Q hence it should be robust enough to handle as many Tn terms as needed (in the example I use 3, but it could be as much as 100 or more).
Each result should then be saved in a new DF with Index = [I, Q]
The solution should also not be slower than excel when n
increases in value.
Thanks in advance
python numpy dataframe multi-index quadratic
add a comment |
I have two DFs which I would like to use to calculate the following:
w(ti,ti)*a(ti)^2 + w(tj,tj)*b(sj,tj)^2 + 2*w(si,tj)*a(ti)*b(tj)
The above uses two terms (a,b).
w is the weight df where i and j are index and column spaces pertaining to the Tn index of a and b.
Set Up - Edit dynamic W
import pandas as pd
import numpy as np
I = ['i'+ str(i) for i in range(4)]
Q = ['q' + str(i) for i in range(5)]
T = ['t' + str(i) for i in range(3)]
n = 100
df1 = pd.DataFrame({'I': [I[np.random.randint(len(I))] for i in range(n)],
'Q': [Q[np.random.randint(len(Q))] for i in range(n)],
'Tn': [T[np.random.randint(len(T))] for i in range(n)],
'V': np.random.rand(n)}).groupby(['I','Q','Tn']).sum()
df1.head(5)
I Q Tn V
i0 q0 t0 1.626799
t2 1.725374
q1 t0 2.155340
t1 0.479741
t2 1.039178
w = np.random.randn(len(T),len(T))
w = (w*w.T)/2
np.fill_diagonal(w,1)
W = pd.DataFrame(w, columns = T, index = T)
W
t0 t1 t2
t0 1.000000 0.029174 -0.045754
t1 0.029174 1.000000 0.233330
t2 -0.045754 0.233330 1.000000
Effectively I would like to use the index Tn in df1 to use the above equation for every I and Q.
The end result for df1.loc['i0','q0']
in the example above should be:
W(t0,t0) * V(t0)^2
+ W(t2,t2) * V(t2)^2
+ 2 * W(t0,t2) * V(t0) * V(t2)
=
1.0 * 1.626799**2
+ 1.0 * 1.725374**2
+ (-0.045754) * 1.626799 * 1.725374
The end result for df1.loc['i0','q1']
in the example above should be:
W(t0,t0) * V(t0)^2
+ W(t1,t1) * V(t1)^2
+ W(t2,t2) * V(t2)^2
+ 2 * W(t0,t1) * V(t0) * V(t1)
+ 2 * W(t0,t2) * V(t0) * V(t2)
+ 2 * W(t2,t1) * V(t1) * V(t2)
=
1.0 * 2.155340**2
+ 1.0 * 0.479741**2
+ 1.0 * 1.039178**2
+ 0.029174 * 2.155340 * 0.479741 * 1
+ (-0.045754) * 2.155340 * 1.039178 * 1
+ 0.233330 * 0.479741 * 1.039178 * 1
This pattern will repeat depending on the number of tn terms in each Q hence it should be robust enough to handle as many Tn terms as needed (in the example I use 3, but it could be as much as 100 or more).
Each result should then be saved in a new DF with Index = [I, Q]
The solution should also not be slower than excel when n
increases in value.
Thanks in advance
python numpy dataframe multi-index quadratic
2
Your equation implies the value 'w' is the same for all three terms but they are not. Maybe you should rename them and describe how they relate to or are derived from the df1 indices . Make it easier for your readers.
– wwii
Nov 22 '18 at 21:52
2
df1.loc['i0','q0'
has threeTn
's. How does it work?
– wwii
Nov 22 '18 at 22:03
1
IsW
not supposed to be symmetric? if not, how I know which factor to used betweenW.loc['t3','t4']
andW.loc['t4','t3']
for the example you give, because you use the first one but why?
– Ben.T
Nov 22 '18 at 22:09
1
I have changed the question to correspond with the comments
– RealRageDontQuit
Nov 22 '18 at 22:42
add a comment |
I have two DFs which I would like to use to calculate the following:
w(ti,ti)*a(ti)^2 + w(tj,tj)*b(sj,tj)^2 + 2*w(si,tj)*a(ti)*b(tj)
The above uses two terms (a,b).
w is the weight df where i and j are index and column spaces pertaining to the Tn index of a and b.
Set Up - Edit dynamic W
import pandas as pd
import numpy as np
I = ['i'+ str(i) for i in range(4)]
Q = ['q' + str(i) for i in range(5)]
T = ['t' + str(i) for i in range(3)]
n = 100
df1 = pd.DataFrame({'I': [I[np.random.randint(len(I))] for i in range(n)],
'Q': [Q[np.random.randint(len(Q))] for i in range(n)],
'Tn': [T[np.random.randint(len(T))] for i in range(n)],
'V': np.random.rand(n)}).groupby(['I','Q','Tn']).sum()
df1.head(5)
I Q Tn V
i0 q0 t0 1.626799
t2 1.725374
q1 t0 2.155340
t1 0.479741
t2 1.039178
w = np.random.randn(len(T),len(T))
w = (w*w.T)/2
np.fill_diagonal(w,1)
W = pd.DataFrame(w, columns = T, index = T)
W
t0 t1 t2
t0 1.000000 0.029174 -0.045754
t1 0.029174 1.000000 0.233330
t2 -0.045754 0.233330 1.000000
Effectively I would like to use the index Tn in df1 to use the above equation for every I and Q.
The end result for df1.loc['i0','q0']
in the example above should be:
W(t0,t0) * V(t0)^2
+ W(t2,t2) * V(t2)^2
+ 2 * W(t0,t2) * V(t0) * V(t2)
=
1.0 * 1.626799**2
+ 1.0 * 1.725374**2
+ (-0.045754) * 1.626799 * 1.725374
The end result for df1.loc['i0','q1']
in the example above should be:
W(t0,t0) * V(t0)^2
+ W(t1,t1) * V(t1)^2
+ W(t2,t2) * V(t2)^2
+ 2 * W(t0,t1) * V(t0) * V(t1)
+ 2 * W(t0,t2) * V(t0) * V(t2)
+ 2 * W(t2,t1) * V(t1) * V(t2)
=
1.0 * 2.155340**2
+ 1.0 * 0.479741**2
+ 1.0 * 1.039178**2
+ 0.029174 * 2.155340 * 0.479741 * 1
+ (-0.045754) * 2.155340 * 1.039178 * 1
+ 0.233330 * 0.479741 * 1.039178 * 1
This pattern will repeat depending on the number of tn terms in each Q hence it should be robust enough to handle as many Tn terms as needed (in the example I use 3, but it could be as much as 100 or more).
Each result should then be saved in a new DF with Index = [I, Q]
The solution should also not be slower than excel when n
increases in value.
Thanks in advance
python numpy dataframe multi-index quadratic
I have two DFs which I would like to use to calculate the following:
w(ti,ti)*a(ti)^2 + w(tj,tj)*b(sj,tj)^2 + 2*w(si,tj)*a(ti)*b(tj)
The above uses two terms (a,b).
w is the weight df where i and j are index and column spaces pertaining to the Tn index of a and b.
Set Up - Edit dynamic W
import pandas as pd
import numpy as np
I = ['i'+ str(i) for i in range(4)]
Q = ['q' + str(i) for i in range(5)]
T = ['t' + str(i) for i in range(3)]
n = 100
df1 = pd.DataFrame({'I': [I[np.random.randint(len(I))] for i in range(n)],
'Q': [Q[np.random.randint(len(Q))] for i in range(n)],
'Tn': [T[np.random.randint(len(T))] for i in range(n)],
'V': np.random.rand(n)}).groupby(['I','Q','Tn']).sum()
df1.head(5)
I Q Tn V
i0 q0 t0 1.626799
t2 1.725374
q1 t0 2.155340
t1 0.479741
t2 1.039178
w = np.random.randn(len(T),len(T))
w = (w*w.T)/2
np.fill_diagonal(w,1)
W = pd.DataFrame(w, columns = T, index = T)
W
t0 t1 t2
t0 1.000000 0.029174 -0.045754
t1 0.029174 1.000000 0.233330
t2 -0.045754 0.233330 1.000000
Effectively I would like to use the index Tn in df1 to use the above equation for every I and Q.
The end result for df1.loc['i0','q0']
in the example above should be:
W(t0,t0) * V(t0)^2
+ W(t2,t2) * V(t2)^2
+ 2 * W(t0,t2) * V(t0) * V(t2)
=
1.0 * 1.626799**2
+ 1.0 * 1.725374**2
+ (-0.045754) * 1.626799 * 1.725374
The end result for df1.loc['i0','q1']
in the example above should be:
W(t0,t0) * V(t0)^2
+ W(t1,t1) * V(t1)^2
+ W(t2,t2) * V(t2)^2
+ 2 * W(t0,t1) * V(t0) * V(t1)
+ 2 * W(t0,t2) * V(t0) * V(t2)
+ 2 * W(t2,t1) * V(t1) * V(t2)
=
1.0 * 2.155340**2
+ 1.0 * 0.479741**2
+ 1.0 * 1.039178**2
+ 0.029174 * 2.155340 * 0.479741 * 1
+ (-0.045754) * 2.155340 * 1.039178 * 1
+ 0.233330 * 0.479741 * 1.039178 * 1
This pattern will repeat depending on the number of tn terms in each Q hence it should be robust enough to handle as many Tn terms as needed (in the example I use 3, but it could be as much as 100 or more).
Each result should then be saved in a new DF with Index = [I, Q]
The solution should also not be slower than excel when n
increases in value.
Thanks in advance
python numpy dataframe multi-index quadratic
python numpy dataframe multi-index quadratic
edited Nov 25 '18 at 8:36
RealRageDontQuit
asked Nov 22 '18 at 21:33
RealRageDontQuitRealRageDontQuit
508
508
2
Your equation implies the value 'w' is the same for all three terms but they are not. Maybe you should rename them and describe how they relate to or are derived from the df1 indices . Make it easier for your readers.
– wwii
Nov 22 '18 at 21:52
2
df1.loc['i0','q0'
has threeTn
's. How does it work?
– wwii
Nov 22 '18 at 22:03
1
IsW
not supposed to be symmetric? if not, how I know which factor to used betweenW.loc['t3','t4']
andW.loc['t4','t3']
for the example you give, because you use the first one but why?
– Ben.T
Nov 22 '18 at 22:09
1
I have changed the question to correspond with the comments
– RealRageDontQuit
Nov 22 '18 at 22:42
add a comment |
2
Your equation implies the value 'w' is the same for all three terms but they are not. Maybe you should rename them and describe how they relate to or are derived from the df1 indices . Make it easier for your readers.
– wwii
Nov 22 '18 at 21:52
2
df1.loc['i0','q0'
has threeTn
's. How does it work?
– wwii
Nov 22 '18 at 22:03
1
IsW
not supposed to be symmetric? if not, how I know which factor to used betweenW.loc['t3','t4']
andW.loc['t4','t3']
for the example you give, because you use the first one but why?
– Ben.T
Nov 22 '18 at 22:09
1
I have changed the question to correspond with the comments
– RealRageDontQuit
Nov 22 '18 at 22:42
2
2
Your equation implies the value 'w' is the same for all three terms but they are not. Maybe you should rename them and describe how they relate to or are derived from the df1 indices . Make it easier for your readers.
– wwii
Nov 22 '18 at 21:52
Your equation implies the value 'w' is the same for all three terms but they are not. Maybe you should rename them and describe how they relate to or are derived from the df1 indices . Make it easier for your readers.
– wwii
Nov 22 '18 at 21:52
2
2
df1.loc['i0','q0'
has three Tn
's. How does it work?– wwii
Nov 22 '18 at 22:03
df1.loc['i0','q0'
has three Tn
's. How does it work?– wwii
Nov 22 '18 at 22:03
1
1
Is
W
not supposed to be symmetric? if not, how I know which factor to used between W.loc['t3','t4']
and W.loc['t4','t3']
for the example you give, because you use the first one but why?– Ben.T
Nov 22 '18 at 22:09
Is
W
not supposed to be symmetric? if not, how I know which factor to used between W.loc['t3','t4']
and W.loc['t4','t3']
for the example you give, because you use the first one but why?– Ben.T
Nov 22 '18 at 22:09
1
1
I have changed the question to correspond with the comments
– RealRageDontQuit
Nov 22 '18 at 22:42
I have changed the question to correspond with the comments
– RealRageDontQuit
Nov 22 '18 at 22:42
add a comment |
1 Answer
1
active
oldest
votes
One way could be first reindex
your dataframe df1
with all the possible combinations of the lists I
, Q
and Tn
with pd.MultiIndex.from_product
, filling the missing value in the column 'V' with 0. The column has then len(I)*len(Q)*len(T)
elements. Then you can reshape
the values
to get each row related to one combination on I
and Q
such as:
ar = (df1.reindex(pd.MultiIndex.from_product([I,Q,T], names=['I','Q','Tn']),fill_value=0)
.values.reshape(-1,len(T)))
To see the relation between my input df1
and ar
, here are some related rows
print (df1.head(6))
V
I Q Tn
i0 q0 t1 1.123666
q1 t0 0.538610
t1 2.943206
q2 t0 0.570990
t1 0.617524
t2 1.413926
print (ar[:3])
[[0. 1.1236656 0. ]
[0.53861027 2.94320574 0. ]
[0.57099049 0.61752408 1.4139263 ]]
Now, to perform the multiplication with the element of W
, one way is to create the outer product of ar
with itself but row-wise to get, for each row a len(T)*len(T)
matrix. For example, for the second row:
[0.53861027 2.94320574 0. ]
becomes
[[0.29010102, 1.58524083, 0. ], #0.29010102 = 0.53861027**2, 1.58524083 = 0.53861027*2.94320574 ...
[1.58524083, 8.66246003, 0. ],
[0. , 0. , 0. ]]
Several methods are possible such as ar[:,:,None]*ar[:,None,:]
or np.einsum
with the right subscript: np.einsum('ij,ik->ijk',ar,ar)
. Both give same result.
The next step can be done with a tensordot
and specify the right axes
. So with ar
and W
as an input, you do:
print (np.tensordot(np.einsum('ij,ik->ijk',ar,ar),W.values,axes=([1,2],[0,1])))
array([ 1.26262437, 15.29352438, 15.94605435, ...
To check for the second value here, 1*0.29010102 + 1*8.66246003 + 2.*2*1.58524083 == 15.29352438
(where 1 is W(t0,t0)
and W(t1,t1)
, 2 is W(t0,t1)
)
Finally, to create the dataframe as expected, use again pd.MultiIndex.from_product
:
new_df = pd.DataFrame({'col1': np.tensordot(np.einsum('ij,ik->ijk',ar,ar),
W.values,axes=([1,2],[0,1]))},
index=pd.MultiIndex.from_product([I,Q], names=['I','Q']))
print (new_df.head(3))
col1
I Q
i0 q0 1.262624
q1 15.293524
q2 15.946054
...
Note: if you are SURE that each element of T
is at least once in the last level of df1
, the ar
can be obtain using unstack
such as ar=df1.unstack(fill_value=0).values
. But I would suggest to use the reindex
method above to prevent any error
This seems to work. However, I found an edge case in my problem which would make this answer not correct. Otherwise you have taught me something new! Thank you
– RealRageDontQuit
Nov 24 '18 at 11:16
@RealRageDontQuit what you call edge case is actually a different problem. You change the dataframe structure by adding an index level, change the formula by multipying with another matrixs
and do a sum over this new index (at least of what I understood). I think my answer can pretty easily be adapted to this problem, but if you want a general method it will be more complicated.
– Ben.T
Nov 24 '18 at 12:14
1
you are correct to say that this question solves the initial problem. I have changed the question to reflect the initial problem, tick your answer as accepted (thanks) and will also create a new question with this edge case.
– RealRageDontQuit
Nov 25 '18 at 8:37
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53438193%2fquadratic-n-term-equation-using-multiindex%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
One way could be first reindex
your dataframe df1
with all the possible combinations of the lists I
, Q
and Tn
with pd.MultiIndex.from_product
, filling the missing value in the column 'V' with 0. The column has then len(I)*len(Q)*len(T)
elements. Then you can reshape
the values
to get each row related to one combination on I
and Q
such as:
ar = (df1.reindex(pd.MultiIndex.from_product([I,Q,T], names=['I','Q','Tn']),fill_value=0)
.values.reshape(-1,len(T)))
To see the relation between my input df1
and ar
, here are some related rows
print (df1.head(6))
V
I Q Tn
i0 q0 t1 1.123666
q1 t0 0.538610
t1 2.943206
q2 t0 0.570990
t1 0.617524
t2 1.413926
print (ar[:3])
[[0. 1.1236656 0. ]
[0.53861027 2.94320574 0. ]
[0.57099049 0.61752408 1.4139263 ]]
Now, to perform the multiplication with the element of W
, one way is to create the outer product of ar
with itself but row-wise to get, for each row a len(T)*len(T)
matrix. For example, for the second row:
[0.53861027 2.94320574 0. ]
becomes
[[0.29010102, 1.58524083, 0. ], #0.29010102 = 0.53861027**2, 1.58524083 = 0.53861027*2.94320574 ...
[1.58524083, 8.66246003, 0. ],
[0. , 0. , 0. ]]
Several methods are possible such as ar[:,:,None]*ar[:,None,:]
or np.einsum
with the right subscript: np.einsum('ij,ik->ijk',ar,ar)
. Both give same result.
The next step can be done with a tensordot
and specify the right axes
. So with ar
and W
as an input, you do:
print (np.tensordot(np.einsum('ij,ik->ijk',ar,ar),W.values,axes=([1,2],[0,1])))
array([ 1.26262437, 15.29352438, 15.94605435, ...
To check for the second value here, 1*0.29010102 + 1*8.66246003 + 2.*2*1.58524083 == 15.29352438
(where 1 is W(t0,t0)
and W(t1,t1)
, 2 is W(t0,t1)
)
Finally, to create the dataframe as expected, use again pd.MultiIndex.from_product
:
new_df = pd.DataFrame({'col1': np.tensordot(np.einsum('ij,ik->ijk',ar,ar),
W.values,axes=([1,2],[0,1]))},
index=pd.MultiIndex.from_product([I,Q], names=['I','Q']))
print (new_df.head(3))
col1
I Q
i0 q0 1.262624
q1 15.293524
q2 15.946054
...
Note: if you are SURE that each element of T
is at least once in the last level of df1
, the ar
can be obtain using unstack
such as ar=df1.unstack(fill_value=0).values
. But I would suggest to use the reindex
method above to prevent any error
This seems to work. However, I found an edge case in my problem which would make this answer not correct. Otherwise you have taught me something new! Thank you
– RealRageDontQuit
Nov 24 '18 at 11:16
@RealRageDontQuit what you call edge case is actually a different problem. You change the dataframe structure by adding an index level, change the formula by multipying with another matrixs
and do a sum over this new index (at least of what I understood). I think my answer can pretty easily be adapted to this problem, but if you want a general method it will be more complicated.
– Ben.T
Nov 24 '18 at 12:14
1
you are correct to say that this question solves the initial problem. I have changed the question to reflect the initial problem, tick your answer as accepted (thanks) and will also create a new question with this edge case.
– RealRageDontQuit
Nov 25 '18 at 8:37
add a comment |
One way could be first reindex
your dataframe df1
with all the possible combinations of the lists I
, Q
and Tn
with pd.MultiIndex.from_product
, filling the missing value in the column 'V' with 0. The column has then len(I)*len(Q)*len(T)
elements. Then you can reshape
the values
to get each row related to one combination on I
and Q
such as:
ar = (df1.reindex(pd.MultiIndex.from_product([I,Q,T], names=['I','Q','Tn']),fill_value=0)
.values.reshape(-1,len(T)))
To see the relation between my input df1
and ar
, here are some related rows
print (df1.head(6))
V
I Q Tn
i0 q0 t1 1.123666
q1 t0 0.538610
t1 2.943206
q2 t0 0.570990
t1 0.617524
t2 1.413926
print (ar[:3])
[[0. 1.1236656 0. ]
[0.53861027 2.94320574 0. ]
[0.57099049 0.61752408 1.4139263 ]]
Now, to perform the multiplication with the element of W
, one way is to create the outer product of ar
with itself but row-wise to get, for each row a len(T)*len(T)
matrix. For example, for the second row:
[0.53861027 2.94320574 0. ]
becomes
[[0.29010102, 1.58524083, 0. ], #0.29010102 = 0.53861027**2, 1.58524083 = 0.53861027*2.94320574 ...
[1.58524083, 8.66246003, 0. ],
[0. , 0. , 0. ]]
Several methods are possible such as ar[:,:,None]*ar[:,None,:]
or np.einsum
with the right subscript: np.einsum('ij,ik->ijk',ar,ar)
. Both give same result.
The next step can be done with a tensordot
and specify the right axes
. So with ar
and W
as an input, you do:
print (np.tensordot(np.einsum('ij,ik->ijk',ar,ar),W.values,axes=([1,2],[0,1])))
array([ 1.26262437, 15.29352438, 15.94605435, ...
To check for the second value here, 1*0.29010102 + 1*8.66246003 + 2.*2*1.58524083 == 15.29352438
(where 1 is W(t0,t0)
and W(t1,t1)
, 2 is W(t0,t1)
)
Finally, to create the dataframe as expected, use again pd.MultiIndex.from_product
:
new_df = pd.DataFrame({'col1': np.tensordot(np.einsum('ij,ik->ijk',ar,ar),
W.values,axes=([1,2],[0,1]))},
index=pd.MultiIndex.from_product([I,Q], names=['I','Q']))
print (new_df.head(3))
col1
I Q
i0 q0 1.262624
q1 15.293524
q2 15.946054
...
Note: if you are SURE that each element of T
is at least once in the last level of df1
, the ar
can be obtain using unstack
such as ar=df1.unstack(fill_value=0).values
. But I would suggest to use the reindex
method above to prevent any error
This seems to work. However, I found an edge case in my problem which would make this answer not correct. Otherwise you have taught me something new! Thank you
– RealRageDontQuit
Nov 24 '18 at 11:16
@RealRageDontQuit what you call edge case is actually a different problem. You change the dataframe structure by adding an index level, change the formula by multipying with another matrixs
and do a sum over this new index (at least of what I understood). I think my answer can pretty easily be adapted to this problem, but if you want a general method it will be more complicated.
– Ben.T
Nov 24 '18 at 12:14
1
you are correct to say that this question solves the initial problem. I have changed the question to reflect the initial problem, tick your answer as accepted (thanks) and will also create a new question with this edge case.
– RealRageDontQuit
Nov 25 '18 at 8:37
add a comment |
One way could be first reindex
your dataframe df1
with all the possible combinations of the lists I
, Q
and Tn
with pd.MultiIndex.from_product
, filling the missing value in the column 'V' with 0. The column has then len(I)*len(Q)*len(T)
elements. Then you can reshape
the values
to get each row related to one combination on I
and Q
such as:
ar = (df1.reindex(pd.MultiIndex.from_product([I,Q,T], names=['I','Q','Tn']),fill_value=0)
.values.reshape(-1,len(T)))
To see the relation between my input df1
and ar
, here are some related rows
print (df1.head(6))
V
I Q Tn
i0 q0 t1 1.123666
q1 t0 0.538610
t1 2.943206
q2 t0 0.570990
t1 0.617524
t2 1.413926
print (ar[:3])
[[0. 1.1236656 0. ]
[0.53861027 2.94320574 0. ]
[0.57099049 0.61752408 1.4139263 ]]
Now, to perform the multiplication with the element of W
, one way is to create the outer product of ar
with itself but row-wise to get, for each row a len(T)*len(T)
matrix. For example, for the second row:
[0.53861027 2.94320574 0. ]
becomes
[[0.29010102, 1.58524083, 0. ], #0.29010102 = 0.53861027**2, 1.58524083 = 0.53861027*2.94320574 ...
[1.58524083, 8.66246003, 0. ],
[0. , 0. , 0. ]]
Several methods are possible such as ar[:,:,None]*ar[:,None,:]
or np.einsum
with the right subscript: np.einsum('ij,ik->ijk',ar,ar)
. Both give same result.
The next step can be done with a tensordot
and specify the right axes
. So with ar
and W
as an input, you do:
print (np.tensordot(np.einsum('ij,ik->ijk',ar,ar),W.values,axes=([1,2],[0,1])))
array([ 1.26262437, 15.29352438, 15.94605435, ...
To check for the second value here, 1*0.29010102 + 1*8.66246003 + 2.*2*1.58524083 == 15.29352438
(where 1 is W(t0,t0)
and W(t1,t1)
, 2 is W(t0,t1)
)
Finally, to create the dataframe as expected, use again pd.MultiIndex.from_product
:
new_df = pd.DataFrame({'col1': np.tensordot(np.einsum('ij,ik->ijk',ar,ar),
W.values,axes=([1,2],[0,1]))},
index=pd.MultiIndex.from_product([I,Q], names=['I','Q']))
print (new_df.head(3))
col1
I Q
i0 q0 1.262624
q1 15.293524
q2 15.946054
...
Note: if you are SURE that each element of T
is at least once in the last level of df1
, the ar
can be obtain using unstack
such as ar=df1.unstack(fill_value=0).values
. But I would suggest to use the reindex
method above to prevent any error
One way could be first reindex
your dataframe df1
with all the possible combinations of the lists I
, Q
and Tn
with pd.MultiIndex.from_product
, filling the missing value in the column 'V' with 0. The column has then len(I)*len(Q)*len(T)
elements. Then you can reshape
the values
to get each row related to one combination on I
and Q
such as:
ar = (df1.reindex(pd.MultiIndex.from_product([I,Q,T], names=['I','Q','Tn']),fill_value=0)
.values.reshape(-1,len(T)))
To see the relation between my input df1
and ar
, here are some related rows
print (df1.head(6))
V
I Q Tn
i0 q0 t1 1.123666
q1 t0 0.538610
t1 2.943206
q2 t0 0.570990
t1 0.617524
t2 1.413926
print (ar[:3])
[[0. 1.1236656 0. ]
[0.53861027 2.94320574 0. ]
[0.57099049 0.61752408 1.4139263 ]]
Now, to perform the multiplication with the element of W
, one way is to create the outer product of ar
with itself but row-wise to get, for each row a len(T)*len(T)
matrix. For example, for the second row:
[0.53861027 2.94320574 0. ]
becomes
[[0.29010102, 1.58524083, 0. ], #0.29010102 = 0.53861027**2, 1.58524083 = 0.53861027*2.94320574 ...
[1.58524083, 8.66246003, 0. ],
[0. , 0. , 0. ]]
Several methods are possible such as ar[:,:,None]*ar[:,None,:]
or np.einsum
with the right subscript: np.einsum('ij,ik->ijk',ar,ar)
. Both give same result.
The next step can be done with a tensordot
and specify the right axes
. So with ar
and W
as an input, you do:
print (np.tensordot(np.einsum('ij,ik->ijk',ar,ar),W.values,axes=([1,2],[0,1])))
array([ 1.26262437, 15.29352438, 15.94605435, ...
To check for the second value here, 1*0.29010102 + 1*8.66246003 + 2.*2*1.58524083 == 15.29352438
(where 1 is W(t0,t0)
and W(t1,t1)
, 2 is W(t0,t1)
)
Finally, to create the dataframe as expected, use again pd.MultiIndex.from_product
:
new_df = pd.DataFrame({'col1': np.tensordot(np.einsum('ij,ik->ijk',ar,ar),
W.values,axes=([1,2],[0,1]))},
index=pd.MultiIndex.from_product([I,Q], names=['I','Q']))
print (new_df.head(3))
col1
I Q
i0 q0 1.262624
q1 15.293524
q2 15.946054
...
Note: if you are SURE that each element of T
is at least once in the last level of df1
, the ar
can be obtain using unstack
such as ar=df1.unstack(fill_value=0).values
. But I would suggest to use the reindex
method above to prevent any error
answered Nov 23 '18 at 19:40
Ben.TBen.T
6,6903928
6,6903928
This seems to work. However, I found an edge case in my problem which would make this answer not correct. Otherwise you have taught me something new! Thank you
– RealRageDontQuit
Nov 24 '18 at 11:16
@RealRageDontQuit what you call edge case is actually a different problem. You change the dataframe structure by adding an index level, change the formula by multipying with another matrixs
and do a sum over this new index (at least of what I understood). I think my answer can pretty easily be adapted to this problem, but if you want a general method it will be more complicated.
– Ben.T
Nov 24 '18 at 12:14
1
you are correct to say that this question solves the initial problem. I have changed the question to reflect the initial problem, tick your answer as accepted (thanks) and will also create a new question with this edge case.
– RealRageDontQuit
Nov 25 '18 at 8:37
add a comment |
This seems to work. However, I found an edge case in my problem which would make this answer not correct. Otherwise you have taught me something new! Thank you
– RealRageDontQuit
Nov 24 '18 at 11:16
@RealRageDontQuit what you call edge case is actually a different problem. You change the dataframe structure by adding an index level, change the formula by multipying with another matrixs
and do a sum over this new index (at least of what I understood). I think my answer can pretty easily be adapted to this problem, but if you want a general method it will be more complicated.
– Ben.T
Nov 24 '18 at 12:14
1
you are correct to say that this question solves the initial problem. I have changed the question to reflect the initial problem, tick your answer as accepted (thanks) and will also create a new question with this edge case.
– RealRageDontQuit
Nov 25 '18 at 8:37
This seems to work. However, I found an edge case in my problem which would make this answer not correct. Otherwise you have taught me something new! Thank you
– RealRageDontQuit
Nov 24 '18 at 11:16
This seems to work. However, I found an edge case in my problem which would make this answer not correct. Otherwise you have taught me something new! Thank you
– RealRageDontQuit
Nov 24 '18 at 11:16
@RealRageDontQuit what you call edge case is actually a different problem. You change the dataframe structure by adding an index level, change the formula by multipying with another matrix
s
and do a sum over this new index (at least of what I understood). I think my answer can pretty easily be adapted to this problem, but if you want a general method it will be more complicated.– Ben.T
Nov 24 '18 at 12:14
@RealRageDontQuit what you call edge case is actually a different problem. You change the dataframe structure by adding an index level, change the formula by multipying with another matrix
s
and do a sum over this new index (at least of what I understood). I think my answer can pretty easily be adapted to this problem, but if you want a general method it will be more complicated.– Ben.T
Nov 24 '18 at 12:14
1
1
you are correct to say that this question solves the initial problem. I have changed the question to reflect the initial problem, tick your answer as accepted (thanks) and will also create a new question with this edge case.
– RealRageDontQuit
Nov 25 '18 at 8:37
you are correct to say that this question solves the initial problem. I have changed the question to reflect the initial problem, tick your answer as accepted (thanks) and will also create a new question with this edge case.
– RealRageDontQuit
Nov 25 '18 at 8:37
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53438193%2fquadratic-n-term-equation-using-multiindex%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
2
Your equation implies the value 'w' is the same for all three terms but they are not. Maybe you should rename them and describe how they relate to or are derived from the df1 indices . Make it easier for your readers.
– wwii
Nov 22 '18 at 21:52
2
df1.loc['i0','q0'
has threeTn
's. How does it work?– wwii
Nov 22 '18 at 22:03
1
Is
W
not supposed to be symmetric? if not, how I know which factor to used betweenW.loc['t3','t4']
andW.loc['t4','t3']
for the example you give, because you use the first one but why?– Ben.T
Nov 22 '18 at 22:09
1
I have changed the question to correspond with the comments
– RealRageDontQuit
Nov 22 '18 at 22:42