Run quantized tensorflow model on FPGA / pure python

I have a simple Keras model trained on the MNIST dataset.

I am trying to rewrite this model so that it runs on an FPGA device.
To do that, I want to fully understand how a quantized model works.

First I converted the model to .tflite format with UINT8 precision using post-training quantization (https://www.tensorflow.org/lite/performance/post_training_quantization).

The quantized model reaches an accuracy of about 90%.

Now I am trying to extract the weights from the quantized model and implement inference in pure Python. I use this tool to visualize the model and read its weights: https://github.com/lutzroeder/netron.

A simple Python implementation (matrix multiplication, bias add and ReLU) works with the unquantized weights, but the same code with the quantized weights does not.

So my question is: how do I write the feed-forward pass for the quantized model using numpy?

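My model in Keras looks like this:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.optimizers import Adam

# Three dense layers: 784 -> 512 -> 100 -> num_classes
model = Sequential()
model.add(Dense(512, input_shape=input_shape))
model.add(Activation(tf.nn.relu))
model.add(Dense(100))
model.add(Activation(tf.nn.relu))
model.add(Dense(num_classes))
model.add(Activation(tf.nn.softmax))
model.compile(
    optimizer=Adam(),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)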

I converted it with TocoConverter, and the converted model works in TensorFlow.

Then I tried to write the feed-forward pass in pure Python:

for img, label in zip(x_test, y_test):
    img = img.astype('uint8')
    total_seen += 1
    label = tf.keras.utils.to_categorical(label, num_classes=num_classes)
    X = img.reshape(1, 784)
    z1 = np.dot(X, W0.T) + b0
    a1 = relu(z1)
    z2 = np.dot(a1, W1.T) + b1
    a2 = relu(z2)
    z3 = np.dot(a2, W2.T) + b2
    prediction = np.argmax(z3)
    label = np.argmax(label)
    if prediction == label:
        num_correct += 1

But the accuracy of this implementation is only about 10% (chance level for 10 classes), so something is going wrong.
How can I correct it?

Thanks in advance.
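One thing worth checking in the loop above, sketched here under the assumption that `X` and the weight matrices are `uint8` arrays taken straight from the .tflite file: NumPy keeps the `uint8` dtype in `np.dot`, so products silently wrap modulo 256, and the zero points are never subtracted. The helper name `int_matmul` and the sample values are made up for illustration and are not part of the original code.

```python
import numpy as np

def int_matmul(x_q, w_q, x_zero, w_zero):
    """Zero-point-corrected matrix product accumulated in int32."""
    x = x_q.astype(np.int32) - x_zero   # widen first, then remove the offset
    w = w_q.astype(np.int32) - w_zero
    return x @ w.T

x_q = np.array([[200, 200]], dtype=np.uint8)   # hypothetical quantized input
w_q = np.array([[2, 2]], dtype=np.uint8)       # hypothetical quantized weights

wrapped = np.dot(x_q, w_q.T)          # stays uint8: 800 wraps around
acc = int_matmul(x_q, w_q, 0, 0)      # int32 accumulator holds the real sum

print(wrapped.dtype, wrapped[0, 0])
print(acc.dtype, acc[0, 0])
```

Widening to `int32` before multiplying mirrors what an integer accumulator on the FPGA would do; the result is still in the accumulator's scale and has to be rescaled before it can be compared with the TFLite output.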

Edit:
I've read the paper about quantization in TensorFlow:
http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf

I now understand almost everything: I know what the S and Z values are for the activations and the kernels. But after the matrix multiplication, the result should be multiplied by the factor M := S1*S2/S3.
I don't know what the S3 scale is or how to get it, because I can't see anything related to it in the Netron graph. Any suggestions?
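For context, in the paper's notation S1 and S2 are the scales of the two operands (input activations and weights) and S3 is the scale of the result, i.e. of the layer's output tensor, so it would normally be listed with the quantization parameters of the tensor the layer writes to rather than with the weights. On integer-only hardware M is usually not applied as a float; the paper rewrites it as a fixed-point integer multiplier and a right shift. The following is a minimal pure-Python sketch of that idea, not TFLite's exact rounding behaviour. The function names, the 31-bit mantissa and the example scale are illustrative assumptions.

```python
import math

def quantize_multiplier(real_multiplier):
    """Split 0 < M < 1 into (m0, shift) with M ~= m0 * 2**-(31 + shift)."""
    if not 0.0 < real_multiplier < 1.0:
        raise ValueError("expected a multiplier in (0, 1)")
    # frexp gives M = mantissa * 2**exponent with mantissa in [0.5, 1)
    mantissa, exponent = math.frexp(real_multiplier)
    m0 = round(mantissa * (1 << 31))
    if m0 == (1 << 31):          # rounding pushed the mantissa up to 1.0
        m0 //= 2
        exponent += 1
    return m0, -exponent

def requantize(acc, m0, shift, z3):
    """Scale an int32 accumulator to uint8 using integer operations only."""
    total_shift = 31 + shift
    rounding = 1 << (total_shift - 1)            # round to nearest
    scaled = (acc * m0 + rounding) >> total_shift
    return max(0, min(255, scaled + z3))         # add Z3, saturate to uint8

# Hypothetical M = S1 * S2 / S3
m0, shift = quantize_multiplier(0.0003)
print(m0, shift, requantize(100000, m0, shift, 0))
```

On an FPGA the pair (m0, shift) can be precomputed per layer, so the per-output cost is one integer multiply, one add and one shift.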

edited Nov 26 '18 at 21:51

asked Nov 21 '18 at 21:54

Damian

163

Please add the weight code you tried. Even better, add some simple examples so that people can see where the problem lies.

– E.Coms
Nov 21 '18 at 22:24

python tensorflow deep-learning tensorflow-lite quantization

1 Answer

There are two steps you'll need to do:

1. Dequantize the input, weights and bias back into full precision (or an integer equivalent):

(w - w_offset) * w_scale

2. After the ReLU, quantize the activations back into integers:

a / a_scale + a_offset

You can probably skip step 2, which quantizes and then dequantizes the activations, at a minor risk of getting a slightly different result from the TFLite model. This is because ReLU has no upper bound, whereas TFLite saturates the activation at a maximum value.

You can check out my TFLite tutorials on my GitHub, where I introduce the concept and training; I am about to write up inference as well.
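The two steps can be sketched in numpy roughly as follows. This is an illustration under stated assumptions, not the TFLite kernel: the helper names, the (scale, zero_point) tuples and the sample values are invented, and the bias is assumed to be stored as int32 with scale x_scale * w_scale and zero point 0, which should be checked against the actual model.

```python
import numpy as np

def dequantize(q, scale, zero_point):
    """Step 1: (q - offset) * scale, computed in float32."""
    return (q.astype(np.float32) - zero_point) * scale

def quantize(x, scale, zero_point):
    """Step 2: x / scale + offset, rounded and saturated to uint8."""
    q = np.round(x / scale + zero_point)
    return np.clip(q, 0, 255).astype(np.uint8)

def dense_relu(x_q, x_qp, w_q, w_qp, b_q, out_qp):
    """One Dense + ReLU layer; each *_qp is a (scale, zero_point) pair."""
    x = dequantize(x_q, *x_qp)
    w = dequantize(w_q, *w_qp)
    # Assumption: int32 bias with scale x_scale * w_scale and zero point 0
    b = b_q.astype(np.float32) * (x_qp[0] * w_qp[0])
    a = np.maximum(x @ w.T + b, 0.0)
    return quantize(a, *out_qp)

# Hypothetical values standing in for tensors read from the .tflite file
x_q = np.array([[0, 128, 255]], dtype=np.uint8)
w_q = np.array([[130, 128, 126], [120, 128, 140]], dtype=np.uint8)
b_q = np.array([10, -20], dtype=np.int32)
out = dense_relu(x_q, (1 / 255, 0), w_q, (0.01, 128), b_q, (0.02, 0))
print(out)
```

Chaining one such call per layer, each with its own parameters, keeps every intermediate activation in uint8, which is the part that step 2 contributes.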

answered Dec 12 '18 at 12:39

SoonYau

612
