Run a quantized TensorFlow model on an FPGA / in pure Python



























I have a simple Keras model trained on the MNIST dataset.

I am trying to rewrite this model so that it runs on an FPGA device. To do that, I want to fully understand how a quantized model works.

First, I converted the model to the .tflite format with UINT8 precision using post-training quantization (https://www.tensorflow.org/lite/performance/post_training_quantization).

The quantized model reaches an accuracy of about 90%.

Now I am trying to extract the weights from the quantized model and implement inference in pure Python. I use Netron to visualize the model and read out its weights: https://github.com/lutzroeder/netron.

Simple Python code (matrix multiplication, bias addition and ReLU) works, but the version that uses the quantized weights does not.

So my question is: how do I write the feed-forward pass using NumPy?

My model in Keras looks like this:



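import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense
from tensorflow.keras.optimizers import Adam

# input_shape and num_classes are defined elsewhere (not shown in the question).
model = Sequential()
model.add(Dense(512, input_shape=input_shape))
model.add(Activation(tf.nn.relu))
model.add(Dense(100))
model.add(Activation(tf.nn.relu))
model.add(Dense(num_classes))
model.add(Activation(tf.nn.softmax))
model.compile(
    optimizer=Adam(),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)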


I converted it with TocoConverter, and the converted model works in TensorFlow.

Then I tried to write the feed-forward pass in pure Python:



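import numpy as np

# relu is assumed to be the usual elementwise max(x, 0); its definition is not shown in the question.
def relu(x):
    return np.maximum(x, 0)

# W0, b0, W1, b1, W2, b2 are the weights and biases read from the quantized model.
num_correct = 0
total_seen = 0
for img, label in zip(x_test, y_test):
    img = img.astype('uint8')
    total_seen += 1
    label = tf.keras.utils.to_categorical(label, num_classes=num_classes)
    X = img.reshape(1, 784)
    z1 = np.dot(X, W0.T) + b0
    a1 = relu(z1)
    z2 = np.dot(a1, W1.T) + b1
    a2 = relu(z2)
    z3 = np.dot(a2, W2.T) + b2
    prediction = np.argmax(z3)
    label = np.argmax(label)
    if prediction == label:
        num_correct += 1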


But the accuracy of this model is about 10%, so something is going wrong.
How can I correct it?

Thanks in advance.

Edit:
I've read the paper about quantization in TensorFlow:
http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf

I understand almost everything in it, including what the S and Z values are for activations and kernels. But after the matrix multiplication, the result should be multiplied by the factor M := S1*S2/S3.
I don't know what the scale S3 is or how to get it, because I can't see anything related to it in the Netron graph. Any suggestions?
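
For reference, the factor M follows from the paper's quantization scheme, in which a real value r is represented by an integer q as r = S(q - Z). A sketch of that derivation is given below, assuming the indices 1, 2 and 3 denote the two inputs and the output of the matrix multiplication, and that i, j, k are matrix indices:

```latex
\begin{align}
r &= S\,(q - Z) \\
S_3\,\bigl(q_3^{(i,k)} - Z_3\bigr)
  &= \sum_{j} S_1\,\bigl(q_1^{(i,j)} - Z_1\bigr)\, S_2\,\bigl(q_2^{(j,k)} - Z_2\bigr) \\
q_3^{(i,k)}
  &= Z_3 + M \sum_{j} \bigl(q_1^{(i,j)} - Z_1\bigr)\bigl(q_2^{(j,k)} - Z_2\bigr),
  \qquad M := \frac{S_1 S_2}{S_3}
\end{align}
```

Under this reading, S3 and Z3 are the scale and zero point of the layer's output activations, not of the weights. If the converter only quantized the weights and left the activations in floating point, there may be no S3 stored in the file at all.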



































  • Please add the weight code you tried. Even better, add some simple examples so that people can see where the problem lies.

    – E.Coms
    Nov 21 '18 at 22:24
















python tensorflow deep-learning tensorflow-lite quantization














asked Nov 21 '18 at 21:54 by Damian
edited Nov 26 '18 at 21:51 by Damian

























1 Answer














There are two steps you'll need to do:

  1. Dequantize the input, weights and bias back into full precision (or the integer equivalent):

    (w - w_offset) * w_scale

  2. After the ReLU, quantize the activations back into integers:

    a / a_scale + a_offset

    You can probably skip step 2 (the quantize-dequantize round trip on the activations), at a minor risk of getting results that differ from the TFLite model. This is because ReLU has no upper bound, whereas TFLite saturates it at a maximum value.
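
The two steps above can be sketched in NumPy as follows. This is only a minimal illustration under stated assumptions, not TFLite's actual kernel: the helper names `dequantize`, `quantize` and `dense_relu` are hypothetical, the scales, offsets and weights are made-up toy values (real ones would come from the quantization parameters stored with each tensor in the .tflite file), and the bias is assumed to be available in float already.

```python
import numpy as np

def dequantize(q, scale, offset):
    """Step 1: map stored integers back to real values, (q - offset) * scale.

    The cast to float32 happens before the subtraction so that uint8
    arithmetic cannot wrap around.
    """
    return (np.asarray(q, dtype=np.float32) - offset) * scale

def quantize(a, scale, offset, qmin=0, qmax=255):
    """Step 2: map real values to uint8, a / scale + offset, rounded and saturated."""
    q = np.round(np.asarray(a, dtype=np.float32) / scale + offset)
    return np.clip(q, qmin, qmax).astype(np.uint8)

def relu(x):
    return np.maximum(x, 0.0)

def dense_relu(x_real, w_q, w_scale, w_offset, b_real):
    """One Dense + ReLU layer whose weights are stored as uint8."""
    w_real = dequantize(w_q, w_scale, w_offset)
    return relu(x_real @ w_real.T + b_real)

# Hypothetical toy layer with 3 inputs and 2 outputs.
w_q = np.array([[130, 128, 126],
                [128, 132, 128]], dtype=np.uint8)
w_scale, w_offset = 0.5, 128          # made-up quantization parameters
b = np.array([0.0, -1.0], dtype=np.float32)
x = np.array([[1.0, 2.0, 3.0]], dtype=np.float32)

a = dense_relu(x, w_q, w_scale, w_offset, b)

# Optional step 2 with a made-up activation scale and offset.
a_q = quantize(a, scale=0.1, offset=0)
```

Here the dequantized weights are [[1, 0, -1], [0, 2, 0]], so the layer output before ReLU is [-2, 3]. The clip in `quantize` is what reproduces the saturation mentioned above: any activation larger than (255 - offset) * scale maps to 255.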




You can check out my tutorials on TFLite on my GitHub, where I have introduced the concept and training, and am about to write about inference.
















answered Dec 12 '18 at 12:39 by SoonYau































