Normalization for two bulk RNA-Seq samples to enable reliable fold-change estimation between genes












2












$begingroup$


I have two bulk RNA-Seq samples, already tpm-normalized.



I would like to know what is a reasonable normalization procedure to enable downstream log fold-change estimation.



The distribution of the two samples using the common set of genes looks similar:



TPM distribution



However, the two samples have only been tpm-normalized, is it enough to guarantee reliable fold-change estimation? Should I use another normalization procedure, e.g. Quantile Normalization, before comparison?



My objective is to define a signature using the genes that are up-regulated in Sample1 with respect to Sample0, and vice versa. I'm using log fold-changes, but I'm concerned that their value may be affected by each sample distribution.



Do you also have suggestions for the definition of up-regulated genes with these data?



scatter










share|improve this question









$endgroup$

















    2












    $begingroup$


    I have two bulk RNA-Seq samples, already tpm-normalized.



    I would like to know what is a reasonable normalization procedure to enable downstream log fold-change estimation.



    The distribution of the two samples using the common set of genes looks similar:



    TPM distribution



    However, the two samples have only been tpm-normalized, is it enough to guarantee reliable fold-change estimation? Should I use another normalization procedure, e.g. Quantile Normalization, before comparison?



    My objective is to define a signature using the genes that are up-regulated in Sample1 with respect to Sample0, and vice versa. I'm using log fold-changes, but I'm concerned that their value may be affected by each sample distribution.



    Do you also have suggestions for the definition of up-regulated genes with these data?



    scatter










    share|improve this question









    $endgroup$















      2












      2








      2





      $begingroup$


      I have two bulk RNA-Seq samples, already tpm-normalized.



      I would like to know what is a reasonable normalization procedure to enable downstream log fold-change estimation.



      The distribution of the two samples using the common set of genes looks similar:



      TPM distribution



      However, the two samples have only been tpm-normalized, is it enough to guarantee reliable fold-change estimation? Should I use another normalization procedure, e.g. Quantile Normalization, before comparison?



      My objective is to define a signature using the genes that are up-regulated in Sample1 with respect to Sample0, and vice versa. I'm using log fold-changes, but I'm concerned that their value may be affected by each sample distribution.



      Do you also have suggestions for the definition of up-regulated genes with these data?



      scatter










      share|improve this question









      $endgroup$




      I have two bulk RNA-Seq samples, already tpm-normalized.



      I would like to know what is a reasonable normalization procedure to enable downstream log fold-change estimation.



      The distribution of the two samples using the common set of genes looks similar:



      TPM distribution



      However, the two samples have only been tpm-normalized, is it enough to guarantee reliable fold-change estimation? Should I use another normalization procedure, e.g. Quantile Normalization, before comparison?



      My objective is to define a signature using the genes that are up-regulated in Sample1 with respect to Sample0, and vice versa. I'm using log fold-changes, but I'm concerned that their value may be affected by each sample distribution.



      Do you also have suggestions for the definition of up-regulated genes with these data?



      scatter







      rna-seq normalization fold-change






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Feb 28 at 20:07









      gc5gc5

      731216




      731216






















          3 Answers
          3






          active

          oldest

          votes


















          3












          $begingroup$

          What I have generally done in the past is to process the data using voom in the limma package for bulk RNASeq. Inside voom you can call for different normalization methods to be used - "TMM" works fine for me and, is advocated by many in the field.



          voom will output an object containing the normalized expression values in a log2 scale, which, also in my experience, has worked out just fine for calculating log fold changes.



          Check out this link for more info on the package as well as normalization methods: https://www.bioconductor.org/packages/devel/workflows/vignettes/RNAseq123/inst/doc/limmaWorkflow.html
          It is a very thorough introduction to the package and all of its capabilities.



          Good luck!






          share|improve this answer









          $endgroup$





















            3












            $begingroup$

            You have only two samples?



            You aren't going to be able to draw strong conclusions from that no matter what you do. Clever statistics don't work without replicates.






            share|improve this answer









            $endgroup$





















              3












              $begingroup$

              It's not a good idea to do tpm normalisation prior to differential expression analysis, because the actual read counts are useful to determine shot noise and statistical significance. DESeq2 includes read normalisation as part of its methods for differential expression analysis.



              I think shot noise is best explained in terms of shooting photons at a target, as found on Wikipedia:




              Shot noise exists because phenomena such as light and electric current
              consist of the movement of discrete (also called "quantized")
              'packets'. Consider light—a stream of discrete photons—coming out of a
              laser pointer and hitting a wall to create a visible spot. The
              fundamental physical processes that govern light emission are such
              that these photons are emitted from the laser at random times; but the
              many billions of photons needed to create a spot are so many that the
              brightness, the number of photons per unit of time, varies only
              infinitesimally with time. However, if the laser brightness is reduced
              until only a handful of photons hit the wall every second, the
              relative fluctuations in number of photons, i.e., brightness, will be
              significant, just as when tossing a coin a few times. These
              fluctuations are shot noise.




              Sequencing also has shot noise effects because the sequencing process is a random sampling method. For low-abundance targets, the likelihood of sampling a target is low, so the effect of a single hit is amplified. This reduces the statistical significance when comparing different low-abundance expression values, because the variation in expression is highly dependent on which targets won the sampling lottery. The shot noise variation / expression curve can be seen in Figure 7 of this paper.






              share|improve this answer











              $endgroup$













              • $begingroup$
                I agree with TPM for a lot of reasons, unfortunately the data was already in TPM. Can you explain more about how read counts are useful to determine shot noise and statistical significance? Thanks
                $endgroup$
                – gc5
                Feb 28 at 22:17











              Your Answer





              StackExchange.ifUsing("editor", function () {
              return StackExchange.using("mathjaxEditing", function () {
              StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
              StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
              });
              });
              }, "mathjax-editing");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "676"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: false,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: null,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fbioinformatics.stackexchange.com%2fquestions%2f7142%2fnormalization-for-two-bulk-rna-seq-samples-to-enable-reliable-fold-change-estima%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              3 Answers
              3






              active

              oldest

              votes








              3 Answers
              3






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              3












              $begingroup$

              What I have generally done in the past is to process the data using voom in the limma package for bulk RNASeq. Inside voom you can call for different normalization methods to be used - "TMM" works fine for me and, is advocated by many in the field.



              voom will output an object containing the normalized expression values in a log2 scale, which, also in my experience, has worked out just fine for calculating log fold changes.



              Check out this link for more info on the package as well as normalization methods: https://www.bioconductor.org/packages/devel/workflows/vignettes/RNAseq123/inst/doc/limmaWorkflow.html
              It is a very thorough introduction to the package and all of its capabilities.



              Good luck!






              share|improve this answer









              $endgroup$


















                3












                $begingroup$

                What I have generally done in the past is to process the data using voom in the limma package for bulk RNASeq. Inside voom you can call for different normalization methods to be used - "TMM" works fine for me and, is advocated by many in the field.



                voom will output an object containing the normalized expression values in a log2 scale, which, also in my experience, has worked out just fine for calculating log fold changes.



                Check out this link for more info on the package as well as normalization methods: https://www.bioconductor.org/packages/devel/workflows/vignettes/RNAseq123/inst/doc/limmaWorkflow.html
                It is a very thorough introduction to the package and all of its capabilities.



                Good luck!






                share|improve this answer









                $endgroup$
















                  3












                  3








                  3





                  $begingroup$

                  What I have generally done in the past is to process the data using voom in the limma package for bulk RNASeq. Inside voom you can call for different normalization methods to be used - "TMM" works fine for me and, is advocated by many in the field.



                  voom will output an object containing the normalized expression values in a log2 scale, which, also in my experience, has worked out just fine for calculating log fold changes.



                  Check out this link for more info on the package as well as normalization methods: https://www.bioconductor.org/packages/devel/workflows/vignettes/RNAseq123/inst/doc/limmaWorkflow.html
                  It is a very thorough introduction to the package and all of its capabilities.



                  Good luck!






                  share|improve this answer









                  $endgroup$



                  What I have generally done in the past is to process the data using voom in the limma package for bulk RNASeq. Inside voom you can call for different normalization methods to be used - "TMM" works fine for me and, is advocated by many in the field.



                  voom will output an object containing the normalized expression values in a log2 scale, which, also in my experience, has worked out just fine for calculating log fold changes.



                  Check out this link for more info on the package as well as normalization methods: https://www.bioconductor.org/packages/devel/workflows/vignettes/RNAseq123/inst/doc/limmaWorkflow.html
                  It is a very thorough introduction to the package and all of its capabilities.



                  Good luck!







                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Feb 28 at 21:36









                  h3ab74h3ab74

                  1187




                  1187























                      3












                      $begingroup$

                      You have only two samples?



                      You aren't going to be able to draw strong conclusions from that no matter what you do. Clever statistics don't work without replicates.






                      share|improve this answer









                      $endgroup$


















                        3












                        $begingroup$

                        You have only two samples?



                        You aren't going to be able to draw strong conclusions from that no matter what you do. Clever statistics don't work without replicates.






                        share|improve this answer









                        $endgroup$
















                          3












                          3








                          3





                          $begingroup$

                          You have only two samples?



                          You aren't going to be able to draw strong conclusions from that no matter what you do. Clever statistics don't work without replicates.






                          share|improve this answer









                          $endgroup$



                          You have only two samples?



                          You aren't going to be able to draw strong conclusions from that no matter what you do. Clever statistics don't work without replicates.







                          share|improve this answer












                          share|improve this answer



                          share|improve this answer










                          answered Feb 28 at 23:03









                          swbarnes2swbarnes2

                          50114




                          50114























                              3












                              $begingroup$

                              It's not a good idea to do tpm normalisation prior to differential expression analysis, because the actual read counts are useful to determine shot noise and statistical significance. DESeq2 includes read normalisation as part of its methods for differential expression analysis.



                              I think shot noise is best explained in terms of shooting photons at a target, as found on Wikipedia:




                              Shot noise exists because phenomena such as light and electric current
                              consist of the movement of discrete (also called "quantized")
                              'packets'. Consider light—a stream of discrete photons—coming out of a
                              laser pointer and hitting a wall to create a visible spot. The
                              fundamental physical processes that govern light emission are such
                              that these photons are emitted from the laser at random times; but the
                              many billions of photons needed to create a spot are so many that the
                              brightness, the number of photons per unit of time, varies only
                              infinitesimally with time. However, if the laser brightness is reduced
                              until only a handful of photons hit the wall every second, the
                              relative fluctuations in number of photons, i.e., brightness, will be
                              significant, just as when tossing a coin a few times. These
                              fluctuations are shot noise.




                              Sequencing also has shot noise effects because the sequencing process is a random sampling method. For low-abundance targets, the likelihood of sampling a target is low, so the effect of a single hit is amplified. This reduces the statistical significance when comparing different low-abundance expression values, because the variation in expression is highly dependent on which targets won the sampling lottery. The shot noise variation / expression curve can be seen in Figure 7 of this paper.






                              share|improve this answer











                              $endgroup$













                              • $begingroup$
                                I agree with TPM for a lot of reasons, unfortunately the data was already in TPM. Can you explain more about how read counts are useful to determine shot noise and statistical significance? Thanks
                                $endgroup$
                                – gc5
                                Feb 28 at 22:17
















                              3












                              $begingroup$

                              It's not a good idea to do tpm normalisation prior to differential expression analysis, because the actual read counts are useful to determine shot noise and statistical significance. DESeq2 includes read normalisation as part of its methods for differential expression analysis.



                              I think shot noise is best explained in terms of shooting photons at a target, as found on Wikipedia:




                              Shot noise exists because phenomena such as light and electric current
                              consist of the movement of discrete (also called "quantized")
                              'packets'. Consider light—a stream of discrete photons—coming out of a
                              laser pointer and hitting a wall to create a visible spot. The
                              fundamental physical processes that govern light emission are such
                              that these photons are emitted from the laser at random times; but the
                              many billions of photons needed to create a spot are so many that the
                              brightness, the number of photons per unit of time, varies only
                              infinitesimally with time. However, if the laser brightness is reduced
                              until only a handful of photons hit the wall every second, the
                              relative fluctuations in number of photons, i.e., brightness, will be
                              significant, just as when tossing a coin a few times. These
                              fluctuations are shot noise.




                              Sequencing also has shot noise effects because the sequencing process is a random sampling method. For low-abundance targets, the likelihood of sampling a target is low, so the effect of a single hit is amplified. This reduces the statistical significance when comparing different low-abundance expression values, because the variation in expression is highly dependent on which targets won the sampling lottery. The shot noise variation / expression curve can be seen in Figure 7 of this paper.






                              share|improve this answer











                              $endgroup$













                              • $begingroup$
                                I agree with TPM for a lot of reasons, unfortunately the data was already in TPM. Can you explain more about how read counts are useful to determine shot noise and statistical significance? Thanks
                                $endgroup$
                                – gc5
                                Feb 28 at 22:17














                              3












                              3








                              3





                              $begingroup$

                              It's not a good idea to do tpm normalisation prior to differential expression analysis, because the actual read counts are useful to determine shot noise and statistical significance. DESeq2 includes read normalisation as part of its methods for differential expression analysis.



                              I think shot noise is best explained in terms of shooting photons at a target, as found on Wikipedia:




                              Shot noise exists because phenomena such as light and electric current
                              consist of the movement of discrete (also called "quantized")
                              'packets'. Consider light—a stream of discrete photons—coming out of a
                              laser pointer and hitting a wall to create a visible spot. The
                              fundamental physical processes that govern light emission are such
                              that these photons are emitted from the laser at random times; but the
                              many billions of photons needed to create a spot are so many that the
                              brightness, the number of photons per unit of time, varies only
                              infinitesimally with time. However, if the laser brightness is reduced
                              until only a handful of photons hit the wall every second, the
                              relative fluctuations in number of photons, i.e., brightness, will be
                              significant, just as when tossing a coin a few times. These
                              fluctuations are shot noise.




                              Sequencing also has shot noise effects because the sequencing process is a random sampling method. For low-abundance targets, the likelihood of sampling a target is low, so the effect of a single hit is amplified. This reduces the statistical significance when comparing different low-abundance expression values, because the variation in expression is highly dependent on which targets won the sampling lottery. The shot noise variation / expression curve can be seen in Figure 7 of this paper.






                              share|improve this answer











                              $endgroup$



                              It's not a good idea to do tpm normalisation prior to differential expression analysis, because the actual read counts are useful to determine shot noise and statistical significance. DESeq2 includes read normalisation as part of its methods for differential expression analysis.



                              I think shot noise is best explained in terms of shooting photons at a target, as found on Wikipedia:




                              Shot noise exists because phenomena such as light and electric current
                              consist of the movement of discrete (also called "quantized")
                              'packets'. Consider light—a stream of discrete photons—coming out of a
                              laser pointer and hitting a wall to create a visible spot. The
                              fundamental physical processes that govern light emission are such
                              that these photons are emitted from the laser at random times; but the
                              many billions of photons needed to create a spot are so many that the
                              brightness, the number of photons per unit of time, varies only
                              infinitesimally with time. However, if the laser brightness is reduced
                              until only a handful of photons hit the wall every second, the
                              relative fluctuations in number of photons, i.e., brightness, will be
                              significant, just as when tossing a coin a few times. These
                              fluctuations are shot noise.




                              Sequencing also has shot noise effects because the sequencing process is a random sampling method. For low-abundance targets, the likelihood of sampling a target is low, so the effect of a single hit is amplified. This reduces the statistical significance when comparing different low-abundance expression values, because the variation in expression is highly dependent on which targets won the sampling lottery. The shot noise variation / expression curve can be seen in Figure 7 of this paper.







                              share|improve this answer














                              share|improve this answer



                              share|improve this answer








                              edited Mar 1 at 21:56

























                              answered Feb 28 at 21:35









                              gringergringer

                              7,86221049




                              7,86221049












                              • $begingroup$
                                I agree with TPM for a lot of reasons, unfortunately the data was already in TPM. Can you explain more about how read counts are useful to determine shot noise and statistical significance? Thanks
                                $endgroup$
                                – gc5
                                Feb 28 at 22:17


















                              • $begingroup$
                                I agree with TPM for a lot of reasons, unfortunately the data was already in TPM. Can you explain more about how read counts are useful to determine shot noise and statistical significance? Thanks
                                $endgroup$
                                – gc5
                                Feb 28 at 22:17
















                              $begingroup$
                              I agree with TPM for a lot of reasons, unfortunately the data was already in TPM. Can you explain more about how read counts are useful to determine shot noise and statistical significance? Thanks
                              $endgroup$
                              – gc5
                              Feb 28 at 22:17




                              $begingroup$
                              I agree with TPM for a lot of reasons, unfortunately the data was already in TPM. Can you explain more about how read counts are useful to determine shot noise and statistical significance? Thanks
                              $endgroup$
                              – gc5
                              Feb 28 at 22:17


















                              draft saved

                              draft discarded




















































                              Thanks for contributing an answer to Bioinformatics Stack Exchange!


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              Use MathJax to format equations. MathJax reference.


                              To learn more, see our tips on writing great answers.




                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function () {
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fbioinformatics.stackexchange.com%2fquestions%2f7142%2fnormalization-for-two-bulk-rna-seq-samples-to-enable-reliable-fold-change-estima%23new-answer', 'question_page');
                              }
                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown







                              Popular posts from this blog

                              How to change which sound is reproduced for terminal bell?

                              Title Spacing in Bjornstrup Chapter, Removing Chapter Number From Contents

                              Can I use Tabulator js library in my java Spring + Thymeleaf project?