Why is opening a file faster than reading variable content?












34















In a bash script I need various values from /proc/ files. Until now I have dozens of lines grepping the files directly like that:



grep -oP '^MemFree: *\K[0-9]+' /proc/meminfo


In an effort to make that more efficient I saved the file content in a variable and grepped that:



a=$(</proc/meminfo)
echo "$a" | grep -oP '^MemFree: *\K[0-9]+'


Instead of opening the file multiple times this should just open it once and grep the variable content, which I assumed would be faster – but in fact it is slower:





bash 4.4.19 $ time for i in {1..1000};do grep ^MemFree /proc/meminfo;done >/dev/null
real 0m0.803s
user 0m0.619s
sys 0m0.232s
bash 4.4.19 $ a=$(</proc/meminfo)
bash 4.4.19 $ time for i in {1..1000};do echo "$a"|grep ^MemFree; done >/dev/null
real 0m1.182s
user 0m1.425s
sys 0m0.506s


The same is true for dash and zsh. I suspected the special state of /proc/ files as a reason, but when I copy the content of /proc/meminfo to a regular file and use that the results are the same:



bash 4.4.19 $ cat </proc/meminfo >meminfo
bash 4.4.19 $ time for i in $(seq 1 1000);do grep ^MemFree meminfo; done >/dev/null
real 0m0.790s
user 0m0.608s
sys 0m0.227s


Using a here string to save the pipe makes it slightly faster, but still not as fast as with the files:



bash 4.4.19 $ time for i in $(seq 1 1000);do <<<"$a" grep ^MemFree; done >/dev/null
real 0m0.977s
user 0m0.758s
sys 0m0.268s


Why is opening a file faster than reading the same content from a variable?



































  • @l0b0 This assumption is not faulty, the question shows how I came up with it and the answers explain why this is the case. Your edit now makes the answers not answering the title question any more: They don’t say whether that’s the case.

    – dessert
    Feb 21 at 11:59











  • OK, clarified. Because the heading was wrong in the vast majority of cases, just not for certain memory mapped special files.

    – l0b0
    Feb 21 at 18:25











  • @l0b0 No, that’s what I’m asking here: “I suspected the special state of /proc/ files as a reason, but when I copy the content of /proc/meminfo to a regular file and use that the results are the same:” It is not special to /proc/ files, reading regular files is faster as well!

    – dessert
    Feb 21 at 19:34
















bash shell-script shell zsh variable














edited Feb 21 at 19:35







dessert

















asked Feb 20 at 11:32



















3 Answers


















45














Here, it's not about opening a file versus reading a variable's content but more about forking an extra process or not.



grep -oP '^MemFree: *\K[0-9]+' /proc/meminfo forks a process that executes grep, which opens /proc/meminfo (a virtual file, in memory, no disk I/O involved), reads it, and matches the regexp.



The most expensive part of that is forking the process, loading the grep utility and its library dependencies, doing the dynamic linking, and opening the locale database: dozens of files that are on disk (but likely cached in memory).



The part about reading /proc/meminfo is insignificant in comparison: the kernel needs little time to generate the information in there and grep needs little time to read it.



If you run strace -c on that, you'll see that the one open() and one read() system calls used to read /proc/meminfo are peanuts compared to everything else grep does to start (strace -c doesn't count the forking).



In:



a=$(</proc/meminfo)


In most shells that support that $(<...) ksh operator, the shell just opens the file and reads its content (and strips the trailing newline characters). bash is different and much less efficient in that it forks a process to do that reading and passes the data to the parent via a pipe. But here, it's done once so it doesn't matter.



In:



printf '%s\n' "$a" | grep '^MemFree'


The shell needs to spawn two processes, which run concurrently but interact with each other via a pipe. Creating that pipe, tearing it down, and writing to and reading from it has a small cost. The much greater cost is the spawning of an extra process. The scheduling of the processes has some impact as well.
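That the left side of a pipeline runs in a child process, even when it consists only of builtins, can be checked with a small sketch. The variable name left_side_var is made up for illustration, and the snippet assumes nothing beyond a POSIX-style shell:

```shell
# Hypothetical variable, used only to observe where the assignment happens.
unset left_side_var

# The left side consists only of builtins, yet the shell runs it in a
# child process so that it can write into the pipe while cat reads from it.
{ left_side_var=set; printf '%s\n' "hello"; } | cat >/dev/null

# The assignment was made in the child, so the parent shell never sees it.
result=${left_side_var-unset}
printf '%s\n' "$result"
```

If the assignment had been made in the invoking shell, the last line would print "set" instead; the lost assignment is a side effect of the same fork that makes the pipeline variant slower.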



You may find that using the zsh <<< operator makes it slightly quicker:



grep '^MemFree' <<< "$a"


In zsh and bash, that's done by writing the content of $a to a temporary file, which is less expensive than spawning an extra process but will probably not give you any gain compared to getting the data straight off /proc/meminfo. It is still less efficient than your approach of copying /proc/meminfo to disk, as the temp file is written at each iteration.



dash doesn't support here-strings, but its heredocs are implemented with a pipe that doesn't involve spawning an extra process. In:



 grep '^MemFree' << EOF
$a
EOF


The shell creates a pipe and forks a process. The child executes grep with its stdin as the reading end of the pipe, and the parent writes the content at the other end of the pipe.



But that pipe handling and process synchronisation is still likely to be more expensive than just getting the data straight off /proc/meminfo.



The content of /proc/meminfo is short and takes not much time to produce. If you want to save some CPU cycles, you want to remove the expensive parts: forking processes and running external commands.



Like:



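# Read the whole file with the read builtin (no fork); $'...' quoting as in bash/zsh/ksh93
IFS= read -rd '' meminfo < /proc/meminfo
memfree=${meminfo#*MemFree:}            # drop everything up to "MemFree:"
memfree=${memfree%%$'\n'*}              # keep only the rest of that line
memfree=${memfree#"${memfree%%[! ]*}"}  # strip the leading spaces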


Avoid bash though, whose pattern matching is very inefficient. With zsh -o extendedglob, you can shorten it to:



memfree=${${"$(</proc/meminfo)"##*MemFree: #}%%$'\n'*}


Note that ^ is special in many shells (Bourne, fish, rc, es and zsh with the extendedglob option at least), so I'd recommend quoting it. Also note that echo can't be used to output arbitrary data (hence my use of printf above).
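Since the question mentions needing several values, the same builtin-only idea could be extended to pick up more than one field in a single pass. The following is only a sketch under stated assumptions: the function name parse_meminfo, the variables mem_total, mem_free and mem_available, and the sample text with its numbers are all made up for illustration. In a real script the text would come from /proc/meminfo instead.

```shell
# Hypothetical sample standing in for the content of /proc/meminfo;
# the numbers are invented.
sample='MemTotal:       16315444 kB
MemFree:         1234567 kB
MemAvailable:    7654321 kB'

# parse_meminfo TEXT
# Splits each line on ":" and spaces using only the read builtin and
# stores the fields of interest in (hypothetical) global variables.
parse_meminfo() {
    while IFS=': ' read -r key value rest; do
        case $key in
            MemTotal)     mem_total=$value ;;
            MemFree)      mem_free=$value ;;
            MemAvailable) mem_available=$value ;;
        esac
    done <<EOF
$1
EOF
}

parse_meminfo "$sample"
printf '%s\n' "$mem_free"
```

No external command is executed here, so the per-value cost of starting grep disappears. The results are stored in variables rather than printed and captured with $(...), because a command substitution would itself cost a fork in some shells.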



























  • 4





    In the case with printf you say the shell needs to spawn two processes, but isn't printf a shell builtin?

    – David Conrad
    Feb 20 at 17:56






  • 6





    @DavidConrad It is, but most shells don't try to analyze the pipeline to see which parts they could run in the current process. They just fork and let the children figure it out. In this case, the parent process forks twice; the child for the left side then sees a built-in and executes it; the child for the right side sees grep and execs.

    – chepner
    Feb 20 at 18:47






  • 1





    @DavidConrad, the pipe is an IPC mechanism, so in any case the two sides will have to run in different processes. While in A | B, there are some shells like AT&T ksh or zsh that run B in the current shell process if it's a builtin or compound or function command, I don't know of any that runs A in the current process. If anything, to do that, they would have to handle SIGPIPE in a complex way as if A was running in child process and without terminating the shell for the behaviour not to be too surprising when B exits early. It's much easier to run B in the parent process.

    – Stéphane Chazelas
    Feb 21 at 6:15











  • Bash supports <<<

    – D. Ben Knoble
    Feb 21 at 17:48






  • 1





    @D.BenKnoble, I didn't mean to imply bash didn't support <<<, just that the operator came from zsh like $(<...) came from ksh.

    – Stéphane Chazelas
    Feb 21 at 20:36



















6














In your first case you are just using the grep utility to search the file /proc/meminfo. /proc is a virtual file system, so the /proc/meminfo file is in memory and it takes very little time to fetch its content.



But in the second case, you are creating a pipe, then passing the first command's output to the second command using this pipe, which is costly.



The difference comes from /proc (because it is in memory) and the pipe; see the example below:



time for i in {1..1000};do grep ^MemFree /proc/meminfo;done >/dev/null

real 0m0.914s
user 0m0.032s
sys 0m0.148s


cat /proc/meminfo > file
time for i in {1..1000};do grep ^MemFree file;done >/dev/null

real 0m0.938s
user 0m0.032s
sys 0m0.152s


time for i in {1..1000};do echo "$a"|grep ^MemFree; done >/dev/null

real 0m1.016s
user 0m0.040s
sys 0m0.232s






































    1














    You are calling an external command in both cases (grep).
    The external call requires a subshell. Forking that shell is the fundamental cause of the delay. Both cases are similar, hence the similar delay.



    If you want to read the external file only once and use it (from a variable) multiple times, don't leave the shell:



    meminfo=$(< /proc/meminfo)
    time for i in {1..1000};do
        [[ $meminfo =~ MemFree: *([0-9]*) *.B ]]
        printf '%s\n' "${BASH_REMATCH[1]}"
    done


    This takes only about 0.1 seconds instead of the full 1 second for the grep call.






    share|improve this answer























      Your Answer








      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "106"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: false,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: null,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f501828%2fwhy-is-opening-a-file-faster-than-reading-variable-content%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      3 Answers
      3






      active

      oldest

      votes








      3 Answers
      3






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      45














      Here, it's not about opening a file versus reading a variable's content but more about forking an extra process or not.



      grep -oP '^MemFree: *K[0-9]+' /proc/meminfo forks a process that executes grep that opens /proc/meminfo (a virtual file, in memory, no disk I/O involved) reads it and matches the regexp.



      The most expensive part in that is forking the process and loading the grep utility and its library dependencies, doing the dynamic linking, open the locale database, dozens of files that are on disk (but likely cached in memory).



      The part about reading /proc/meminfo is insignificant in comparison, the kernel needs little time to generate the information in there and grep needs little time to read it.



      If you run strace -c on that, you'll see the one open() and one read() systems calls used to read /proc/meminfo is peanuts compared to everything else grep does to start (strace -c doesn't count the forking).



      In:



      a=$(</proc/meminfo)


      In most shells that support that $(<...) ksh operator, the shell just opens the file and read its content (and strips the trailing newline characters). bash is different and much less efficient in that it forks a process to do that reading and passes the data to the parent via a pipe. But here, it's done once so it doesn't matter.



      In:



      printf '%sn' "$a" | grep '^MemFree'


      The shell needs to spawn two processes, which are running concurrently but interact between each other via a pipe. That pipe creation, tearing down, and writing and reading from it has some little cost. The much greater cost is the spawning of an extra process. The scheduling of the processes has some impact as well.



      You may find that using the zsh <<< operator makes it slightly quicker:



      grep '^MemFree' <<< "$a"


      In zsh and bash, that's done by writing the content of $a in a temporary file, that is less expensive than spawning an extra process, but will probably not give you any gain compared to getting the data straight off /proc/meminfo. That's still less efficient than your approach that copies /proc/meminfo on disk, as the writing of the temp file is done at each iteration.



      dash doesn't support here-strings, but its heredocs are implemented with a pipe that doesn't involve spawning an extra process. In:



       grep '^MemFree' << EOF
      $a
      EOF


      The shell creates a pipe, forks a process. The child executes grep with its stdin as the reading end of the pipe, and the parent writes the content at the other end of the pipe.



      But that pipe handling and process synchronisation is still likely to be more expensive than just getting the data straight off /proc/meminfo.



      The content of /proc/meminfo is short and takes not much time to produce. If you want to save some CPU cycles, you want to remove the expensive parts: forking processes and running external commands.



      Like:



      IFS= read -rd '' meminfo < /proc/meminfo
      memfree=${meminfo#*MemFree:}
      memfree=${memfree%%$'n'*}
      memfree=${memfree#"${memfree%%[! ]*}"}


      Avoid bash though whose pattern matching is very ineficient. With zsh -o extendedglob, you can shorten it to:



      memfree=${${"$(</proc/meminfo)"##*MemFree: #}%%$'n'*}


      Note that ^ is special in many shells (Bourne, fish, rc, es and zsh with the extendedglob option at least), I'd recommend quoting it. Also note that echo can't be used to output arbitrary data (hence my use of printf above).






      share|improve this answer





















      • 4





        In the case with printf you say the shell needs to spawn two processes, but isn't printf a shell builtin?

        – David Conrad
        Feb 20 at 17:56






      • 6





        @DavidConrad It is, but most shells don't try to analyze the pipeline for which parts it could run in the current process. It just forks itself and lets the children figure it out. In this case, the parent process forks twice; the child for the left side then sees a built-in and executes it; the child for the right side sees grep and execs.

        – chepner
        Feb 20 at 18:47






      • 1





        @DavidConrad, the pipe is an IPC mechanism, so in any case the two sides will have to run in different processes. While in A | B, there are some shells like AT&T ksh or zsh that run B in the current shell process if it's a builtin or compound or function command, I don't know of any that runs A in the current process. If anything, to do that, they would have to handle SIGPIPE in a complex way as if A was running in child process and without terminating the shell for the behaviour not to be too surprising when B exits early. It's much easier to run B in the parent process.

        – Stéphane Chazelas
        Feb 21 at 6:15











      • Bash supports <<<

        – D. Ben Knoble
        Feb 21 at 17:48






      • 1





        @D.BenKnoble, I didn't mean to imply bash didn't support <<<, just that the operator came from zsh like $(<...) came from ksh.

        – Stéphane Chazelas
        Feb 21 at 20:36
















      45














      Here, it's not about opening a file versus reading a variable's content but more about forking an extra process or not.



      grep -oP '^MemFree: *K[0-9]+' /proc/meminfo forks a process that executes grep that opens /proc/meminfo (a virtual file, in memory, no disk I/O involved) reads it and matches the regexp.



      The most expensive part in that is forking the process and loading the grep utility and its library dependencies, doing the dynamic linking, open the locale database, dozens of files that are on disk (but likely cached in memory).



      The part about reading /proc/meminfo is insignificant in comparison, the kernel needs little time to generate the information in there and grep needs little time to read it.



      If you run strace -c on that, you'll see the one open() and one read() systems calls used to read /proc/meminfo is peanuts compared to everything else grep does to start (strace -c doesn't count the forking).



      In:



      a=$(</proc/meminfo)


      In most shells that support that $(<...) ksh operator, the shell just opens the file and read its content (and strips the trailing newline characters). bash is different and much less efficient in that it forks a process to do that reading and passes the data to the parent via a pipe. But here, it's done once so it doesn't matter.



      In:



      printf '%sn' "$a" | grep '^MemFree'


      The shell needs to spawn two processes, which are running concurrently but interact between each other via a pipe. That pipe creation, tearing down, and writing and reading from it has some little cost. The much greater cost is the spawning of an extra process. The scheduling of the processes has some impact as well.



      You may find that using the zsh <<< operator makes it slightly quicker:



      grep '^MemFree' <<< "$a"


      In zsh and bash, that's done by writing the content of $a in a temporary file, that is less expensive than spawning an extra process, but will probably not give you any gain compared to getting the data straight off /proc/meminfo. That's still less efficient than your approach that copies /proc/meminfo on disk, as the writing of the temp file is done at each iteration.



      dash doesn't support here-strings, but its heredocs are implemented with a pipe that doesn't involve spawning an extra process. In:



       grep '^MemFree' << EOF
      $a
      EOF


      The shell creates a pipe, forks a process. The child executes grep with its stdin as the reading end of the pipe, and the parent writes the content at the other end of the pipe.



      But that pipe handling and process synchronisation is still likely to be more expensive than just getting the data straight off /proc/meminfo.



      The content of /proc/meminfo is short and takes not much time to produce. If you want to save some CPU cycles, you want to remove the expensive parts: forking processes and running external commands.



      Like:



      IFS= read -rd '' meminfo < /proc/meminfo
      memfree=${meminfo#*MemFree:}
      memfree=${memfree%%$'n'*}
      memfree=${memfree#"${memfree%%[! ]*}"}


      Avoid bash though whose pattern matching is very ineficient. With zsh -o extendedglob, you can shorten it to:



      memfree=${${"$(</proc/meminfo)"##*MemFree: #}%%$'n'*}


      Note that ^ is special in many shells (Bourne, fish, rc, es and zsh with the extendedglob option at least), I'd recommend quoting it. Also note that echo can't be used to output arbitrary data (hence my use of printf above).






      share|improve this answer





















      • 4





        In the case with printf you say the shell needs to spawn two processes, but isn't printf a shell builtin?

        – David Conrad
        Feb 20 at 17:56






      • 6





        @DavidConrad It is, but most shells don't try to analyze the pipeline for which parts it could run in the current process. It just forks itself and lets the children figure it out. In this case, the parent process forks twice; the child for the left side then sees a built-in and executes it; the child for the right side sees grep and execs.

        – chepner
        Feb 20 at 18:47






      • 1





        @DavidConrad, the pipe is an IPC mechanism, so in any case the two sides will have to run in different processes. While in A | B, there are some shells like AT&T ksh or zsh that run B in the current shell process if it's a builtin or compound or function command, I don't know of any that runs A in the current process. If anything, to do that, they would have to handle SIGPIPE in a complex way as if A was running in child process and without terminating the shell for the behaviour not to be too surprising when B exits early. It's much easier to run B in the parent process.

        – Stéphane Chazelas
        Feb 21 at 6:15











      • Bash supports <<<

        – D. Ben Knoble
        Feb 21 at 17:48






      • 1





        @D.BenKnoble, I didn't mean to imply bash didn't support <<<, just that the operator came from zsh like $(<...) came from ksh.

        – Stéphane Chazelas
        Feb 21 at 20:36














      45












      45








      45







      Here, it's not about opening a file versus reading a variable's content but more about forking an extra process or not.



      grep -oP '^MemFree: *K[0-9]+' /proc/meminfo forks a process that executes grep that opens /proc/meminfo (a virtual file, in memory, no disk I/O involved) reads it and matches the regexp.



      The most expensive part in that is forking the process and loading the grep utility and its library dependencies, doing the dynamic linking, open the locale database, dozens of files that are on disk (but likely cached in memory).



      The part about reading /proc/meminfo is insignificant in comparison, the kernel needs little time to generate the information in there and grep needs little time to read it.



      If you run strace -c on that, you'll see the one open() and one read() systems calls used to read /proc/meminfo is peanuts compared to everything else grep does to start (strace -c doesn't count the forking).



      In:



      a=$(</proc/meminfo)


      In most shells that support that $(<...) ksh operator, the shell just opens the file and read its content (and strips the trailing newline characters). bash is different and much less efficient in that it forks a process to do that reading and passes the data to the parent via a pipe. But here, it's done once so it doesn't matter.



      In:



      printf '%sn' "$a" | grep '^MemFree'


      The shell needs to spawn two processes, which are running concurrently but interact between each other via a pipe. That pipe creation, tearing down, and writing and reading from it has some little cost. The much greater cost is the spawning of an extra process. The scheduling of the processes has some impact as well.



      You may find that using the zsh <<< operator makes it slightly quicker:



      grep '^MemFree' <<< "$a"


      In zsh and bash, that's done by writing the content of $a to a temporary file, which is less expensive than spawning an extra process but will probably not give you any gain compared to getting the data straight off /proc/meminfo. That's still less efficient than your approach that copies /proc/meminfo to disk, as the temp file is written at each iteration.



      dash doesn't support here-strings, but its heredocs are implemented with a pipe that doesn't involve spawning an extra process. In:



       grep '^MemFree' << EOF
      $a
      EOF


      The shell creates a pipe and forks a process. The child executes grep with its stdin as the reading end of the pipe, and the parent writes the content at the other end of the pipe.



      But that pipe handling and process synchronisation is still likely to be more expensive than just getting the data straight off /proc/meminfo.
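
To check what a given shell actually connects to stdin for a here-document, one option on Linux is to ask readlink what file descriptor 0 points to. This is a sketch under the assumption that /proc/self/fd is available; the sample value of a and the name stdin_target are illustrative, and the output depends on the shell and its version (typically a pipe or a deleted temporary file).

```shell
# Sample content standing in for the real variable from the question.
a='MemFree:         1234 kB'

# readlink reports what its own stdin (fd 0) refers to, i.e. whatever the
# shell set up to deliver the here-document.
stdin_target=$(readlink /proc/self/fd/0 << EOF
$a
EOF
)
printf 'stdin for the here-document is: %s\n' "$stdin_target"
```

Running the same lines under dash, bash and zsh shows which mechanism each one uses.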



      The content of /proc/meminfo is short and doesn't take much time to produce. If you want to save some CPU cycles, you want to remove the expensive parts: forking processes and running external commands.



      Like:



      IFS= read -rd '' meminfo < /proc/meminfo
      memfree=${meminfo#*MemFree:}
      memfree=${memfree%%$'\n'*}
      memfree=${memfree#"${memfree%%[! ]*}"}


      Avoid bash though, as its pattern matching is very inefficient. With zsh -o extendedglob, you can shorten it to:



      memfree=${${"$(</proc/meminfo)"##*MemFree: #}%%$'\n'*}


      Note that ^ is special in many shells (Bourne, fish, rc, es and zsh with the extendedglob option at least), so I'd recommend quoting it. Also note that echo can't be used to output arbitrary data (hence my use of printf above).
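
Since the question mentions needing several values, the same fork-free idea can be extended to pick up more than one field in a single pass over the file. The following is a sketch, not part of the original answer: the function name read_meminfo and the variables memtotal and memfree are hypothetical, and the optional file argument exists only so the function can be tried on a copy.

```shell
# Hypothetical helper: collect several fields from a meminfo-style file in
# one pass, using only shell builtins (no fork, no external command).
# Results are left in the global variables memtotal and memfree (in kB).
read_meminfo() {
  memtotal= memfree=
  # Each line looks like "MemFree:   1234 kB"; read splits it on blanks.
  while read -r key value rest; do
    case $key in
      MemTotal:) memtotal=$value ;;
      MemFree:)  memfree=$value ;;
    esac
  done < "${1:-/proc/meminfo}"
}

# Usage, guarded so the sketch does nothing where /proc/meminfo is missing.
if [ -r /proc/meminfo ]; then
  read_meminfo
  printf 'MemTotal=%s kB MemFree=%s kB\n' "$memtotal" "$memfree"
fi
```

The loop costs one open and a few reads per call, with no process creation at all, which is the part the answer identifies as expensive.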














      edited Feb 21 at 20:40

























      answered Feb 20 at 11:59









      Stéphane Chazelas








      • 4





        In the case with printf you say the shell needs to spawn two processes, but isn't printf a shell builtin?

        – David Conrad
        Feb 20 at 17:56






      • 6





        @DavidConrad It is, but most shells don't try to analyze the pipeline for which parts it could run in the current process. It just forks itself and lets the children figure it out. In this case, the parent process forks twice; the child for the left side then sees a built-in and executes it; the child for the right side sees grep and execs.

        – chepner
        Feb 20 at 18:47






      • 1





        @DavidConrad, the pipe is an IPC mechanism, so in any case the two sides will have to run in different processes. While in A | B, there are some shells like AT&T ksh or zsh that run B in the current shell process if it's a builtin or compound or function command, I don't know of any that runs A in the current process. If anything, to do that, they would have to handle SIGPIPE in a complex way as if A was running in child process and without terminating the shell for the behaviour not to be too surprising when B exits early. It's much easier to run B in the parent process.

        – Stéphane Chazelas
        Feb 21 at 6:15











      • Bash supports <<<

        – D. Ben Knoble
        Feb 21 at 17:48






      • 1





        @D.BenKnoble, I didn't mean to imply bash didn't support <<<, just that the operator came from zsh like $(<...) came from ksh.

        – Stéphane Chazelas
        Feb 21 at 20:36



























      6














      In your first case you are just running grep on /proc/meminfo. /proc is a virtual file system, so /proc/meminfo is held in memory and it takes very little time to fetch its content.



      But in the second case, you are creating a pipe, then passing the first command's output to the second command using this pipe, which is costly.



      The difference is due to /proc (because it is in memory) and the pipe; see the example below:



      time for i in {1..1000};do grep ^MemFree /proc/meminfo;done >/dev/null

      real 0m0.914s
      user 0m0.032s
      sys 0m0.148s


      cat /proc/meminfo > file
      time for i in {1..1000};do grep ^MemFree file;done >/dev/null

      real 0m0.938s
      user 0m0.032s
      sys 0m0.152s


      time for i in {1..1000};do echo "$a"|grep ^MemFree; done >/dev/null

      real 0m1.016s
      user 0m0.040s
      sys 0m0.232s













          edited Feb 20 at 14:14









          terdon











          answered Feb 20 at 12:03









          PRY























              1














              You are calling an external command in both cases (grep).
              The external call requires a subshell. Forking that shell is the fundamental cause of the delay. Both cases are similar, hence the similar delay.



              If you want to read the external file only once and use it (from a variable) multiple times, don't go out of the shell:



              meminfo=$(< /proc/meminfo)
              time for i in {1..1000};do
              [[ $meminfo =~ MemFree:\ *([0-9]*)\ *.B ]]
              printf '%s\n' "${BASH_REMATCH[1]}"
              done


              That takes only about 0.1 seconds instead of the full 1 second for the grep call.
















                  answered Feb 23 at 4:58









                  Isaac





























