Performance optimization searching data in file system












I have network-attached storage (NAS) holding around 5 million .txt files related to around 3 million transactions. The total size of the data is around 3.5 TB. I have to search that location to determine whether the file for each transaction is available, and produce two separate CSV reports: "available files" and "not available files". We are still on Java 6. The challenge is that I have to search the location recursively, and because of its size each search takes about 2 minutes on average. I am using the Java I/O API to search recursively, as shown below. Is there any way I can improve the performance?



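// Recursively searches 'location' for a file named 'fileName'; returns null if none is found.
File searchFile(File location, String fileName) {
    if (location.isDirectory()) {
        File[] arr = location.listFiles(); // an array, or null if the directory cannot be read
        if (arr == null) {
            return null;
        }
        for (File f : arr) {
            File found = searchFile(f, fileName);
            if (found != null)
                return found;
        }
    } else {
        if (location.getName().equals(fileName)) {
            return location;
        }
    }
    return null;
}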









  • When you are looping over such a large number of files, recursion is very bad... it adds a lot of overhead for the JVM... but then again, it also depends on how deep your directory structure is.
    – Ketan
    Nov 19 '18 at 5:00


















java file search optimization






edited Nov 19 '18 at 3:55 by ETO










asked Nov 18 '18 at 23:46 by Samarjit Baruah












4 Answers
































You can use the NIO FileVisitor (java.nio.file), provided a Java 7+ runtime is an option; it is not part of Java 6.



// Requires Java 7+: java.io.IOException, java.nio.file.*, java.nio.file.attribute.BasicFileAttributes
Path findTransactionFile(Path root) throws IOException {
    // One-element array so that the anonymous class can store the result
    final Path[] transactionFile = new Path[1];
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
        @Override
        public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
            if (/* todo dir predicate*/ false) {
                return FileVisitResult.SKIP_SUBTREE; // optimization
            }
            return FileVisitResult.CONTINUE;
        }

        @Override
        public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
            if (/* todo file predicate*/ true) {
                transactionFile[0] = file;
                return FileVisitResult.TERMINATE; // found
            }
            return FileVisitResult.CONTINUE;
        }
    });

    return transactionFile[0];
}
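If a Java 7+ runtime is available, the same visitor idea can collect every file in a single pass instead of stopping at the first match. The following is only a minimal sketch: the class name `NioIndexer` and the method `index` are made up for illustration, and it assumes file names are unique across directories.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.Map;

public class NioIndexer {

    // Maps each regular file's name to its path.
    // Assumption: names are unique; on a clash the path visited last wins.
    public static Map<String, Path> index(Path root) throws IOException {
        final Map<String, Path> byName = new HashMap<String, Path>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    byName.put(file.getFileName().toString(), file);
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Skip unreadable entries instead of aborting the whole walk.
                return FileVisitResult.CONTINUE;
            }
        });
        return byName;
    }

    public static void main(String[] args) throws IOException {
        // Throwaway directory tree so the sketch can run anywhere.
        Path root = Files.createTempDirectory("nio-index-demo");
        Path sub = Files.createDirectories(root.resolve("2018").resolve("11"));
        Files.createFile(sub.resolve("TX1001.txt"));

        Map<String, Path> byName = index(root);
        System.out.println(byName.containsKey("TX1001.txt"));
    }
}
```

The walk still has to touch every directory on the share once, so the gain comes from doing it a single time rather than once per transaction.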





  • I originally thought this was available in Java 6, but I think it has only been available since Java 7. On older Java you can use Apache VFS for similar behaviour.
    – sturcotte06
    Nov 18 '18 at 23:58



















I don't know the exact answer, but from an algorithmic perspective your program has the worst possible complexity: for every single transaction lookup it iterates over all the files (5 million), and you have 3 million transactions.



My suggestion is to iterate over the files once (5 million files) and build an index keyed by file name, then iterate over the transactions and look each one up in the index instead of doing a full scan.
Alternatively, there may be free third-party tools that can index a large file system and expose that index to an external application (in this case your Java app). If you cannot find such a tool, you could build it yourself, which lets you structure the index in the way that best suits your requirements.
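The suggestion above could be sketched roughly as follows: one pass over the tree builds a set of file names, after which each transaction lookup is a hash lookup rather than a full scan. The class `FileNameIndex`, the directory layout and the `TX….txt` naming are assumptions made for illustration. The traversal uses an explicit stack instead of recursion and sticks to Java 6 APIs.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

public class FileNameIndex {

    // Walks the tree once and collects the names of all regular files.
    public static Set<String> build(File root) {
        Set<String> names = new HashSet<String>();
        Deque<File> pending = new ArrayDeque<File>();
        pending.push(root);
        while (!pending.isEmpty()) {
            File current = pending.pop();
            if (current.isDirectory()) {
                File[] children = current.listFiles();
                if (children == null) {
                    continue; // unreadable directory or I/O error
                }
                for (File child : children) {
                    pending.push(child);
                }
            } else if (current.isFile()) {
                names.add(current.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Throwaway directory tree so the sketch can run anywhere.
        File root = new File(System.getProperty("java.io.tmpdir"), "index-demo-" + System.nanoTime());
        File sub = new File(root, "2018/11");
        sub.mkdirs();
        new File(sub, "TX1001.txt").createNewFile();

        Set<String> names = build(root);
        // Each transaction is now one hash lookup instead of a tree walk.
        System.out.println("TX1001.txt available: " + names.contains("TX1001.txt"));
        System.out.println("TX9999.txt available: " + names.contains("TX9999.txt"));
    }
}
```

With roughly 5 million files and 3 million transactions, this replaces millions of tree walks with one walk plus 3 million lookups; the single walk over the network share remains the dominant cost.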






    • Searching a directory tree on network-attached storage is a nightmare: it takes a lot of time when the tree is very large or deep. Since you are on Java 6, you can follow an old-fashioned approach: list all the files into a CSV file, as below.

    • e.g.


      find . -type f -name '*.txt' >> test.csv (on Unix)



      dir /b/s *.txt > test.csv (on Windows)




    • Now load this CSV file into a Map keyed by file name, so that it acts as an index. Loading the file will take some time because it will be huge, but once it is loaded, lookups in the map by file name will be much quicker and will reduce your search time drastically.
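As a minimal sketch of the loading step, assuming the listing contains one path per line (as the commands above produce) and that file names are unique across directories. The class name `ListingIndex` and the method `load` are hypothetical, and the code sticks to Java 6 syntax.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class ListingIndex {

    // Reads one path per line and maps the bare file name to its full path.
    // Assumption: file names are unique; on a clash the last path read wins.
    public static Map<String, String> load(Reader source) throws IOException {
        Map<String, String> index = new HashMap<String, String>();
        BufferedReader in = new BufferedReader(source);
        try {
            String line;
            while ((line = in.readLine()) != null) {
                String path = line.trim();
                if (path.length() == 0) {
                    continue; // skip blank lines
                }
                // Handle both '/' (find) and '\' (dir /b/s) separators.
                int cut = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
                index.put(path.substring(cut + 1), path);
            }
        } finally {
            in.close();
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        // Tiny in-memory listing standing in for test.csv.
        String listing = "./2018/11/TX1001.txt\n./2018/12/TX1002.txt\n";
        Map<String, String> index = load(new StringReader(listing));
        System.out.println(index.containsKey("TX1001.txt"));
        System.out.println(index.get("TX1002.txt"));
    }
}
```

In real use the `StringReader` would be replaced by a `FileReader` (or an `InputStreamReader` with an explicit charset) opened on the listing file.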







    answered Nov 19 '18 at 20:16 by utpal416








    • Thanks for the excellent idea. Running locally after getting the list of files as a CSV using the above command was really helpful. After that, I generated a HashMap from the CSV and did all my report processing. Now I am able to generate the report in far less time. You saved my day. Cheers!!
      – Samarjit Baruah
      Nov 20 '18 at 15:10














    You should take a different approach: rather than walking the entire directory tree every time you search for a file, create an index, that is, a mapping from file name to file location.



    Essentially:



    // Recursively walks 'location' and records every regular file under its name
    void buildIndex(Map<String, File> index, File location) {
        if (location.isDirectory()) {
            File[] arr = location.listFiles(); // null if the directory cannot be read
            if (arr == null) {
                return;
            }
            for (File f : arr) {
                buildIndex(index, f);
            }
        } else {
            index.put(location.getName(), location);
        }
    }


    Now that you've got the index, searching for a file becomes a trivial map lookup.



    With the files in a Map, you can also use Set operations to find the intersection:



    Map<String, File> index = new HashMap<String, File>();
    buildIndex(index, ...);
    Set<String> fileSet = index.keySet();
    Set<String> transactionSet = ...;
    Set<String> intersection = new HashSet<String>(fileSet);
    intersection.retainAll(transactionSet); // call retainAll on the copy, not on the index's own key set


    Optionally, if the index itself is too big to keep in memory, you may want to create the index in an SQLite database.
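To connect this to the two reports in the question, here is a rough sketch that splits a set of expected file names into those present in the index and those missing, then writes each as a one-column CSV. The `ReportWriter` class, its helper methods, the header names and the one-name-per-row layout are all assumptions, not part of the answer above.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class ReportWriter {

    // Expected names that exist in the index (set intersection).
    public static Set<String> available(Set<String> indexed, Set<String> expected) {
        Set<String> result = new TreeSet<String>(expected); // copy, so the inputs stay untouched
        result.retainAll(indexed);
        return result;
    }

    // Expected names that are absent from the index (set difference).
    public static Set<String> missing(Set<String> indexed, Set<String> expected) {
        Set<String> result = new TreeSet<String>(expected);
        result.removeAll(indexed);
        return result;
    }

    // Writes one file name per row under a single header column.
    public static void writeCsv(Writer out, String header, Set<String> names) throws IOException {
        out.write(header);
        out.write('\n');
        for (String name : names) {
            out.write(name);
            out.write('\n');
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        Set<String> indexed = new TreeSet<String>(Arrays.asList("TX1001.txt", "TX1002.txt"));
        Set<String> expected = new TreeSet<String>(Arrays.asList("TX1001.txt", "TX1003.txt"));

        StringWriter availableCsv = new StringWriter(); // a FileWriter in real use
        writeCsv(availableCsv, "available_files", available(indexed, expected));
        StringWriter missingCsv = new StringWriter();
        writeCsv(missingCsv, "not_available_files", missing(indexed, expected));

        System.out.print(availableCsv);
        System.out.print(missingCsv);
    }
}
```

The sketch does no CSV quoting, so it only suits file names without commas, quotes or line breaks. Passing the index's `keySet()` as `indexed` is safe here because both helpers work on a copy of `expected`.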







    edited Nov 19 '18 at 10:41

    answered Nov 19 '18 at 4:24 by Lie Ryan












    • You are right, and in this case the index itself will grow too big... so it is better to split the index on some criterion, the simplest being the first character of the file name. A long time ago I asked a question about removing duplicates, and the core logic suggested there by various people might be applicable: stackoverflow.com/questions/12501112/…
      – Ketan
      Nov 19 '18 at 4:57

    • @Ketan: I'd suggest not bothering with tricks like splitting the index by first character. Just let SQLite deal with managing the data. However, 5 million file names is a tiny amount of data: at most around a gigabyte (likely much less), which is still fairly small on modern machines. You'll likely never need anything other than a Map unless your data is a couple of orders of magnitude larger than this.
      – Lie Ryan
      Nov 19 '18 at 7:35

    • SQLite is not an option for me. I was able to reduce the time by creating an index, but since the large volume of data is stored on a shared drive, it still takes time to walk all the files to build the index Map. Thanks a lot for the hints on indexing!
      – Samarjit Baruah
      Nov 19 '18 at 21:01


















    • you are right, and in this case index itself will grow too big... so better to divide the index on various criteria, the simplest being by first character of the file name.Long back I had asked a question which is about removing duplicates but the core logic suggested by various people might be applicable... stackoverflow.com/questions/12501112/…
      – Ketan
      Nov 19 '18 at 4:57










    • @Ketan: I'd suggest don't bother about trying to do tricks like splitting the index by first characters or things like that. Just let SQLite deal with managing the data. However, 5 million filenames is tiny amount of data. It's just at most around a gigabyte (likely much less), which is still fairly small in modern machines. You'll likely never need to use anything other than Map unless your data is a couple order of magnitude larger than this.
      – Lie Ryan
      Nov 19 '18 at 7:35












    • SQLite is not an option in my plate.I could able to reduce the time by creating index but since large volume data were stored in a shared drive so it is still taking time to load all files to create the index Map. Thanks a lot for the hints of indexing!
      – Samarjit Baruah
      Nov 19 '18 at 21:01
















    you are right, and in this case index itself will grow too big... so better to divide the index on various criteria, the simplest being by first character of the file name.Long back I had asked a question which is about removing duplicates but the core logic suggested by various people might be applicable... stackoverflow.com/questions/12501112/…
    – Ketan
    Nov 19 '18 at 4:57




    you are right, and in this case index itself will grow too big... so better to divide the index on various criteria, the simplest being by first character of the file name.Long back I had asked a question which is about removing duplicates but the core logic suggested by various people might be applicable... stackoverflow.com/questions/12501112/…
    – Ketan
    Nov 19 '18 at 4:57












    @Ketan: I'd suggest don't bother about trying to do tricks like splitting the index by first characters or things like that. Just let SQLite deal with managing the data. However, 5 million filenames is tiny amount of data. It's just at most around a gigabyte (likely much less), which is still fairly small in modern machines. You'll likely never need to use anything other than Map unless your data is a couple order of magnitude larger than this.
    – Lie Ryan
    Nov 19 '18 at 7:35






    @Ketan: I'd suggest don't bother about trying to do tricks like splitting the index by first characters or things like that. Just let SQLite deal with managing the data. However, 5 million filenames is tiny amount of data. It's just at most around a gigabyte (likely much less), which is still fairly small in modern machines. You'll likely never need to use anything other than Map unless your data is a couple order of magnitude larger than this.
    – Lie Ryan
    Nov 19 '18 at 7:35














    SQLite is not an option in my plate.I could able to reduce the time by creating index but since large volume data were stored in a shared drive so it is still taking time to load all files to create the index Map. Thanks a lot for the hints of indexing!
    – Samarjit Baruah
    Nov 19 '18 at 21:01




    SQLite is not an option in my plate.I could able to reduce the time by creating index but since large volume data were stored in a shared drive so it is still taking time to load all files to create the index Map. Thanks a lot for the hints of indexing!
    – Samarjit Baruah
    Nov 19 '18 at 21:01











    0














    You can use NIO FileVisitor, available in java 6.



    Path findTransactionFile(Path root) {
    Path transactionFile = null;
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
    if (/* todo dir predicate*/ false) {
    return FileVisitResult.SKIP_SUBTREE; // optimization
    }
    return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
    if (/* todo file predicate*/ true) {
    transactionFile = file;
    return FileVisitResult.TERMINATE; // found
    }
    return FileVisitResult.CONTINUE;
    }
    });

    return transactionFile;
    }





    share|improve this answer





















    • I think I'm wrong and it's available since java 7, you can use Apache VFS for similar behaviour in older java
      – sturcotte06
      Nov 18 '18 at 23:58


























    answered Nov 18 '18 at 23:52









    sturcotte06
























    0














    I don't know the definitive answer, but from an algorithmic perspective your program has the worst possible complexity: each lookup for a single transaction iterates over all the files (5 million), and you have 3 million transactions.



    My suggestion is to iterate over the files (5 million) once and build an index keyed on file name, then iterate over the transactions and search the index instead of doing a full scan.
    Alternatively, there might be free third-party tools that can index a large file system and expose that index to an external application (in this case your Java app). If you cannot find such a tool, you could write one yourself, which lets you build the index in the way that best suits your requirements.
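    The index idea could be sketched roughly as follows, using only APIs that exist in Java 6. The class name `FileIndex` and the method `build` are invented for illustration, and the sketch assumes file names are unique across the tree (a later duplicate would overwrite an earlier one).

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: one pass over the tree, then constant-time lookups.
public class FileIndex {

    /**
     * Walks the tree under root once and maps each file name to its File.
     * Uses an explicit stack instead of recursion.
     * Assumption: file names are unique; duplicates overwrite each other.
     */
    public static Map<String, File> build(File root) {
        Map<String, File> index = new HashMap<String, File>();
        Deque<File> pending = new ArrayDeque<File>();
        pending.push(root);
        while (!pending.isEmpty()) {
            File dir = pending.pop();
            File[] children = dir.listFiles();
            if (children == null) {
                continue; // not a directory, or it could not be read
            }
            for (File child : children) {
                if (child.isDirectory()) {
                    pending.push(child); // visit later
                } else {
                    index.put(child.getName(), child);
                }
            }
        }
        return index;
    }
}
```

    With the map built once, checking a transaction file becomes `index.containsKey(fileName)`, so the two reports can be produced in a single pass over the transactions rather than one directory scan per transaction.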










































        edited Nov 19 '18 at 4:22

























        answered Nov 19 '18 at 4:13









        hunter






























