First non-repeating letter












10














The problem: find and return the first nonrepeating character in a string, if the string does not have a nonrepeating character return an empty string. For purposes of finding the nonrepeating character it's case insensitive so 't' and 'T' appearing in the string would imply that that 't' and 'T' are not candidates for a nonrepeating character. When you return the character you must return it as its original case.



My Solution:



def non_repeat(_str):
counts = {}
for pos, letter in enumerate(_str.lower()):
if letter not in counts:
counts[letter] = (0, pos)
incr_val = counts[letter][0] + 1
counts[letter] = (incr_val, pos)
for letter in _str.lower():
if counts[letter][0] == 1:
return _str[counts[letter][1]]
return ''


How can I improve the readability of my solution? In particular, I don't like:





  • counts[letter][0] because the 0 index is a bit ambiguous.

  • calculating the incremented value as another line, incr_val; it'd be nice to do the incrementing and updating of the dict in one line.


Anything else you could make more readable? I should add that I'm aware of collections.​Counter; for this solution I'd like to avoid using it.










share|improve this question




















  • 5




    I rolled back your last edit. After getting an answer you are not allowed to change your code anymore. This is to ensure that answers do not get invalidated and have to hit a moving target. If you have changed your code you can either post it as an answer (if it would constitute a code review) or ask a new question with your changed code (linking back to this one as reference). Refer to this post for more information
    – Sᴀᴍ Onᴇᴌᴀ
    Dec 13 '18 at 16:11






  • 1




    Can you explain why you want to avoid collections.Counter?
    – Gareth Rees
    Dec 13 '18 at 19:50










  • There's no logical reason for avoiding collections.Counter (that I'm aware of). I wrote my question that way because I wanted to keep the presentation of the algorithm more apparent to the reader, and because I already knew how to make the function much shorter by using collections.Counter which would essentially eliminate the 3-4 lines from the function.
    – Kevin S
    Dec 13 '18 at 21:14










  • it might be easier to iterate over the string backwards
    – sudo rm -rf slash
    Dec 14 '18 at 7:04
















10














The problem: find and return the first nonrepeating character in a string, if the string does not have a nonrepeating character return an empty string. For purposes of finding the nonrepeating character it's case insensitive so 't' and 'T' appearing in the string would imply that that 't' and 'T' are not candidates for a nonrepeating character. When you return the character you must return it as its original case.



My Solution:



def non_repeat(_str):
counts = {}
for pos, letter in enumerate(_str.lower()):
if letter not in counts:
counts[letter] = (0, pos)
incr_val = counts[letter][0] + 1
counts[letter] = (incr_val, pos)
for letter in _str.lower():
if counts[letter][0] == 1:
return _str[counts[letter][1]]
return ''


How can I improve the readability of my solution? In particular, I don't like:





  • counts[letter][0] because the 0 index is a bit ambiguous.

  • calculating the incremented value as another line, incr_val; it'd be nice to do the incrementing and updating of the dict in one line.


Anything else you could make more readable? I should add that I'm aware of collections.​Counter; for this solution I'd like to avoid using it.










share|improve this question




















  • 5




    I rolled back your last edit. After getting an answer you are not allowed to change your code anymore. This is to ensure that answers do not get invalidated and have to hit a moving target. If you have changed your code you can either post it as an answer (if it would constitute a code review) or ask a new question with your changed code (linking back to this one as reference). Refer to this post for more information
    – Sᴀᴍ Onᴇᴌᴀ
    Dec 13 '18 at 16:11






  • 1




    Can you explain why you want to avoid collections.Counter?
    – Gareth Rees
    Dec 13 '18 at 19:50










  • There's no logical reason for avoiding collections.Counter (that I'm aware of). I wrote my question that way because I wanted to keep the presentation of the algorithm more apparent to the reader, and because I already knew how to make the function much shorter by using collections.Counter which would essentially eliminate the 3-4 lines from the function.
    – Kevin S
    Dec 13 '18 at 21:14










  • it might be easier to iterate over the string backwards
    – sudo rm -rf slash
    Dec 14 '18 at 7:04














10












10








10


4





The problem: find and return the first nonrepeating character in a string, if the string does not have a nonrepeating character return an empty string. For purposes of finding the nonrepeating character it's case insensitive so 't' and 'T' appearing in the string would imply that that 't' and 'T' are not candidates for a nonrepeating character. When you return the character you must return it as its original case.



My Solution:



def non_repeat(_str):
counts = {}
for pos, letter in enumerate(_str.lower()):
if letter not in counts:
counts[letter] = (0, pos)
incr_val = counts[letter][0] + 1
counts[letter] = (incr_val, pos)
for letter in _str.lower():
if counts[letter][0] == 1:
return _str[counts[letter][1]]
return ''


How can I improve the readability of my solution? In particular, I don't like:





  • counts[letter][0] because the 0 index is a bit ambiguous.

  • calculating the incremented value as another line, incr_val; it'd be nice to do the incrementing and updating of the dict in one line.


Anything else you could make more readable? I should add that I'm aware of collections.​Counter; for this solution I'd like to avoid using it.










share|improve this question















The problem: find and return the first nonrepeating character in a string, if the string does not have a nonrepeating character return an empty string. For purposes of finding the nonrepeating character it's case insensitive so 't' and 'T' appearing in the string would imply that that 't' and 'T' are not candidates for a nonrepeating character. When you return the character you must return it as its original case.



My Solution:



def non_repeat(_str):
counts = {}
for pos, letter in enumerate(_str.lower()):
if letter not in counts:
counts[letter] = (0, pos)
incr_val = counts[letter][0] + 1
counts[letter] = (incr_val, pos)
for letter in _str.lower():
if counts[letter][0] == 1:
return _str[counts[letter][1]]
return ''


How can I improve the readability of my solution? In particular, I don't like:





  • counts[letter][0] because the 0 index is a bit ambiguous.

  • calculating the incremented value as another line, incr_val; it'd be nice to do the incrementing and updating of the dict in one line.


Anything else you could make more readable? I should add that I'm aware of collections.​Counter; for this solution I'd like to avoid using it.







python python-3.x






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Dec 13 '18 at 16:49









Toby Speight

23.4k639111




23.4k639111










asked Dec 13 '18 at 14:56









Kevin S

20317




20317








  • 5




    I rolled back your last edit. After getting an answer you are not allowed to change your code anymore. This is to ensure that answers do not get invalidated and have to hit a moving target. If you have changed your code you can either post it as an answer (if it would constitute a code review) or ask a new question with your changed code (linking back to this one as reference). Refer to this post for more information
    – Sᴀᴍ Onᴇᴌᴀ
    Dec 13 '18 at 16:11






  • 1




    Can you explain why you want to avoid collections.Counter?
    – Gareth Rees
    Dec 13 '18 at 19:50










  • There's no logical reason for avoiding collections.Counter (that I'm aware of). I wrote my question that way because I wanted to keep the presentation of the algorithm more apparent to the reader, and because I already knew how to make the function much shorter by using collections.Counter which would essentially eliminate the 3-4 lines from the function.
    – Kevin S
    Dec 13 '18 at 21:14










  • it might be easier to iterate over the string backwards
    – sudo rm -rf slash
    Dec 14 '18 at 7:04














  • 5




    I rolled back your last edit. After getting an answer you are not allowed to change your code anymore. This is to ensure that answers do not get invalidated and have to hit a moving target. If you have changed your code you can either post it as an answer (if it would constitute a code review) or ask a new question with your changed code (linking back to this one as reference). Refer to this post for more information
    – Sᴀᴍ Onᴇᴌᴀ
    Dec 13 '18 at 16:11






  • 1




    Can you explain why you want to avoid collections.Counter?
    – Gareth Rees
    Dec 13 '18 at 19:50










  • There's no logical reason for avoiding collections.Counter (that I'm aware of). I wrote my question that way because I wanted to keep the presentation of the algorithm more apparent to the reader, and because I already knew how to make the function much shorter by using collections.Counter which would essentially eliminate the 3-4 lines from the function.
    – Kevin S
    Dec 13 '18 at 21:14










  • it might be easier to iterate over the string backwards
    – sudo rm -rf slash
    Dec 14 '18 at 7:04








5




5




I rolled back your last edit. After getting an answer you are not allowed to change your code anymore. This is to ensure that answers do not get invalidated and have to hit a moving target. If you have changed your code you can either post it as an answer (if it would constitute a code review) or ask a new question with your changed code (linking back to this one as reference). Refer to this post for more information
– Sᴀᴍ Onᴇᴌᴀ
Dec 13 '18 at 16:11




I rolled back your last edit. After getting an answer you are not allowed to change your code anymore. This is to ensure that answers do not get invalidated and have to hit a moving target. If you have changed your code you can either post it as an answer (if it would constitute a code review) or ask a new question with your changed code (linking back to this one as reference). Refer to this post for more information
– Sᴀᴍ Onᴇᴌᴀ
Dec 13 '18 at 16:11




1




1




Can you explain why you want to avoid collections.Counter?
– Gareth Rees
Dec 13 '18 at 19:50




Can you explain why you want to avoid collections.Counter?
– Gareth Rees
Dec 13 '18 at 19:50












There's no logical reason for avoiding collections.Counter (that I'm aware of). I wrote my question that way because I wanted to keep the presentation of the algorithm more apparent to the reader, and because I already knew how to make the function much shorter by using collections.Counter which would essentially eliminate the 3-4 lines from the function.
– Kevin S
Dec 13 '18 at 21:14




There's no logical reason for avoiding collections.Counter (that I'm aware of). I wrote my question that way because I wanted to keep the presentation of the algorithm more apparent to the reader, and because I already knew how to make the function much shorter by using collections.Counter which would essentially eliminate the 3-4 lines from the function.
– Kevin S
Dec 13 '18 at 21:14












it might be easier to iterate over the string backwards
– sudo rm -rf slash
Dec 14 '18 at 7:04




it might be easier to iterate over the string backwards
– sudo rm -rf slash
Dec 14 '18 at 7:04










4 Answers
4






active

oldest

votes


















8














IMHO looping twice over the input string is not acceptable for multiple reasons




  • the string is of unknown size

  • you invalidate your code for generators


While this might not be necessary for your project you should learn to think like that. So a single pass algorithm should collect all necessary data to answer the question (e. g. the letter in original case).



import sys
assert sys.version_info >= (3, 6)

def non_repeat(s):
repeated = set()
candidates = dict()
for original_case in s:
lower_case = original_case.lower()
if lower_case not in repeated:
if lower_case not in candidates:
candidates[lower_case] = original_case
else:
repeated.add(lower_case)
del candidates[lower_case]

if candidates:
return next(iter(candidates.values()))
else:
return ''


This code makes use of the insertion order of a dict which is already implemented in 3.6 and guaranteed in 3.7.





Edit: generator example



Say you want to check a big file that does not fit into memory (for brevity I assume a line fits into memory). Yor write a little character generator and run your algorithm on the generator.



def char_gen(f):
for line in f:
for c in line.strip():
yield c

with open('bigfile.txt') as f:
print(non_repeat(char_gen(f)))


also you might use the algorithm on a generator expression



print(non_repeat(c for c in "aabbcd" if c != 'c'))





share|improve this answer























  • You could also use collections.OrderedDict to make the dependence on the dictionary being ordered more explicit.
    – JAD
    Dec 14 '18 at 8:04










  • This could be leveraged on versions of python before 3.6 by importing OrderedDict from the collections module and then using an OrderedDict for candidates instead of dict. I like this solution for readability. Can you expand upon your statement "you invalidate your code for generators"
    – Kevin S
    Dec 14 '18 at 16:21










  • Would it make sense to check the length of the string for zero or 1 and just return the argument in that case, to avoid instantiating the collections?
    – phoog
    Dec 14 '18 at 21:57










  • @phoog I would not introduce complexity unless it is needed. Every extra line may introduce an error. Optimisation is not required for small data but for big data.
    – stefan
    Dec 15 '18 at 19:41










  • @stefan thanks for your comment. I'm coming to this as a .NET programmer whose employer appears to be preparing for a push to increase the use of python. In the .NET world, for a high throughput scenario with a large volume of strings and a high proportion of empty or one-character strings, the reduced garbage collection pressure might indeed be desirable. Would similar considerations not apply here, at least for that use case?
    – phoog
    Dec 15 '18 at 20:52



















6














I would separate counting from keeping the initial positions. This would allow to use collections.defaultdict for counting and simplify the code and contribute to readability:



from collections import defaultdict


def non_repeat(input_string):
counts = defaultdict(int)
positions = {}

for position, letter in enumerate(input_string.lower()):
counts[letter] += 1
positions[letter] = position

for letter in input_string.lower():
if counts[letter] == 1:
return input_string[positions[letter]]

return ''





share|improve this answer























  • I think you're on the right track. What if a defaultdict(int) was also used for initial_positions? This simplifies your line "if letter not in initial_positions".
    – Kevin S
    Dec 13 '18 at 15:44






  • 2




    We only care about characters that have one occurrence, and in particular we only care about the first character that has one occurrence. As a result the initial position for the character we care about will be correct, the others won't, but we don't care that they're not accurate, they're not what we're after
    – Kevin S
    Dec 13 '18 at 16:09






  • 1




    @KevinS yeah, this means that we don't really need the if letter not in initial_positions check.
    – alecxe
    Dec 13 '18 at 16:13






  • 1




    Excuse me if I'm wrong, but wouldn't input_string[positions[letter]] just be the same as letter?
    – JAD
    Dec 14 '18 at 7:58








  • 1




    @JAD If lower () wasn't called on the input string, it would. But in any case the positions dictionary is unnecessary as you can either lowercase each letter individually to check for its count or use enumerate to retrieve the position.
    – Mathias Ettinger
    Dec 14 '18 at 8:29



















5














Going further from alecxe's answer:




  • you could use the Counter collections instead of performing the counting yourself - I didn't see that this was avoided on purpose.

  • you can ensure lower is called only once


You'd get something like:



from collections import Counter

def non_repeat(input_string):
lower = input_string.lower()
count = Counter(lower)
for c, l in zip(input_string, lower):
if count[l] == 1:
return c
return ''


or, for a Counter-less solution:



def non_repeat(input_string):
lower = input_string.lower()
count = defaultdict(int)
for c in lower:
count[c] += 1
for c, l in zip(input_string, lower):
if count[l] == 1:
return c
return ''


Also, here's a quick test suite I wrote:



tests = [
('', ''),
('AA', ''),
('AAABBB', ''),
('AAABBBc', 'c'),
('azerty', 'a'),
('aazzeerty', 'r'),
('azerAZERty', 't'),
]

for inp, out in tests:
assert non_repeat(inp) == out





share|improve this answer



















  • 1




    The question did specifically state that Kevin is avoiding collections.​Counter, so that's not such a helpful suggestion. The rest makes sense.
    – Toby Speight
    Dec 13 '18 at 16:52






  • 1




    @TobySpeight Thanks for spotting this, I completely missed in the the original question. I've updated my answer accordingly but it is not that relevant anymore :)
    – Josay
    Dec 13 '18 at 17:04










  • Is Counter still slower than defaultdict?
    – jpmc26
    Dec 13 '18 at 22:47










  • I didn't know it had been the case. I'll try a benchmark if I think about it
    – Josay
    Dec 13 '18 at 22:53



















2














A version which doesn't use any imports, works on Py2.7+ and relies almost solely on set operations to achieve a single O(len(s)) pass + constant time:



def non_repeat(s):
LEN_CHAR_SET_LOWERED = XXX # length of your char set adjusted for .lower()
seen_order = # Store the order for figuring out which came first
seen_set = set() # Store whether we've seen the character
dupe_set = set() # Store whether we've seen it more than once

# Scan the string
for ch in s:
chl = ch.lower() # lowered character for seen/dupe sets
if chl not in seen_set:
seen_order.append(ch) # This uses the non-lowered version to preserve case
seen_set.add(chl)
else:
dupe_set.add(chl)
if len(dupe_set) == LEN_CHAR_SET_LOWERED: # Set len is O(1)
return '' # If dupe set contains all possible characters, exit early

# Find uniques
unique_set = seen_set - dupe_set

# Find the first one, if any
if unique_set:
for ch in seen_order:
if ch.lower() in unique_set:
return ch
return ''


Some notes on speed:




  • O(len(s)) average case, O(1) best case (see early exit) - to build the list/sets - set membership, additions and list appends are all average O(1) operations, worst case O(len(set/list))*


  • O(1) - Set difference on average, worst case O(len(set))*


  • O(len(list))* for the final check



*O(len(list)) and O(len(set)) both have upper bounds of LEN_CHAR_SET_LOWERED, which means they end up constant time, O(1), as the string grows



This is also interesting because of the early exit: If your string contains all characters duplicated, it will only scan until it has seen every character at least twice and then exit, knowing there will be no unique characters. An alphanumeric string could exit after scanning as few as 72 characters, regardless of the actual length.






share|improve this answer





















    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "196"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f209622%2ffirst-non-repeating-letter%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    4 Answers
    4






    active

    oldest

    votes








    4 Answers
    4






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    8














    IMHO looping twice over the input string is not acceptable for multiple reasons




    • the string is of unknown size

    • you invalidate your code for generators


    While this might not be necessary for your project you should learn to think like that. So a single pass algorithm should collect all necessary data to answer the question (e. g. the letter in original case).



    import sys
    assert sys.version_info >= (3, 6)

    def non_repeat(s):
    repeated = set()
    candidates = dict()
    for original_case in s:
    lower_case = original_case.lower()
    if lower_case not in repeated:
    if lower_case not in candidates:
    candidates[lower_case] = original_case
    else:
    repeated.add(lower_case)
    del candidates[lower_case]

    if candidates:
    return next(iter(candidates.values()))
    else:
    return ''


    This code makes use of the insertion order of a dict which is already implemented in 3.6 and guaranteed in 3.7.





    Edit: generator example



    Say you want to check a big file that does not fit into memory (for brevity I assume a line fits into memory). Yor write a little character generator and run your algorithm on the generator.



    def char_gen(f):
    for line in f:
    for c in line.strip():
    yield c

    with open('bigfile.txt') as f:
    print(non_repeat(char_gen(f)))


    also you might use the algorithm on a generator expression



    print(non_repeat(c for c in "aabbcd" if c != 'c'))





    share|improve this answer























    • You could also use collections.OrderedDict to make the dependence on the dictionary being ordered more explicit.
      – JAD
      Dec 14 '18 at 8:04










    • This could be leveraged on versions of python before 3.6 by importing OrderedDict from the collections module and then using an OrderedDict for candidates instead of dict. I like this solution for readability. Can you expand upon your statement "you invalidate your code for generators"
      – Kevin S
      Dec 14 '18 at 16:21










    • Would it make sense to check the length of the string for zero or 1 and just return the argument in that case, to avoid instantiating the collections?
      – phoog
      Dec 14 '18 at 21:57










    • @phoog I would not introduce complexity unless it is needed. Every extra line may introduce an error. Optimisation is not required for small data but for big data.
      – stefan
      Dec 15 '18 at 19:41










    • @stefan thanks for your comment. I'm coming to this as a .NET programmer whose employer appears to be preparing for a push to increase the use of python. In the .NET world, for a high throughput scenario with a large volume of strings and a high proportion of empty or one-character strings, the reduced garbage collection pressure might indeed be desirable. Would similar considerations not apply here, at least for that use case?
      – phoog
      Dec 15 '18 at 20:52
















    8














    IMHO looping twice over the input string is not acceptable for multiple reasons




    • the string is of unknown size

    • you invalidate your code for generators


    While this might not be necessary for your project you should learn to think like that. So a single pass algorithm should collect all necessary data to answer the question (e. g. the letter in original case).



    import sys
    assert sys.version_info >= (3, 6)

    def non_repeat(s):
    repeated = set()
    candidates = dict()
    for original_case in s:
    lower_case = original_case.lower()
    if lower_case not in repeated:
    if lower_case not in candidates:
    candidates[lower_case] = original_case
    else:
    repeated.add(lower_case)
    del candidates[lower_case]

    if candidates:
    return next(iter(candidates.values()))
    else:
    return ''


    This code makes use of the insertion order of a dict which is already implemented in 3.6 and guaranteed in 3.7.





    Edit: generator example



    Say you want to check a big file that does not fit into memory (for brevity I assume a line fits into memory). Yor write a little character generator and run your algorithm on the generator.



    def char_gen(f):
    for line in f:
    for c in line.strip():
    yield c

    with open('bigfile.txt') as f:
    print(non_repeat(char_gen(f)))


    also you might use the algorithm on a generator expression



    print(non_repeat(c for c in "aabbcd" if c != 'c'))





    share|improve this answer























    • You could also use collections.OrderedDict to make the dependence on the dictionary being ordered more explicit.
      – JAD
      Dec 14 '18 at 8:04










    • This could be leveraged on versions of python before 3.6 by importing OrderedDict from the collections module and then using an OrderedDict for candidates instead of dict. I like this solution for readability. Can you expand upon your statement "you invalidate your code for generators"
      – Kevin S
      Dec 14 '18 at 16:21










    • Would it make sense to check the length of the string for zero or 1 and just return the argument in that case, to avoid instantiating the collections?
      – phoog
      Dec 14 '18 at 21:57










    • @phoog I would not introduce complexity unless it is needed. Every extra line may introduce an error. Optimisation is not required for small data but for big data.
      – stefan
      Dec 15 '18 at 19:41










    • @stefan thanks for your comment. I'm coming to this as a .NET programmer whose employer appears to be preparing for a push to increase the use of python. In the .NET world, for a high throughput scenario with a large volume of strings and a high proportion of empty or one-character strings, the reduced garbage collection pressure might indeed be desirable. Would similar considerations not apply here, at least for that use case?
      – phoog
      Dec 15 '18 at 20:52














    8












    8








    8






    IMHO looping twice over the input string is not acceptable for multiple reasons




    • the string is of unknown size

    • you invalidate your code for generators


    While this might not be necessary for your project you should learn to think like that. So a single pass algorithm should collect all necessary data to answer the question (e. g. the letter in original case).



    import sys
    assert sys.version_info >= (3, 6)

    def non_repeat(s):
    repeated = set()
    candidates = dict()
    for original_case in s:
    lower_case = original_case.lower()
    if lower_case not in repeated:
    if lower_case not in candidates:
    candidates[lower_case] = original_case
    else:
    repeated.add(lower_case)
    del candidates[lower_case]

    if candidates:
    return next(iter(candidates.values()))
    else:
    return ''


    This code makes use of the insertion order of a dict which is already implemented in 3.6 and guaranteed in 3.7.





    Edit: generator example



    Say you want to check a big file that does not fit into memory (for brevity I assume a line fits into memory). Yor write a little character generator and run your algorithm on the generator.



    def char_gen(f):
    for line in f:
    for c in line.strip():
    yield c

    with open('bigfile.txt') as f:
    print(non_repeat(char_gen(f)))


    also you might use the algorithm on a generator expression



    print(non_repeat(c for c in "aabbcd" if c != 'c'))





    share|improve this answer














    IMHO looping twice over the input string is not acceptable for multiple reasons




    • the string is of unknown size

    • you invalidate your code for generators


    While this might not be necessary for your project you should learn to think like that. So a single pass algorithm should collect all necessary data to answer the question (e. g. the letter in original case).



    import sys
    assert sys.version_info >= (3, 6)

    def non_repeat(s):
    repeated = set()
    candidates = dict()
    for original_case in s:
    lower_case = original_case.lower()
    if lower_case not in repeated:
    if lower_case not in candidates:
    candidates[lower_case] = original_case
    else:
    repeated.add(lower_case)
    del candidates[lower_case]

    if candidates:
    return next(iter(candidates.values()))
    else:
    return ''


    This code makes use of the insertion order of a dict which is already implemented in 3.6 and guaranteed in 3.7.





    Edit: generator example



    Say you want to check a big file that does not fit into memory (for brevity I assume a line fits into memory). Yor write a little character generator and run your algorithm on the generator.



    def char_gen(f):
    for line in f:
    for c in line.strip():
    yield c

    with open('bigfile.txt') as f:
    print(non_repeat(char_gen(f)))


    also you might use the algorithm on a generator expression



    print(non_repeat(c for c in "aabbcd" if c != 'c'))






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Dec 14 '18 at 18:29

























    answered Dec 13 '18 at 19:12









    stefan

    1,531211




    1,531211












    • You could also use collections.OrderedDict to make the dependence on the dictionary being ordered more explicit.
      – JAD
      Dec 14 '18 at 8:04










    • This could be leveraged on versions of python before 3.6 by importing OrderedDict from the collections module and then using an OrderedDict for candidates instead of dict. I like this solution for readability. Can you expand upon your statement "you invalidate your code for generators"
      – Kevin S
      Dec 14 '18 at 16:21










    • Would it make sense to check the length of the string for zero or 1 and just return the argument in that case, to avoid instantiating the collections?
      – phoog
      Dec 14 '18 at 21:57










    • @phoog I would not introduce complexity unless it is needed. Every extra line may introduce an error. Optimisation is not required for small data but for big data.
      – stefan
      Dec 15 '18 at 19:41










    • @stefan thanks for your comment. I'm coming to this as a .NET programmer whose employer appears to be preparing for a push to increase the use of python. In the .NET world, for a high throughput scenario with a large volume of strings and a high proportion of empty or one-character strings, the reduced garbage collection pressure might indeed be desirable. Would similar considerations not apply here, at least for that use case?
      – phoog
      Dec 15 '18 at 20:52


















    • You could also use collections.OrderedDict to make the dependence on the dictionary being ordered more explicit.
      – JAD
      Dec 14 '18 at 8:04










    • This could be leveraged on versions of python before 3.6 by importing OrderedDict from the collections module and then using an OrderedDict for candidates instead of dict. I like this solution for readability. Can you expand upon your statement "you invalidate your code for generators"
      – Kevin S
      Dec 14 '18 at 16:21










    • Would it make sense to check the length of the string for zero or 1 and just return the argument in that case, to avoid instantiating the collections?
      – phoog
      Dec 14 '18 at 21:57










    • @phoog I would not introduce complexity unless it is needed. Every extra line may introduce an error. Optimisation is not required for small data but for big data.
      – stefan
      Dec 15 '18 at 19:41










    • @stefan thanks for your comment. I'm coming to this as a .NET programmer whose employer appears to be preparing for a push to increase the use of python. In the .NET world, for a high throughput scenario with a large volume of strings and a high proportion of empty or one-character strings, the reduced garbage collection pressure might indeed be desirable. Would similar considerations not apply here, at least for that use case?
      – phoog
      Dec 15 '18 at 20:52
















    You could also use collections.OrderedDict to make the dependence on the dictionary being ordered more explicit.
    – JAD
    Dec 14 '18 at 8:04




    You could also use collections.OrderedDict to make the dependence on the dictionary being ordered more explicit.
    – JAD
    Dec 14 '18 at 8:04












    This could be leveraged on versions of python before 3.6 by importing OrderedDict from the collections module and then using an OrderedDict for candidates instead of dict. I like this solution for readability. Can you expand upon your statement "you invalidate your code for generators"
    – Kevin S
    Dec 14 '18 at 16:21




    This could be leveraged on versions of python before 3.6 by importing OrderedDict from the collections module and then using an OrderedDict for candidates instead of dict. I like this solution for readability. Can you expand upon your statement "you invalidate your code for generators"
    – Kevin S
    Dec 14 '18 at 16:21












    Would it make sense to check the length of the string for zero or 1 and just return the argument in that case, to avoid instantiating the collections?
    – phoog
    Dec 14 '18 at 21:57




    Would it make sense to check the length of the string for zero or 1 and just return the argument in that case, to avoid instantiating the collections?
    – phoog
    Dec 14 '18 at 21:57












    @phoog I would not introduce complexity unless it is needed. Every extra line may introduce an error. Optimisation is not required for small data but for big data.
    – stefan
    Dec 15 '18 at 19:41




    @phoog I would not introduce complexity unless it is needed. Every extra line may introduce an error. Optimisation is not required for small data but for big data.
    – stefan
    Dec 15 '18 at 19:41












    @stefan thanks for your comment. I'm coming to this as a .NET programmer whose employer appears to be preparing for a push to increase the use of python. In the .NET world, for a high throughput scenario with a large volume of strings and a high proportion of empty or one-character strings, the reduced garbage collection pressure might indeed be desirable. Would similar considerations not apply here, at least for that use case?
    – phoog
    Dec 15 '18 at 20:52




    @stefan thanks for your comment. I'm coming to this as a .NET programmer whose employer appears to be preparing for a push to increase the use of python. In the .NET world, for a high throughput scenario with a large volume of strings and a high proportion of empty or one-character strings, the reduced garbage collection pressure might indeed be desirable. Would similar considerations not apply here, at least for that use case?
    – phoog
    Dec 15 '18 at 20:52













    6














    I would separate counting from keeping the initial positions. This would allow to use collections.defaultdict for counting and simplify the code and contribute to readability:



    from collections import defaultdict


    def non_repeat(input_string):
    counts = defaultdict(int)
    positions = {}

    for position, letter in enumerate(input_string.lower()):
    counts[letter] += 1
    positions[letter] = position

    for letter in input_string.lower():
    if counts[letter] == 1:
    return input_string[positions[letter]]

    return ''





    share|improve this answer























    • I think you're on the right track. What if a defaultdict(int) was also used for initial_positions? This simplifies your line "if letter not in initial_positions".
      – Kevin S
      Dec 13 '18 at 15:44






    • 2




      We only care about characters that have one occurrence, and in particular we only care about the first character that has one occurrence. As a result the initial position for the character we care about will be correct, the others won't, but we don't care that they're not accurate, they're not what we're after
      – Kevin S
      Dec 13 '18 at 16:09






    • 1




      @KevinS yeah, this means that we don't really need the if letter not in initial_positions check.
      – alecxe
      Dec 13 '18 at 16:13






    • 1




      Excuse me if I'm wrong, but wouldn't input_string[positions[letter]] just be the same as letter?
      – JAD
      Dec 14 '18 at 7:58








    • 1




      @JAD If lower () wasn't called on the input string, it would. But in any case the positions dictionary is unnecessary as you can either lowercase each letter individually to check for its count or use enumerate to retrieve the position.
      – Mathias Ettinger
      Dec 14 '18 at 8:29
















    6














    I would separate counting from keeping the initial positions. This would allow to use collections.defaultdict for counting and simplify the code and contribute to readability:



    from collections import defaultdict


    def non_repeat(input_string):
    counts = defaultdict(int)
    positions = {}

    for position, letter in enumerate(input_string.lower()):
    counts[letter] += 1
    positions[letter] = position

    for letter in input_string.lower():
    if counts[letter] == 1:
    return input_string[positions[letter]]

    return ''





    share|improve this answer























    • I think you're on the right track. What if a defaultdict(int) was also used for initial_positions? This simplifies your line "if letter not in initial_positions".
      – Kevin S
      Dec 13 '18 at 15:44






    • 2




      We only care about characters that have one occurrence, and in particular we only care about the first character that has one occurrence. As a result the initial position for the character we care about will be correct, the others won't, but we don't care that they're not accurate, they're not what we're after
      – Kevin S
      Dec 13 '18 at 16:09






    • 1




      @KevinS yeah, this means that we don't really need the if letter not in initial_positions check.
      – alecxe
      Dec 13 '18 at 16:13






    • 1




      Excuse me if I'm wrong, but wouldn't input_string[positions[letter]] just be the same as letter?
      – JAD
      Dec 14 '18 at 7:58








    • 1




      @JAD If lower () wasn't called on the input string, it would. But in any case the positions dictionary is unnecessary as you can either lowercase each letter individually to check for its count or use enumerate to retrieve the position.
      – Mathias Ettinger
      Dec 14 '18 at 8:29














    6












    6








    6






    I would separate counting from keeping the initial positions. This would allow to use collections.defaultdict for counting and simplify the code and contribute to readability:



    from collections import defaultdict


    def non_repeat(input_string):
    counts = defaultdict(int)
    positions = {}

    for position, letter in enumerate(input_string.lower()):
    counts[letter] += 1
    positions[letter] = position

    for letter in input_string.lower():
    if counts[letter] == 1:
    return input_string[positions[letter]]

    return ''





    share|improve this answer














    I would separate counting from keeping the initial positions. This would allow to use collections.defaultdict for counting and simplify the code and contribute to readability:



    from collections import defaultdict


    def non_repeat(input_string):
    counts = defaultdict(int)
    positions = {}

    for position, letter in enumerate(input_string.lower()):
    counts[letter] += 1
    positions[letter] = position

    for letter in input_string.lower():
    if counts[letter] == 1:
    return input_string[positions[letter]]

    return ''






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Dec 13 '18 at 16:14

























    answered Dec 13 '18 at 15:31









    alecxe

    14.9k53478




    14.9k53478












    • I think you're on the right track. What if a defaultdict(int) was also used for initial_positions? This simplifies your line "if letter not in initial_positions".
      – Kevin S
      Dec 13 '18 at 15:44






    • 2




      We only care about characters that have one occurrence, and in particular we only care about the first character that has one occurrence. As a result the initial position for the character we care about will be correct, the others won't, but we don't care that they're not accurate, they're not what we're after
      – Kevin S
      Dec 13 '18 at 16:09






    • 1




      @KevinS yeah, this means that we don't really need the if letter not in initial_positions check.
      – alecxe
      Dec 13 '18 at 16:13






    • 1




      Excuse me if I'm wrong, but wouldn't input_string[positions[letter]] just be the same as letter?
      – JAD
      Dec 14 '18 at 7:58








    • 1




      @JAD If lower () wasn't called on the input string, it would. But in any case the positions dictionary is unnecessary as you can either lowercase each letter individually to check for its count or use enumerate to retrieve the position.
      – Mathias Ettinger
      Dec 14 '18 at 8:29


















    • I think you're on the right track. What if a defaultdict(int) was also used for initial_positions? This simplifies your line "if letter not in initial_positions".
      – Kevin S
      Dec 13 '18 at 15:44






    • 2




      We only care about characters that have one occurrence, and in particular we only care about the first character that has one occurrence. As a result the initial position for the character we care about will be correct, the others won't, but we don't care that they're not accurate, they're not what we're after
      – Kevin S
      Dec 13 '18 at 16:09






    • 1




      @KevinS yeah, this means that we don't really need the if letter not in initial_positions check.
      – alecxe
      Dec 13 '18 at 16:13






    • 1




      Excuse me if I'm wrong, but wouldn't input_string[positions[letter]] just be the same as letter?
      – JAD
      Dec 14 '18 at 7:58








    • 1




      @JAD If lower () wasn't called on the input string, it would. But in any case the positions dictionary is unnecessary as you can either lowercase each letter individually to check for its count or use enumerate to retrieve the position.
      – Mathias Ettinger
      Dec 14 '18 at 8:29
















    I think you're on the right track. What if a defaultdict(int) was also used for initial_positions? This simplifies your line "if letter not in initial_positions".
    – Kevin S
    Dec 13 '18 at 15:44




    I think you're on the right track. What if a defaultdict(int) was also used for initial_positions? This simplifies your line "if letter not in initial_positions".
    – Kevin S
    Dec 13 '18 at 15:44




    2




    2




    We only care about characters that have one occurrence, and in particular we only care about the first character that has one occurrence. As a result the initial position for the character we care about will be correct, the others won't, but we don't care that they're not accurate, they're not what we're after
    – Kevin S
    Dec 13 '18 at 16:09




    We only care about characters that have one occurrence, and in particular we only care about the first character that has one occurrence. As a result the initial position for the character we care about will be correct, the others won't, but we don't care that they're not accurate, they're not what we're after
    – Kevin S
    Dec 13 '18 at 16:09




    1




    1




    @KevinS yeah, this means that we don't really need the if letter not in initial_positions check.
    – alecxe
    Dec 13 '18 at 16:13




    @KevinS yeah, this means that we don't really need the if letter not in initial_positions check.
    – alecxe
    Dec 13 '18 at 16:13




    1




    1




    Excuse me if I'm wrong, but wouldn't input_string[positions[letter]] just be the same as letter?
    – JAD
    Dec 14 '18 at 7:58






    Excuse me if I'm wrong, but wouldn't input_string[positions[letter]] just be the same as letter?
    – JAD
    Dec 14 '18 at 7:58






    1




    1




    @JAD If lower () wasn't called on the input string, it would. But in any case the positions dictionary is unnecessary as you can either lowercase each letter individually to check for its count or use enumerate to retrieve the position.
    – Mathias Ettinger
    Dec 14 '18 at 8:29




    @JAD If lower () wasn't called on the input string, it would. But in any case the positions dictionary is unnecessary as you can either lowercase each letter individually to check for its count or use enumerate to retrieve the position.
    – Mathias Ettinger
    Dec 14 '18 at 8:29











    5














    Going further from alecxe's answer:




    • you could use the Counter collections instead of performing the counting yourself - I didn't see that this was avoided on purpose.

    • you can ensure lower is called only once


    You'd get something like:



    from collections import Counter

    def non_repeat(input_string):
    lower = input_string.lower()
    count = Counter(lower)
    for c, l in zip(input_string, lower):
    if count[l] == 1:
    return c
    return ''


    or, for a Counter-less solution:



    def non_repeat(input_string):
    lower = input_string.lower()
    count = defaultdict(int)
    for c in lower:
    count[c] += 1
    for c, l in zip(input_string, lower):
    if count[l] == 1:
    return c
    return ''


    Also, here's a quick test suite I wrote:



    tests = [
    ('', ''),
    ('AA', ''),
    ('AAABBB', ''),
    ('AAABBBc', 'c'),
    ('azerty', 'a'),
    ('aazzeerty', 'r'),
    ('azerAZERty', 't'),
    ]

    for inp, out in tests:
    assert non_repeat(inp) == out





    share|improve this answer



















    • 1




      The question did specifically state that Kevin is avoiding collections.​Counter, so that's not such a helpful suggestion. The rest makes sense.
      – Toby Speight
      Dec 13 '18 at 16:52






    • 1




      @TobySpeight Thanks for spotting this, I completely missed in the the original question. I've updated my answer accordingly but it is not that relevant anymore :)
      – Josay
      Dec 13 '18 at 17:04










    • Is Counter still slower than defaultdict?
      – jpmc26
      Dec 13 '18 at 22:47










    • I didn't know it had been the case. I'll try a benchmark if I think about it
      – Josay
      Dec 13 '18 at 22:53
















    5














    Going further from alecxe's answer:




    • you could use the Counter collections instead of performing the counting yourself - I didn't see that this was avoided on purpose.

    • you can ensure lower is called only once


    You'd get something like:



    from collections import Counter

    def non_repeat(input_string):
    lower = input_string.lower()
    count = Counter(lower)
    for c, l in zip(input_string, lower):
    if count[l] == 1:
    return c
    return ''


    or, for a Counter-less solution:



    def non_repeat(input_string):
    lower = input_string.lower()
    count = defaultdict(int)
    for c in lower:
    count[c] += 1
    for c, l in zip(input_string, lower):
    if count[l] == 1:
    return c
    return ''


    Also, here's a quick test suite I wrote:



    tests = [
    ('', ''),
    ('AA', ''),
    ('AAABBB', ''),
    ('AAABBBc', 'c'),
    ('azerty', 'a'),
    ('aazzeerty', 'r'),
    ('azerAZERty', 't'),
    ]

    for inp, out in tests:
    assert non_repeat(inp) == out





    share|improve this answer



















    • 1




      The question did specifically state that Kevin is avoiding collections.​Counter, so that's not such a helpful suggestion. The rest makes sense.
      – Toby Speight
      Dec 13 '18 at 16:52






    • 1




      @TobySpeight Thanks for spotting this, I completely missed in the the original question. I've updated my answer accordingly but it is not that relevant anymore :)
      – Josay
      Dec 13 '18 at 17:04










    • Is Counter still slower than defaultdict?
      – jpmc26
      Dec 13 '18 at 22:47










    • I didn't know it had been the case. I'll try a benchmark if I think about it
      – Josay
      Dec 13 '18 at 22:53














    5












    5








    5






    Going further from alecxe's answer:




    • you could use the Counter collections instead of performing the counting yourself - I didn't see that this was avoided on purpose.

    • you can ensure lower is called only once


    You'd get something like:



    from collections import Counter

    def non_repeat(input_string):
    lower = input_string.lower()
    count = Counter(lower)
    for c, l in zip(input_string, lower):
    if count[l] == 1:
    return c
    return ''


    or, for a Counter-less solution:



    def non_repeat(input_string):
    lower = input_string.lower()
    count = defaultdict(int)
    for c in lower:
    count[c] += 1
    for c, l in zip(input_string, lower):
    if count[l] == 1:
    return c
    return ''


    Also, here's a quick test suite I wrote:



    tests = [
    ('', ''),
    ('AA', ''),
    ('AAABBB', ''),
    ('AAABBBc', 'c'),
    ('azerty', 'a'),
    ('aazzeerty', 'r'),
    ('azerAZERty', 't'),
    ]

    for inp, out in tests:
    assert non_repeat(inp) == out





    share|improve this answer














    Going further from alecxe's answer:




    • you could use the Counter collections instead of performing the counting yourself - I didn't see that this was avoided on purpose.

    • you can ensure lower is called only once


    You'd get something like:



    from collections import Counter

    def non_repeat(input_string):
    lower = input_string.lower()
    count = Counter(lower)
    for c, l in zip(input_string, lower):
    if count[l] == 1:
    return c
    return ''


    or, for a Counter-less solution:



    def non_repeat(input_string):
    lower = input_string.lower()
    count = defaultdict(int)
    for c in lower:
    count[c] += 1
    for c, l in zip(input_string, lower):
    if count[l] == 1:
    return c
    return ''


    Also, here's a quick test suite I wrote:



    tests = [
    ('', ''),
    ('AA', ''),
    ('AAABBB', ''),
    ('AAABBBc', 'c'),
    ('azerty', 'a'),
    ('aazzeerty', 'r'),
    ('azerAZERty', 't'),
    ]

    for inp, out in tests:
    assert non_repeat(inp) == out






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Dec 13 '18 at 17:04

























    answered Dec 13 '18 at 16:47









    Josay

    25.5k13987




    25.5k13987








    • 1




      The question did specifically state that Kevin is avoiding collections.​Counter, so that's not such a helpful suggestion. The rest makes sense.
      – Toby Speight
      Dec 13 '18 at 16:52






    • 1




      @TobySpeight Thanks for spotting this, I completely missed in the the original question. I've updated my answer accordingly but it is not that relevant anymore :)
      – Josay
      Dec 13 '18 at 17:04










    • Is Counter still slower than defaultdict?
      – jpmc26
      Dec 13 '18 at 22:47










    • I didn't know it had been the case. I'll try a benchmark if I think about it
      – Josay
      Dec 13 '18 at 22:53














    • 1




      The question did specifically state that Kevin is avoiding collections.​Counter, so that's not such a helpful suggestion. The rest makes sense.
      – Toby Speight
      Dec 13 '18 at 16:52






    • 1




      @TobySpeight Thanks for spotting this, I completely missed in the the original question. I've updated my answer accordingly but it is not that relevant anymore :)
      – Josay
      Dec 13 '18 at 17:04










    • Is Counter still slower than defaultdict?
      – jpmc26
      Dec 13 '18 at 22:47










    • I didn't know it had been the case. I'll try a benchmark if I think about it
      – Josay
      Dec 13 '18 at 22:53








    1




    1




    The question did specifically state that Kevin is avoiding collections.​Counter, so that's not such a helpful suggestion. The rest makes sense.
    – Toby Speight
    Dec 13 '18 at 16:52




    The question did specifically state that Kevin is avoiding collections.​Counter, so that's not such a helpful suggestion. The rest makes sense.
    – Toby Speight
    Dec 13 '18 at 16:52




    1




    1




    @TobySpeight Thanks for spotting this, I completely missed in the the original question. I've updated my answer accordingly but it is not that relevant anymore :)
    – Josay
    Dec 13 '18 at 17:04




    @TobySpeight Thanks for spotting this, I completely missed in the the original question. I've updated my answer accordingly but it is not that relevant anymore :)
    – Josay
    Dec 13 '18 at 17:04












    Is Counter still slower than defaultdict?
    – jpmc26
    Dec 13 '18 at 22:47




    Is Counter still slower than defaultdict?
    – jpmc26
    Dec 13 '18 at 22:47












    I didn't know it had been the case. I'll try a benchmark if I think about it
    – Josay
    Dec 13 '18 at 22:53




    I didn't know it had been the case. I'll try a benchmark if I think about it
    – Josay
    Dec 13 '18 at 22:53











    2














    A version which doesn't use any imports, works on Py2.7+ and relies almost solely on set operations to achieve a single O(len(s)) pass + constant time:



    def non_repeat(s):
    LEN_CHAR_SET_LOWERED = XXX # length of your char set adjusted for .lower()
    seen_order = # Store the order for figuring out which came first
    seen_set = set() # Store whether we've seen the character
    dupe_set = set() # Store whether we've seen it more than once

    # Scan the string
    for ch in s:
    chl = ch.lower() # lowered character for seen/dupe sets
    if chl not in seen_set:
    seen_order.append(ch) # This uses the non-lowered version to preserve case
    seen_set.add(chl)
    else:
    dupe_set.add(chl)
    if len(dupe_set) == LEN_CHAR_SET_LOWERED: # Set len is O(1)
    return '' # If dupe set contains all possible characters, exit early

    # Find uniques
    unique_set = seen_set - dupe_set

    # Find the first one, if any
    if unique_set:
    for ch in seen_order:
    if ch.lower() in unique_set:
    return ch
    return ''


    Some notes on speed:




    • O(len(s)) average case, O(1) best case (see early exit) - to build the list/sets - set membership, additions and list appends are all average O(1) operations, worst case O(len(set/list))*


    • O(1) - Set difference on average, worst case O(len(set))*


    • O(len(list))* for the final check



    *O(len(list)) and O(len(set)) both have upper bounds of LEN_CHAR_SET_LOWERED, which means they end up constant time, O(1), as the string grows



    This is also interesting because of the early exit: If your string contains all characters duplicated, it will only scan until it has seen every character at least twice and then exit, knowing there will be no unique characters. An alphanumeric string could exit after scanning as few as 72 characters, regardless of the actual length.






    share|improve this answer


























      2














      A version which doesn't use any imports, works on Py2.7+ and relies almost solely on set operations to achieve a single O(len(s)) pass + constant time:



      def non_repeat(s):
      LEN_CHAR_SET_LOWERED = XXX # length of your char set adjusted for .lower()
      seen_order = # Store the order for figuring out which came first
      seen_set = set() # Store whether we've seen the character
      dupe_set = set() # Store whether we've seen it more than once

      # Scan the string
      for ch in s:
      chl = ch.lower() # lowered character for seen/dupe sets
      if chl not in seen_set:
      seen_order.append(ch) # This uses the non-lowered version to preserve case
      seen_set.add(chl)
      else:
      dupe_set.add(chl)
      if len(dupe_set) == LEN_CHAR_SET_LOWERED: # Set len is O(1)
      return '' # If dupe set contains all possible characters, exit early

      # Find uniques
      unique_set = seen_set - dupe_set

      # Find the first one, if any
      if unique_set:
      for ch in seen_order:
      if ch.lower() in unique_set:
      return ch
      return ''


      Some notes on speed:




      • O(len(s)) average case, O(1) best case (see early exit) - to build the list/sets - set membership, additions and list appends are all average O(1) operations, worst case O(len(set/list))*


      • O(1) - Set difference on average, worst case O(len(set))*


      • O(len(list))* for the final check



      *O(len(list)) and O(len(set)) both have upper bounds of LEN_CHAR_SET_LOWERED, which means they end up constant time, O(1), as the string grows



      This is also interesting because of the early exit: If your string contains all characters duplicated, it will only scan until it has seen every character at least twice and then exit, knowing there will be no unique characters. An alphanumeric string could exit after scanning as few as 72 characters, regardless of the actual length.






      share|improve this answer
























        2












        2








        2






        A version which doesn't use any imports, works on Py2.7+ and relies almost solely on set operations to achieve a single O(len(s)) pass + constant time:



        def non_repeat(s):
        LEN_CHAR_SET_LOWERED = XXX # length of your char set adjusted for .lower()
        seen_order = # Store the order for figuring out which came first
        seen_set = set() # Store whether we've seen the character
        dupe_set = set() # Store whether we've seen it more than once

        # Scan the string
        for ch in s:
        chl = ch.lower() # lowered character for seen/dupe sets
        if chl not in seen_set:
        seen_order.append(ch) # This uses the non-lowered version to preserve case
        seen_set.add(chl)
        else:
        dupe_set.add(chl)
        if len(dupe_set) == LEN_CHAR_SET_LOWERED: # Set len is O(1)
        return '' # If dupe set contains all possible characters, exit early

        # Find uniques
        unique_set = seen_set - dupe_set

        # Find the first one, if any
        if unique_set:
        for ch in seen_order:
        if ch.lower() in unique_set:
        return ch
        return ''


        Some notes on speed:




        • O(len(s)) average case, O(1) best case (see early exit) - to build the list/sets - set membership, additions and list appends are all average O(1) operations, worst case O(len(set/list))*


        • O(1) - Set difference on average, worst case O(len(set))*


        • O(len(list))* for the final check



        *O(len(list)) and O(len(set)) both have upper bounds of LEN_CHAR_SET_LOWERED, which means they end up constant time, O(1), as the string grows



        This is also interesting because of the early exit: If your string contains all characters duplicated, it will only scan until it has seen every character at least twice and then exit, knowing there will be no unique characters. An alphanumeric string could exit after scanning as few as 72 characters, regardless of the actual length.






        share|improve this answer












        A version which doesn't use any imports, works on Py2.7+ and relies almost solely on set operations to achieve a single O(len(s)) pass + constant time:



        def non_repeat(s):
        LEN_CHAR_SET_LOWERED = XXX # length of your char set adjusted for .lower()
        seen_order = # Store the order for figuring out which came first
        seen_set = set() # Store whether we've seen the character
        dupe_set = set() # Store whether we've seen it more than once

        # Scan the string
        for ch in s:
        chl = ch.lower() # lowered character for seen/dupe sets
        if chl not in seen_set:
        seen_order.append(ch) # This uses the non-lowered version to preserve case
        seen_set.add(chl)
        else:
        dupe_set.add(chl)
        if len(dupe_set) == LEN_CHAR_SET_LOWERED: # Set len is O(1)
        return '' # If dupe set contains all possible characters, exit early

        # Find uniques
        unique_set = seen_set - dupe_set

        # Find the first one, if any
        if unique_set:
        for ch in seen_order:
        if ch.lower() in unique_set:
        return ch
        return ''


        Some notes on speed:




        • O(len(s)) average case, O(1) best case (see early exit) - to build the list/sets - set membership, additions and list appends are all average O(1) operations, worst case O(len(set/list))*


        • O(1) - Set difference on average, worst case O(len(set))*


        • O(len(list))* for the final check



        *O(len(list)) and O(len(set)) both have upper bounds of LEN_CHAR_SET_LOWERED, which means they end up constant time, O(1), as the string grows



        This is also interesting because of the early exit: If your string contains all characters duplicated, it will only scan until it has seen every character at least twice and then exit, knowing there will be no unique characters. An alphanumeric string could exit after scanning as few as 72 characters, regardless of the actual length.







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Dec 13 '18 at 23:32









        TemporalWolf

        25427




        25427






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Code Review Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            Use MathJax to format equations. MathJax reference.


            To learn more, see our tips on writing great answers.





            Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


            Please pay close attention to the following guidance:


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f209622%2ffirst-non-repeating-letter%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Biblatex bibliography style without URLs when DOI exists (in Overleaf with Zotero bibliography)

            ComboBox Display Member on multiple fields

            Is it possible to collect Nectar points via Trainline?