First non-repeating letter
The problem: find and return the first nonrepeating character in a string, if the string does not have a nonrepeating character return an empty string. For purposes of finding the nonrepeating character it's case insensitive so 't' and 'T' appearing in the string would imply that that 't' and 'T' are not candidates for a nonrepeating character. When you return the character you must return it as its original case.
My Solution:
def non_repeat(_str):
counts = {}
for pos, letter in enumerate(_str.lower()):
if letter not in counts:
counts[letter] = (0, pos)
incr_val = counts[letter][0] + 1
counts[letter] = (incr_val, pos)
for letter in _str.lower():
if counts[letter][0] == 1:
return _str[counts[letter][1]]
return ''
How can I improve the readability of my solution? In particular, I don't like:
counts[letter][0]
because the 0 index is a bit ambiguous.- calculating the incremented value as another line,
incr_val
; it'd be nice to do the incrementing and updating of the dict in one line.
Anything else you could make more readable? I should add that I'm aware of collections.Counter
; for this solution I'd like to avoid using it.
python python-3.x
add a comment |
The problem: find and return the first nonrepeating character in a string, if the string does not have a nonrepeating character return an empty string. For purposes of finding the nonrepeating character it's case insensitive so 't' and 'T' appearing in the string would imply that that 't' and 'T' are not candidates for a nonrepeating character. When you return the character you must return it as its original case.
My Solution:
def non_repeat(_str):
counts = {}
for pos, letter in enumerate(_str.lower()):
if letter not in counts:
counts[letter] = (0, pos)
incr_val = counts[letter][0] + 1
counts[letter] = (incr_val, pos)
for letter in _str.lower():
if counts[letter][0] == 1:
return _str[counts[letter][1]]
return ''
How can I improve the readability of my solution? In particular, I don't like:
counts[letter][0]
because the 0 index is a bit ambiguous.- calculating the incremented value as another line,
incr_val
; it'd be nice to do the incrementing and updating of the dict in one line.
Anything else you could make more readable? I should add that I'm aware of collections.Counter
; for this solution I'd like to avoid using it.
python python-3.x
5
I rolled back your last edit. After getting an answer you are not allowed to change your code anymore. This is to ensure that answers do not get invalidated and have to hit a moving target. If you have changed your code you can either post it as an answer (if it would constitute a code review) or ask a new question with your changed code (linking back to this one as reference). Refer to this post for more information
– Sᴀᴍ Onᴇᴌᴀ
Dec 13 '18 at 16:11
1
Can you explain why you want to avoidcollections.Counter
?
– Gareth Rees
Dec 13 '18 at 19:50
There's no logical reason for avoiding collections.Counter (that I'm aware of). I wrote my question that way because I wanted to keep the presentation of the algorithm more apparent to the reader, and because I already knew how to make the function much shorter by using collections.Counter which would essentially eliminate the 3-4 lines from the function.
– Kevin S
Dec 13 '18 at 21:14
it might be easier to iterate over the string backwards
– sudo rm -rf slash
Dec 14 '18 at 7:04
add a comment |
The problem: find and return the first nonrepeating character in a string, if the string does not have a nonrepeating character return an empty string. For purposes of finding the nonrepeating character it's case insensitive so 't' and 'T' appearing in the string would imply that that 't' and 'T' are not candidates for a nonrepeating character. When you return the character you must return it as its original case.
My Solution:
def non_repeat(_str):
counts = {}
for pos, letter in enumerate(_str.lower()):
if letter not in counts:
counts[letter] = (0, pos)
incr_val = counts[letter][0] + 1
counts[letter] = (incr_val, pos)
for letter in _str.lower():
if counts[letter][0] == 1:
return _str[counts[letter][1]]
return ''
How can I improve the readability of my solution? In particular, I don't like:
counts[letter][0]
because the 0 index is a bit ambiguous.- calculating the incremented value as another line,
incr_val
; it'd be nice to do the incrementing and updating of the dict in one line.
Anything else you could make more readable? I should add that I'm aware of collections.Counter
; for this solution I'd like to avoid using it.
python python-3.x
The problem: find and return the first nonrepeating character in a string, if the string does not have a nonrepeating character return an empty string. For purposes of finding the nonrepeating character it's case insensitive so 't' and 'T' appearing in the string would imply that that 't' and 'T' are not candidates for a nonrepeating character. When you return the character you must return it as its original case.
My Solution:
def non_repeat(_str):
counts = {}
for pos, letter in enumerate(_str.lower()):
if letter not in counts:
counts[letter] = (0, pos)
incr_val = counts[letter][0] + 1
counts[letter] = (incr_val, pos)
for letter in _str.lower():
if counts[letter][0] == 1:
return _str[counts[letter][1]]
return ''
How can I improve the readability of my solution? In particular, I don't like:
counts[letter][0]
because the 0 index is a bit ambiguous.- calculating the incremented value as another line,
incr_val
; it'd be nice to do the incrementing and updating of the dict in one line.
Anything else you could make more readable? I should add that I'm aware of collections.Counter
; for this solution I'd like to avoid using it.
python python-3.x
python python-3.x
edited Dec 13 '18 at 16:49
Toby Speight
23.4k639111
23.4k639111
asked Dec 13 '18 at 14:56
Kevin S
20317
20317
5
I rolled back your last edit. After getting an answer you are not allowed to change your code anymore. This is to ensure that answers do not get invalidated and have to hit a moving target. If you have changed your code you can either post it as an answer (if it would constitute a code review) or ask a new question with your changed code (linking back to this one as reference). Refer to this post for more information
– Sᴀᴍ Onᴇᴌᴀ
Dec 13 '18 at 16:11
1
Can you explain why you want to avoidcollections.Counter
?
– Gareth Rees
Dec 13 '18 at 19:50
There's no logical reason for avoiding collections.Counter (that I'm aware of). I wrote my question that way because I wanted to keep the presentation of the algorithm more apparent to the reader, and because I already knew how to make the function much shorter by using collections.Counter which would essentially eliminate the 3-4 lines from the function.
– Kevin S
Dec 13 '18 at 21:14
it might be easier to iterate over the string backwards
– sudo rm -rf slash
Dec 14 '18 at 7:04
add a comment |
5
I rolled back your last edit. After getting an answer you are not allowed to change your code anymore. This is to ensure that answers do not get invalidated and have to hit a moving target. If you have changed your code you can either post it as an answer (if it would constitute a code review) or ask a new question with your changed code (linking back to this one as reference). Refer to this post for more information
– Sᴀᴍ Onᴇᴌᴀ
Dec 13 '18 at 16:11
1
Can you explain why you want to avoidcollections.Counter
?
– Gareth Rees
Dec 13 '18 at 19:50
There's no logical reason for avoiding collections.Counter (that I'm aware of). I wrote my question that way because I wanted to keep the presentation of the algorithm more apparent to the reader, and because I already knew how to make the function much shorter by using collections.Counter which would essentially eliminate the 3-4 lines from the function.
– Kevin S
Dec 13 '18 at 21:14
it might be easier to iterate over the string backwards
– sudo rm -rf slash
Dec 14 '18 at 7:04
5
5
I rolled back your last edit. After getting an answer you are not allowed to change your code anymore. This is to ensure that answers do not get invalidated and have to hit a moving target. If you have changed your code you can either post it as an answer (if it would constitute a code review) or ask a new question with your changed code (linking back to this one as reference). Refer to this post for more information
– Sᴀᴍ Onᴇᴌᴀ
Dec 13 '18 at 16:11
I rolled back your last edit. After getting an answer you are not allowed to change your code anymore. This is to ensure that answers do not get invalidated and have to hit a moving target. If you have changed your code you can either post it as an answer (if it would constitute a code review) or ask a new question with your changed code (linking back to this one as reference). Refer to this post for more information
– Sᴀᴍ Onᴇᴌᴀ
Dec 13 '18 at 16:11
1
1
Can you explain why you want to avoid
collections.Counter
?– Gareth Rees
Dec 13 '18 at 19:50
Can you explain why you want to avoid
collections.Counter
?– Gareth Rees
Dec 13 '18 at 19:50
There's no logical reason for avoiding collections.Counter (that I'm aware of). I wrote my question that way because I wanted to keep the presentation of the algorithm more apparent to the reader, and because I already knew how to make the function much shorter by using collections.Counter which would essentially eliminate the 3-4 lines from the function.
– Kevin S
Dec 13 '18 at 21:14
There's no logical reason for avoiding collections.Counter (that I'm aware of). I wrote my question that way because I wanted to keep the presentation of the algorithm more apparent to the reader, and because I already knew how to make the function much shorter by using collections.Counter which would essentially eliminate the 3-4 lines from the function.
– Kevin S
Dec 13 '18 at 21:14
it might be easier to iterate over the string backwards
– sudo rm -rf slash
Dec 14 '18 at 7:04
it might be easier to iterate over the string backwards
– sudo rm -rf slash
Dec 14 '18 at 7:04
add a comment |
4 Answers
4
active
oldest
votes
IMHO looping twice over the input string is not acceptable for multiple reasons
- the string is of unknown size
- you invalidate your code for generators
While this might not be necessary for your project you should learn to think like that. So a single pass algorithm should collect all necessary data to answer the question (e. g. the letter in original case).
import sys
assert sys.version_info >= (3, 6)
def non_repeat(s):
repeated = set()
candidates = dict()
for original_case in s:
lower_case = original_case.lower()
if lower_case not in repeated:
if lower_case not in candidates:
candidates[lower_case] = original_case
else:
repeated.add(lower_case)
del candidates[lower_case]
if candidates:
return next(iter(candidates.values()))
else:
return ''
This code makes use of the insertion order of a dict which is already implemented in 3.6 and guaranteed in 3.7.
Edit: generator example
Say you want to check a big file that does not fit into memory (for brevity I assume a line fits into memory). Yor write a little character generator and run your algorithm on the generator.
def char_gen(f):
for line in f:
for c in line.strip():
yield c
with open('bigfile.txt') as f:
print(non_repeat(char_gen(f)))
also you might use the algorithm on a generator expression
print(non_repeat(c for c in "aabbcd" if c != 'c'))
You could also usecollections.OrderedDict
to make the dependence on the dictionary being ordered more explicit.
– JAD
Dec 14 '18 at 8:04
This could be leveraged on versions of python before 3.6 by importing OrderedDict from the collections module and then using an OrderedDict for candidates instead of dict. I like this solution for readability. Can you expand upon your statement "you invalidate your code for generators"
– Kevin S
Dec 14 '18 at 16:21
Would it make sense to check the length of the string for zero or 1 and just return the argument in that case, to avoid instantiating the collections?
– phoog
Dec 14 '18 at 21:57
@phoog I would not introduce complexity unless it is needed. Every extra line may introduce an error. Optimisation is not required for small data but for big data.
– stefan
Dec 15 '18 at 19:41
@stefan thanks for your comment. I'm coming to this as a .NET programmer whose employer appears to be preparing for a push to increase the use of python. In the .NET world, for a high throughput scenario with a large volume of strings and a high proportion of empty or one-character strings, the reduced garbage collection pressure might indeed be desirable. Would similar considerations not apply here, at least for that use case?
– phoog
Dec 15 '18 at 20:52
|
show 1 more comment
I would separate counting from keeping the initial positions. This would allow to use collections.defaultdict
for counting and simplify the code and contribute to readability:
from collections import defaultdict
def non_repeat(input_string):
counts = defaultdict(int)
positions = {}
for position, letter in enumerate(input_string.lower()):
counts[letter] += 1
positions[letter] = position
for letter in input_string.lower():
if counts[letter] == 1:
return input_string[positions[letter]]
return ''
I think you're on the right track. What if a defaultdict(int) was also used for initial_positions? This simplifies your line "if letter not in initial_positions".
– Kevin S
Dec 13 '18 at 15:44
2
We only care about characters that have one occurrence, and in particular we only care about the first character that has one occurrence. As a result the initial position for the character we care about will be correct, the others won't, but we don't care that they're not accurate, they're not what we're after
– Kevin S
Dec 13 '18 at 16:09
1
@KevinS yeah, this means that we don't really need theif letter not in initial_positions
check.
– alecxe
Dec 13 '18 at 16:13
1
Excuse me if I'm wrong, but wouldn'tinput_string[positions[letter]]
just be the same asletter
?
– JAD
Dec 14 '18 at 7:58
1
@JAD Iflower ()
wasn't called on the input string, it would. But in any case thepositions
dictionary is unnecessary as you can either lowercase each letter individually to check for its count or useenumerate
to retrieve the position.
– Mathias Ettinger
Dec 14 '18 at 8:29
|
show 8 more comments
Going further from alecxe's answer:
- you could use the Counter collections instead of performing the counting yourself - I didn't see that this was avoided on purpose.
- you can ensure
lower
is called only once
You'd get something like:
from collections import Counter
def non_repeat(input_string):
lower = input_string.lower()
count = Counter(lower)
for c, l in zip(input_string, lower):
if count[l] == 1:
return c
return ''
or, for a Counter-less solution:
def non_repeat(input_string):
lower = input_string.lower()
count = defaultdict(int)
for c in lower:
count[c] += 1
for c, l in zip(input_string, lower):
if count[l] == 1:
return c
return ''
Also, here's a quick test suite I wrote:
tests = [
('', ''),
('AA', ''),
('AAABBB', ''),
('AAABBBc', 'c'),
('azerty', 'a'),
('aazzeerty', 'r'),
('azerAZERty', 't'),
]
for inp, out in tests:
assert non_repeat(inp) == out
1
The question did specifically state that Kevin is avoidingcollections.Counter
, so that's not such a helpful suggestion. The rest makes sense.
– Toby Speight
Dec 13 '18 at 16:52
1
@TobySpeight Thanks for spotting this, I completely missed in the the original question. I've updated my answer accordingly but it is not that relevant anymore :)
– Josay
Dec 13 '18 at 17:04
IsCounter
still slower thandefaultdict
?
– jpmc26
Dec 13 '18 at 22:47
I didn't know it had been the case. I'll try a benchmark if I think about it
– Josay
Dec 13 '18 at 22:53
add a comment |
A version which doesn't use any imports, works on Py2.7+ and relies almost solely on set operations to achieve a single O(len(s))
pass + constant time:
def non_repeat(s):
LEN_CHAR_SET_LOWERED = XXX # length of your char set adjusted for .lower()
seen_order = # Store the order for figuring out which came first
seen_set = set() # Store whether we've seen the character
dupe_set = set() # Store whether we've seen it more than once
# Scan the string
for ch in s:
chl = ch.lower() # lowered character for seen/dupe sets
if chl not in seen_set:
seen_order.append(ch) # This uses the non-lowered version to preserve case
seen_set.add(chl)
else:
dupe_set.add(chl)
if len(dupe_set) == LEN_CHAR_SET_LOWERED: # Set len is O(1)
return '' # If dupe set contains all possible characters, exit early
# Find uniques
unique_set = seen_set - dupe_set
# Find the first one, if any
if unique_set:
for ch in seen_order:
if ch.lower() in unique_set:
return ch
return ''
Some notes on speed:
O(len(s))
average case,O(1)
best case (see early exit) - to build the list/sets - set membership, additions and list appends are all averageO(1)
operations, worst caseO(len(set/list))
*O(1)
- Set difference on average, worst caseO(len(set))
*O(len(list))
* for the final check
*O(len(list))
and O(len(set))
both have upper bounds of LEN_CHAR_SET_LOWERED
, which means they end up constant time, O(1)
, as the string grows
This is also interesting because of the early exit: If your string contains all characters duplicated, it will only scan until it has seen every character at least twice and then exit, knowing there will be no unique characters. An alphanumeric string could exit after scanning as few as 72 characters, regardless of the actual length.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f209622%2ffirst-non-repeating-letter%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
IMHO looping twice over the input string is not acceptable for multiple reasons
- the string is of unknown size
- you invalidate your code for generators
While this might not be necessary for your project you should learn to think like that. So a single pass algorithm should collect all necessary data to answer the question (e. g. the letter in original case).
import sys
assert sys.version_info >= (3, 6)
def non_repeat(s):
repeated = set()
candidates = dict()
for original_case in s:
lower_case = original_case.lower()
if lower_case not in repeated:
if lower_case not in candidates:
candidates[lower_case] = original_case
else:
repeated.add(lower_case)
del candidates[lower_case]
if candidates:
return next(iter(candidates.values()))
else:
return ''
This code makes use of the insertion order of a dict which is already implemented in 3.6 and guaranteed in 3.7.
Edit: generator example
Say you want to check a big file that does not fit into memory (for brevity I assume a line fits into memory). Yor write a little character generator and run your algorithm on the generator.
def char_gen(f):
for line in f:
for c in line.strip():
yield c
with open('bigfile.txt') as f:
print(non_repeat(char_gen(f)))
also you might use the algorithm on a generator expression
print(non_repeat(c for c in "aabbcd" if c != 'c'))
You could also usecollections.OrderedDict
to make the dependence on the dictionary being ordered more explicit.
– JAD
Dec 14 '18 at 8:04
This could be leveraged on versions of python before 3.6 by importing OrderedDict from the collections module and then using an OrderedDict for candidates instead of dict. I like this solution for readability. Can you expand upon your statement "you invalidate your code for generators"
– Kevin S
Dec 14 '18 at 16:21
Would it make sense to check the length of the string for zero or 1 and just return the argument in that case, to avoid instantiating the collections?
– phoog
Dec 14 '18 at 21:57
@phoog I would not introduce complexity unless it is needed. Every extra line may introduce an error. Optimisation is not required for small data but for big data.
– stefan
Dec 15 '18 at 19:41
@stefan thanks for your comment. I'm coming to this as a .NET programmer whose employer appears to be preparing for a push to increase the use of python. In the .NET world, for a high throughput scenario with a large volume of strings and a high proportion of empty or one-character strings, the reduced garbage collection pressure might indeed be desirable. Would similar considerations not apply here, at least for that use case?
– phoog
Dec 15 '18 at 20:52
|
show 1 more comment
IMHO looping twice over the input string is not acceptable for multiple reasons
- the string is of unknown size
- you invalidate your code for generators
While this might not be necessary for your project you should learn to think like that. So a single pass algorithm should collect all necessary data to answer the question (e. g. the letter in original case).
import sys
assert sys.version_info >= (3, 6)
def non_repeat(s):
repeated = set()
candidates = dict()
for original_case in s:
lower_case = original_case.lower()
if lower_case not in repeated:
if lower_case not in candidates:
candidates[lower_case] = original_case
else:
repeated.add(lower_case)
del candidates[lower_case]
if candidates:
return next(iter(candidates.values()))
else:
return ''
This code makes use of the insertion order of a dict which is already implemented in 3.6 and guaranteed in 3.7.
Edit: generator example
Say you want to check a big file that does not fit into memory (for brevity I assume a line fits into memory). Yor write a little character generator and run your algorithm on the generator.
def char_gen(f):
for line in f:
for c in line.strip():
yield c
with open('bigfile.txt') as f:
print(non_repeat(char_gen(f)))
also you might use the algorithm on a generator expression
print(non_repeat(c for c in "aabbcd" if c != 'c'))
You could also usecollections.OrderedDict
to make the dependence on the dictionary being ordered more explicit.
– JAD
Dec 14 '18 at 8:04
This could be leveraged on versions of python before 3.6 by importing OrderedDict from the collections module and then using an OrderedDict for candidates instead of dict. I like this solution for readability. Can you expand upon your statement "you invalidate your code for generators"
– Kevin S
Dec 14 '18 at 16:21
Would it make sense to check the length of the string for zero or 1 and just return the argument in that case, to avoid instantiating the collections?
– phoog
Dec 14 '18 at 21:57
@phoog I would not introduce complexity unless it is needed. Every extra line may introduce an error. Optimisation is not required for small data but for big data.
– stefan
Dec 15 '18 at 19:41
@stefan thanks for your comment. I'm coming to this as a .NET programmer whose employer appears to be preparing for a push to increase the use of python. In the .NET world, for a high throughput scenario with a large volume of strings and a high proportion of empty or one-character strings, the reduced garbage collection pressure might indeed be desirable. Would similar considerations not apply here, at least for that use case?
– phoog
Dec 15 '18 at 20:52
|
show 1 more comment
IMHO looping twice over the input string is not acceptable for multiple reasons
- the string is of unknown size
- you invalidate your code for generators
While this might not be necessary for your project you should learn to think like that. So a single pass algorithm should collect all necessary data to answer the question (e. g. the letter in original case).
import sys
assert sys.version_info >= (3, 6)
def non_repeat(s):
repeated = set()
candidates = dict()
for original_case in s:
lower_case = original_case.lower()
if lower_case not in repeated:
if lower_case not in candidates:
candidates[lower_case] = original_case
else:
repeated.add(lower_case)
del candidates[lower_case]
if candidates:
return next(iter(candidates.values()))
else:
return ''
This code makes use of the insertion order of a dict which is already implemented in 3.6 and guaranteed in 3.7.
Edit: generator example
Say you want to check a big file that does not fit into memory (for brevity I assume a line fits into memory). Yor write a little character generator and run your algorithm on the generator.
def char_gen(f):
for line in f:
for c in line.strip():
yield c
with open('bigfile.txt') as f:
print(non_repeat(char_gen(f)))
also you might use the algorithm on a generator expression
print(non_repeat(c for c in "aabbcd" if c != 'c'))
IMHO looping twice over the input string is not acceptable for multiple reasons
- the string is of unknown size
- you invalidate your code for generators
While this might not be necessary for your project you should learn to think like that. So a single pass algorithm should collect all necessary data to answer the question (e. g. the letter in original case).
import sys
assert sys.version_info >= (3, 6)
def non_repeat(s):
repeated = set()
candidates = dict()
for original_case in s:
lower_case = original_case.lower()
if lower_case not in repeated:
if lower_case not in candidates:
candidates[lower_case] = original_case
else:
repeated.add(lower_case)
del candidates[lower_case]
if candidates:
return next(iter(candidates.values()))
else:
return ''
This code makes use of the insertion order of a dict which is already implemented in 3.6 and guaranteed in 3.7.
Edit: generator example
Say you want to check a big file that does not fit into memory (for brevity I assume a line fits into memory). Yor write a little character generator and run your algorithm on the generator.
def char_gen(f):
for line in f:
for c in line.strip():
yield c
with open('bigfile.txt') as f:
print(non_repeat(char_gen(f)))
also you might use the algorithm on a generator expression
print(non_repeat(c for c in "aabbcd" if c != 'c'))
edited Dec 14 '18 at 18:29
answered Dec 13 '18 at 19:12
stefan
1,531211
1,531211
You could also usecollections.OrderedDict
to make the dependence on the dictionary being ordered more explicit.
– JAD
Dec 14 '18 at 8:04
This could be leveraged on versions of python before 3.6 by importing OrderedDict from the collections module and then using an OrderedDict for candidates instead of dict. I like this solution for readability. Can you expand upon your statement "you invalidate your code for generators"
– Kevin S
Dec 14 '18 at 16:21
Would it make sense to check the length of the string for zero or 1 and just return the argument in that case, to avoid instantiating the collections?
– phoog
Dec 14 '18 at 21:57
@phoog I would not introduce complexity unless it is needed. Every extra line may introduce an error. Optimisation is not required for small data but for big data.
– stefan
Dec 15 '18 at 19:41
@stefan thanks for your comment. I'm coming to this as a .NET programmer whose employer appears to be preparing for a push to increase the use of python. In the .NET world, for a high throughput scenario with a large volume of strings and a high proportion of empty or one-character strings, the reduced garbage collection pressure might indeed be desirable. Would similar considerations not apply here, at least for that use case?
– phoog
Dec 15 '18 at 20:52
|
show 1 more comment
You could also usecollections.OrderedDict
to make the dependence on the dictionary being ordered more explicit.
– JAD
Dec 14 '18 at 8:04
This could be leveraged on versions of python before 3.6 by importing OrderedDict from the collections module and then using an OrderedDict for candidates instead of dict. I like this solution for readability. Can you expand upon your statement "you invalidate your code for generators"
– Kevin S
Dec 14 '18 at 16:21
Would it make sense to check the length of the string for zero or 1 and just return the argument in that case, to avoid instantiating the collections?
– phoog
Dec 14 '18 at 21:57
@phoog I would not introduce complexity unless it is needed. Every extra line may introduce an error. Optimisation is not required for small data but for big data.
– stefan
Dec 15 '18 at 19:41
@stefan thanks for your comment. I'm coming to this as a .NET programmer whose employer appears to be preparing for a push to increase the use of python. In the .NET world, for a high throughput scenario with a large volume of strings and a high proportion of empty or one-character strings, the reduced garbage collection pressure might indeed be desirable. Would similar considerations not apply here, at least for that use case?
– phoog
Dec 15 '18 at 20:52
You could also use
collections.OrderedDict
to make the dependence on the dictionary being ordered more explicit.– JAD
Dec 14 '18 at 8:04
You could also use
collections.OrderedDict
to make the dependence on the dictionary being ordered more explicit.– JAD
Dec 14 '18 at 8:04
This could be leveraged on versions of python before 3.6 by importing OrderedDict from the collections module and then using an OrderedDict for candidates instead of dict. I like this solution for readability. Can you expand upon your statement "you invalidate your code for generators"
– Kevin S
Dec 14 '18 at 16:21
This could be leveraged on versions of python before 3.6 by importing OrderedDict from the collections module and then using an OrderedDict for candidates instead of dict. I like this solution for readability. Can you expand upon your statement "you invalidate your code for generators"
– Kevin S
Dec 14 '18 at 16:21
Would it make sense to check the length of the string for zero or 1 and just return the argument in that case, to avoid instantiating the collections?
– phoog
Dec 14 '18 at 21:57
Would it make sense to check the length of the string for zero or 1 and just return the argument in that case, to avoid instantiating the collections?
– phoog
Dec 14 '18 at 21:57
@phoog I would not introduce complexity unless it is needed. Every extra line may introduce an error. Optimisation is not required for small data but for big data.
– stefan
Dec 15 '18 at 19:41
@phoog I would not introduce complexity unless it is needed. Every extra line may introduce an error. Optimisation is not required for small data but for big data.
– stefan
Dec 15 '18 at 19:41
@stefan thanks for your comment. I'm coming to this as a .NET programmer whose employer appears to be preparing for a push to increase the use of python. In the .NET world, for a high throughput scenario with a large volume of strings and a high proportion of empty or one-character strings, the reduced garbage collection pressure might indeed be desirable. Would similar considerations not apply here, at least for that use case?
– phoog
Dec 15 '18 at 20:52
@stefan thanks for your comment. I'm coming to this as a .NET programmer whose employer appears to be preparing for a push to increase the use of python. In the .NET world, for a high throughput scenario with a large volume of strings and a high proportion of empty or one-character strings, the reduced garbage collection pressure might indeed be desirable. Would similar considerations not apply here, at least for that use case?
– phoog
Dec 15 '18 at 20:52
|
show 1 more comment
I would separate counting from keeping the initial positions. This would allow to use collections.defaultdict
for counting and simplify the code and contribute to readability:
from collections import defaultdict
def non_repeat(input_string):
counts = defaultdict(int)
positions = {}
for position, letter in enumerate(input_string.lower()):
counts[letter] += 1
positions[letter] = position
for letter in input_string.lower():
if counts[letter] == 1:
return input_string[positions[letter]]
return ''
I think you're on the right track. What if a defaultdict(int) was also used for initial_positions? This simplifies your line "if letter not in initial_positions".
– Kevin S
Dec 13 '18 at 15:44
2
We only care about characters that have one occurrence, and in particular we only care about the first character that has one occurrence. As a result the initial position for the character we care about will be correct, the others won't, but we don't care that they're not accurate, they're not what we're after
– Kevin S
Dec 13 '18 at 16:09
1
@KevinS yeah, this means that we don't really need theif letter not in initial_positions
check.
– alecxe
Dec 13 '18 at 16:13
1
Excuse me if I'm wrong, but wouldn'tinput_string[positions[letter]]
just be the same asletter
?
– JAD
Dec 14 '18 at 7:58
1
@JAD Iflower ()
wasn't called on the input string, it would. But in any case thepositions
dictionary is unnecessary as you can either lowercase each letter individually to check for its count or useenumerate
to retrieve the position.
– Mathias Ettinger
Dec 14 '18 at 8:29
|
show 8 more comments
I would separate counting from keeping the initial positions. This would allow to use collections.defaultdict
for counting and simplify the code and contribute to readability:
from collections import defaultdict
def non_repeat(input_string):
counts = defaultdict(int)
positions = {}
for position, letter in enumerate(input_string.lower()):
counts[letter] += 1
positions[letter] = position
for letter in input_string.lower():
if counts[letter] == 1:
return input_string[positions[letter]]
return ''
I think you're on the right track. What if a defaultdict(int) was also used for initial_positions? This simplifies your line "if letter not in initial_positions".
– Kevin S
Dec 13 '18 at 15:44
2
We only care about characters that have one occurrence, and in particular we only care about the first character that has one occurrence. As a result the initial position for the character we care about will be correct, the others won't, but we don't care that they're not accurate, they're not what we're after
– Kevin S
Dec 13 '18 at 16:09
1
@KevinS yeah, this means that we don't really need theif letter not in initial_positions
check.
– alecxe
Dec 13 '18 at 16:13
1
Excuse me if I'm wrong, but wouldn'tinput_string[positions[letter]]
just be the same asletter
?
– JAD
Dec 14 '18 at 7:58
1
@JAD Iflower ()
wasn't called on the input string, it would. But in any case thepositions
dictionary is unnecessary as you can either lowercase each letter individually to check for its count or useenumerate
to retrieve the position.
– Mathias Ettinger
Dec 14 '18 at 8:29
|
show 8 more comments
I would separate counting from keeping the initial positions. This would allow to use collections.defaultdict
for counting and simplify the code and contribute to readability:
from collections import defaultdict
def non_repeat(input_string):
counts = defaultdict(int)
positions = {}
for position, letter in enumerate(input_string.lower()):
counts[letter] += 1
positions[letter] = position
for letter in input_string.lower():
if counts[letter] == 1:
return input_string[positions[letter]]
return ''
I would separate counting from keeping the initial positions. This would allow to use collections.defaultdict
for counting and simplify the code and contribute to readability:
from collections import defaultdict
def non_repeat(input_string):
counts = defaultdict(int)
positions = {}
for position, letter in enumerate(input_string.lower()):
counts[letter] += 1
positions[letter] = position
for letter in input_string.lower():
if counts[letter] == 1:
return input_string[positions[letter]]
return ''
edited Dec 13 '18 at 16:14
answered Dec 13 '18 at 15:31
alecxe
14.9k53478
14.9k53478
I think you're on the right track. What if a defaultdict(int) was also used for initial_positions? This simplifies your line "if letter not in initial_positions".
– Kevin S
Dec 13 '18 at 15:44
2
We only care about characters that have one occurrence, and in particular we only care about the first character that has one occurrence. As a result the initial position for the character we care about will be correct, the others won't, but we don't care that they're not accurate, they're not what we're after
– Kevin S
Dec 13 '18 at 16:09
1
@KevinS yeah, this means that we don't really need theif letter not in initial_positions
check.
– alecxe
Dec 13 '18 at 16:13
1
Excuse me if I'm wrong, but wouldn'tinput_string[positions[letter]]
just be the same asletter
?
– JAD
Dec 14 '18 at 7:58
1
@JAD Iflower ()
wasn't called on the input string, it would. But in any case thepositions
dictionary is unnecessary as you can either lowercase each letter individually to check for its count or useenumerate
to retrieve the position.
– Mathias Ettinger
Dec 14 '18 at 8:29
|
show 8 more comments
I think you're on the right track. What if a defaultdict(int) was also used for initial_positions? This simplifies your line "if letter not in initial_positions".
– Kevin S
Dec 13 '18 at 15:44
2
We only care about characters that have one occurrence, and in particular we only care about the first character that has one occurrence. As a result the initial position for the character we care about will be correct, the others won't, but we don't care that they're not accurate, they're not what we're after
– Kevin S
Dec 13 '18 at 16:09
1
@KevinS yeah, this means that we don't really need theif letter not in initial_positions
check.
– alecxe
Dec 13 '18 at 16:13
1
Excuse me if I'm wrong, but wouldn'tinput_string[positions[letter]]
just be the same asletter
?
– JAD
Dec 14 '18 at 7:58
1
@JAD Iflower ()
wasn't called on the input string, it would. But in any case thepositions
dictionary is unnecessary as you can either lowercase each letter individually to check for its count or useenumerate
to retrieve the position.
– Mathias Ettinger
Dec 14 '18 at 8:29
I think you're on the right track. What if a defaultdict(int) was also used for initial_positions? This simplifies your line "if letter not in initial_positions".
– Kevin S
Dec 13 '18 at 15:44
I think you're on the right track. What if a defaultdict(int) was also used for initial_positions? This simplifies your line "if letter not in initial_positions".
– Kevin S
Dec 13 '18 at 15:44
2
2
We only care about characters that have one occurrence, and in particular we only care about the first character that has one occurrence. As a result the initial position for the character we care about will be correct, the others won't, but we don't care that they're not accurate, they're not what we're after
– Kevin S
Dec 13 '18 at 16:09
We only care about characters that have one occurrence, and in particular we only care about the first character that has one occurrence. As a result the initial position for the character we care about will be correct, the others won't, but we don't care that they're not accurate, they're not what we're after
– Kevin S
Dec 13 '18 at 16:09
1
1
@KevinS yeah, this means that we don't really need the
if letter not in initial_positions
check.– alecxe
Dec 13 '18 at 16:13
@KevinS yeah, this means that we don't really need the
if letter not in initial_positions
check.– alecxe
Dec 13 '18 at 16:13
1
1
Excuse me if I'm wrong, but wouldn't
input_string[positions[letter]]
just be the same as letter
?– JAD
Dec 14 '18 at 7:58
Excuse me if I'm wrong, but wouldn't
input_string[positions[letter]]
just be the same as letter
?– JAD
Dec 14 '18 at 7:58
1
1
@JAD If
lower ()
wasn't called on the input string, it would. But in any case the positions
dictionary is unnecessary as you can either lowercase each letter individually to check for its count or use enumerate
to retrieve the position.– Mathias Ettinger
Dec 14 '18 at 8:29
@JAD If
lower ()
wasn't called on the input string, it would. But in any case the positions
dictionary is unnecessary as you can either lowercase each letter individually to check for its count or use enumerate
to retrieve the position.– Mathias Ettinger
Dec 14 '18 at 8:29
|
show 8 more comments
Going further from alecxe's answer:
- you could use the Counter collections instead of performing the counting yourself - I didn't see that this was avoided on purpose.
- you can ensure
lower
is called only once
You'd get something like:
from collections import Counter
def non_repeat(input_string):
lower = input_string.lower()
count = Counter(lower)
for c, l in zip(input_string, lower):
if count[l] == 1:
return c
return ''
or, for a Counter-less solution:
def non_repeat(input_string):
lower = input_string.lower()
count = defaultdict(int)
for c in lower:
count[c] += 1
for c, l in zip(input_string, lower):
if count[l] == 1:
return c
return ''
Also, here's a quick test suite I wrote:
tests = [
('', ''),
('AA', ''),
('AAABBB', ''),
('AAABBBc', 'c'),
('azerty', 'a'),
('aazzeerty', 'r'),
('azerAZERty', 't'),
]
for inp, out in tests:
assert non_repeat(inp) == out
1
The question did specifically state that Kevin is avoidingcollections.Counter
, so that's not such a helpful suggestion. The rest makes sense.
– Toby Speight
Dec 13 '18 at 16:52
1
@TobySpeight Thanks for spotting this, I completely missed in the the original question. I've updated my answer accordingly but it is not that relevant anymore :)
– Josay
Dec 13 '18 at 17:04
IsCounter
still slower thandefaultdict
?
– jpmc26
Dec 13 '18 at 22:47
I didn't know it had been the case. I'll try a benchmark if I think about it
– Josay
Dec 13 '18 at 22:53
add a comment |
Going further from alecxe's answer:
- you could use the Counter collections instead of performing the counting yourself - I didn't see that this was avoided on purpose.
- you can ensure
lower
is called only once
You'd get something like:
from collections import Counter
def non_repeat(input_string):
lower = input_string.lower()
count = Counter(lower)
for c, l in zip(input_string, lower):
if count[l] == 1:
return c
return ''
or, for a Counter-less solution:
def non_repeat(input_string):
lower = input_string.lower()
count = defaultdict(int)
for c in lower:
count[c] += 1
for c, l in zip(input_string, lower):
if count[l] == 1:
return c
return ''
Also, here's a quick test suite I wrote:
tests = [
('', ''),
('AA', ''),
('AAABBB', ''),
('AAABBBc', 'c'),
('azerty', 'a'),
('aazzeerty', 'r'),
('azerAZERty', 't'),
]
for inp, out in tests:
assert non_repeat(inp) == out
1
The question did specifically state that Kevin is avoidingcollections.Counter
, so that's not such a helpful suggestion. The rest makes sense.
– Toby Speight
Dec 13 '18 at 16:52
1
@TobySpeight Thanks for spotting this, I completely missed in the the original question. I've updated my answer accordingly but it is not that relevant anymore :)
– Josay
Dec 13 '18 at 17:04
IsCounter
still slower thandefaultdict
?
– jpmc26
Dec 13 '18 at 22:47
I didn't know it had been the case. I'll try a benchmark if I think about it
– Josay
Dec 13 '18 at 22:53
add a comment |
Going further from alecxe's answer:
- you could use the Counter collections instead of performing the counting yourself - I didn't see that this was avoided on purpose.
- you can ensure
lower
is called only once
You'd get something like:
from collections import Counter
def non_repeat(input_string):
lower = input_string.lower()
count = Counter(lower)
for c, l in zip(input_string, lower):
if count[l] == 1:
return c
return ''
or, for a Counter-less solution:
def non_repeat(input_string):
lower = input_string.lower()
count = defaultdict(int)
for c in lower:
count[c] += 1
for c, l in zip(input_string, lower):
if count[l] == 1:
return c
return ''
Also, here's a quick test suite I wrote:
tests = [
('', ''),
('AA', ''),
('AAABBB', ''),
('AAABBBc', 'c'),
('azerty', 'a'),
('aazzeerty', 'r'),
('azerAZERty', 't'),
]
for inp, out in tests:
assert non_repeat(inp) == out
Going further from alecxe's answer:
- you could use the Counter collections instead of performing the counting yourself - I didn't see that this was avoided on purpose.
- you can ensure
lower
is called only once
You'd get something like:
from collections import Counter
def non_repeat(input_string):
lower = input_string.lower()
count = Counter(lower)
for c, l in zip(input_string, lower):
if count[l] == 1:
return c
return ''
or, for a Counter-less solution:
def non_repeat(input_string):
lower = input_string.lower()
count = defaultdict(int)
for c in lower:
count[c] += 1
for c, l in zip(input_string, lower):
if count[l] == 1:
return c
return ''
Also, here's a quick test suite I wrote:
tests = [
('', ''),
('AA', ''),
('AAABBB', ''),
('AAABBBc', 'c'),
('azerty', 'a'),
('aazzeerty', 'r'),
('azerAZERty', 't'),
]
for inp, out in tests:
assert non_repeat(inp) == out
edited Dec 13 '18 at 17:04
answered Dec 13 '18 at 16:47
Josay
25.5k13987
25.5k13987
1
The question did specifically state that Kevin is avoidingcollections.Counter
, so that's not such a helpful suggestion. The rest makes sense.
– Toby Speight
Dec 13 '18 at 16:52
1
@TobySpeight Thanks for spotting this, I completely missed in the the original question. I've updated my answer accordingly but it is not that relevant anymore :)
– Josay
Dec 13 '18 at 17:04
IsCounter
still slower thandefaultdict
?
– jpmc26
Dec 13 '18 at 22:47
I didn't know it had been the case. I'll try a benchmark if I think about it
– Josay
Dec 13 '18 at 22:53
add a comment |
1
The question did specifically state that Kevin is avoidingcollections.Counter
, so that's not such a helpful suggestion. The rest makes sense.
– Toby Speight
Dec 13 '18 at 16:52
1
@TobySpeight Thanks for spotting this, I completely missed in the the original question. I've updated my answer accordingly but it is not that relevant anymore :)
– Josay
Dec 13 '18 at 17:04
IsCounter
still slower thandefaultdict
?
– jpmc26
Dec 13 '18 at 22:47
I didn't know it had been the case. I'll try a benchmark if I think about it
– Josay
Dec 13 '18 at 22:53
1
1
The question did specifically state that Kevin is avoiding
collections.Counter
, so that's not such a helpful suggestion. The rest makes sense.– Toby Speight
Dec 13 '18 at 16:52
The question did specifically state that Kevin is avoiding
collections.Counter
, so that's not such a helpful suggestion. The rest makes sense.– Toby Speight
Dec 13 '18 at 16:52
1
1
@TobySpeight Thanks for spotting this, I completely missed in the the original question. I've updated my answer accordingly but it is not that relevant anymore :)
– Josay
Dec 13 '18 at 17:04
@TobySpeight Thanks for spotting this, I completely missed in the the original question. I've updated my answer accordingly but it is not that relevant anymore :)
– Josay
Dec 13 '18 at 17:04
Is
Counter
still slower than defaultdict
?– jpmc26
Dec 13 '18 at 22:47
Is
Counter
still slower than defaultdict
?– jpmc26
Dec 13 '18 at 22:47
I didn't know it had been the case. I'll try a benchmark if I think about it
– Josay
Dec 13 '18 at 22:53
I didn't know it had been the case. I'll try a benchmark if I think about it
– Josay
Dec 13 '18 at 22:53
add a comment |
A version which doesn't use any imports, works on Py2.7+ and relies almost solely on set operations to achieve a single O(len(s))
pass + constant time:
def non_repeat(s):
LEN_CHAR_SET_LOWERED = XXX # length of your char set adjusted for .lower()
seen_order = # Store the order for figuring out which came first
seen_set = set() # Store whether we've seen the character
dupe_set = set() # Store whether we've seen it more than once
# Scan the string
for ch in s:
chl = ch.lower() # lowered character for seen/dupe sets
if chl not in seen_set:
seen_order.append(ch) # This uses the non-lowered version to preserve case
seen_set.add(chl)
else:
dupe_set.add(chl)
if len(dupe_set) == LEN_CHAR_SET_LOWERED: # Set len is O(1)
return '' # If dupe set contains all possible characters, exit early
# Find uniques
unique_set = seen_set - dupe_set
# Find the first one, if any
if unique_set:
for ch in seen_order:
if ch.lower() in unique_set:
return ch
return ''
Some notes on speed:
O(len(s))
average case,O(1)
best case (see early exit) - to build the list/sets - set membership, additions and list appends are all averageO(1)
operations, worst caseO(len(set/list))
*O(1)
- Set difference on average, worst caseO(len(set))
*O(len(list))
* for the final check
*O(len(list))
and O(len(set))
both have upper bounds of LEN_CHAR_SET_LOWERED
, which means they end up constant time, O(1)
, as the string grows
This is also interesting because of the early exit: If your string contains all characters duplicated, it will only scan until it has seen every character at least twice and then exit, knowing there will be no unique characters. An alphanumeric string could exit after scanning as few as 72 characters, regardless of the actual length.
add a comment |
A version which doesn't use any imports, works on Py2.7+ and relies almost solely on set operations to achieve a single O(len(s))
pass + constant time:
def non_repeat(s):
LEN_CHAR_SET_LOWERED = XXX # length of your char set adjusted for .lower()
seen_order = # Store the order for figuring out which came first
seen_set = set() # Store whether we've seen the character
dupe_set = set() # Store whether we've seen it more than once
# Scan the string
for ch in s:
chl = ch.lower() # lowered character for seen/dupe sets
if chl not in seen_set:
seen_order.append(ch) # This uses the non-lowered version to preserve case
seen_set.add(chl)
else:
dupe_set.add(chl)
if len(dupe_set) == LEN_CHAR_SET_LOWERED: # Set len is O(1)
return '' # If dupe set contains all possible characters, exit early
# Find uniques
unique_set = seen_set - dupe_set
# Find the first one, if any
if unique_set:
for ch in seen_order:
if ch.lower() in unique_set:
return ch
return ''
Some notes on speed:
O(len(s))
average case,O(1)
best case (see early exit) - to build the list/sets - set membership, additions and list appends are all averageO(1)
operations, worst caseO(len(set/list))
*O(1)
- Set difference on average, worst caseO(len(set))
*O(len(list))
* for the final check
*O(len(list))
and O(len(set))
both have upper bounds of LEN_CHAR_SET_LOWERED
, which means they end up constant time, O(1)
, as the string grows
This is also interesting because of the early exit: If your string contains all characters duplicated, it will only scan until it has seen every character at least twice and then exit, knowing there will be no unique characters. An alphanumeric string could exit after scanning as few as 72 characters, regardless of the actual length.
add a comment |
A version which doesn't use any imports, works on Py2.7+ and relies almost solely on set operations to achieve a single O(len(s))
pass + constant time:
def non_repeat(s):
LEN_CHAR_SET_LOWERED = XXX # length of your char set adjusted for .lower()
seen_order = # Store the order for figuring out which came first
seen_set = set() # Store whether we've seen the character
dupe_set = set() # Store whether we've seen it more than once
# Scan the string
for ch in s:
chl = ch.lower() # lowered character for seen/dupe sets
if chl not in seen_set:
seen_order.append(ch) # This uses the non-lowered version to preserve case
seen_set.add(chl)
else:
dupe_set.add(chl)
if len(dupe_set) == LEN_CHAR_SET_LOWERED: # Set len is O(1)
return '' # If dupe set contains all possible characters, exit early
# Find uniques
unique_set = seen_set - dupe_set
# Find the first one, if any
if unique_set:
for ch in seen_order:
if ch.lower() in unique_set:
return ch
return ''
Some notes on speed:
O(len(s))
average case,O(1)
best case (see early exit) - to build the list/sets - set membership, additions and list appends are all averageO(1)
operations, worst caseO(len(set/list))
*O(1)
- Set difference on average, worst caseO(len(set))
*O(len(list))
* for the final check
*O(len(list))
and O(len(set))
both have upper bounds of LEN_CHAR_SET_LOWERED
, which means they end up constant time, O(1)
, as the string grows
This is also interesting because of the early exit: If your string contains all characters duplicated, it will only scan until it has seen every character at least twice and then exit, knowing there will be no unique characters. An alphanumeric string could exit after scanning as few as 72 characters, regardless of the actual length.
A version which doesn't use any imports, works on Py2.7+ and relies almost solely on set operations to achieve a single O(len(s))
pass + constant time:
def non_repeat(s):
LEN_CHAR_SET_LOWERED = XXX # length of your char set adjusted for .lower()
seen_order = # Store the order for figuring out which came first
seen_set = set() # Store whether we've seen the character
dupe_set = set() # Store whether we've seen it more than once
# Scan the string
for ch in s:
chl = ch.lower() # lowered character for seen/dupe sets
if chl not in seen_set:
seen_order.append(ch) # This uses the non-lowered version to preserve case
seen_set.add(chl)
else:
dupe_set.add(chl)
if len(dupe_set) == LEN_CHAR_SET_LOWERED: # Set len is O(1)
return '' # If dupe set contains all possible characters, exit early
# Find uniques
unique_set = seen_set - dupe_set
# Find the first one, if any
if unique_set:
for ch in seen_order:
if ch.lower() in unique_set:
return ch
return ''
Some notes on speed:
O(len(s))
average case,O(1)
best case (see early exit) - to build the list/sets - set membership, additions and list appends are all averageO(1)
operations, worst caseO(len(set/list))
*O(1)
- Set difference on average, worst caseO(len(set))
*O(len(list))
* for the final check
*O(len(list))
and O(len(set))
both have upper bounds of LEN_CHAR_SET_LOWERED
, which means they end up constant time, O(1)
, as the string grows
This is also interesting because of the early exit: If your string contains all characters duplicated, it will only scan until it has seen every character at least twice and then exit, knowing there will be no unique characters. An alphanumeric string could exit after scanning as few as 72 characters, regardless of the actual length.
answered Dec 13 '18 at 23:32
TemporalWolf
25427
25427
add a comment |
add a comment |
Thanks for contributing an answer to Code Review Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f209622%2ffirst-non-repeating-letter%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
5
I rolled back your last edit. After getting an answer you are not allowed to change your code anymore. This is to ensure that answers do not get invalidated and have to hit a moving target. If you have changed your code you can either post it as an answer (if it would constitute a code review) or ask a new question with your changed code (linking back to this one as reference). Refer to this post for more information
– Sᴀᴍ Onᴇᴌᴀ
Dec 13 '18 at 16:11
1
Can you explain why you want to avoid
collections.Counter
?– Gareth Rees
Dec 13 '18 at 19:50
There's no logical reason for avoiding collections.Counter (that I'm aware of). I wrote my question that way because I wanted to keep the presentation of the algorithm more apparent to the reader, and because I already knew how to make the function much shorter by using collections.Counter which would essentially eliminate the 3-4 lines from the function.
– Kevin S
Dec 13 '18 at 21:14
it might be easier to iterate over the string backwards
– sudo rm -rf slash
Dec 14 '18 at 7:04