Splitting a Python string at a delimiter, but a specific one
Is there a way to split a Python string, without using a for loop, at the delimiter closest to the middle of the string?
For example:
    The cat jumped over the moon very quickly.
The delimiter would be the space, and the resulting strings would be:
    The cat jumped over
    the moon very quickly.
I see there is a count method that tells me how many spaces are in the string (though I don't see how to get their indexes). I could then find the middle one by dividing by two, but how do I then split on the delimiter at that index? find is close, but it returns only the first index (or the last one, using rfind), not all the indexes where " " is found. I might be overthinking this.
python
asked Jan 21 at 23:01 by Codejoy
What about using split() and then re-joining the first and second halves of the resulting list separately?
– Feodoran, Jan 21 at 23:07
The algorithm you describe (counting the spaces) would split the sentence into an equal number of words, which conflicts with your requirement (split a string in the middle at the closest delimiter). Which one are you after?
– Selcuk, Jan 21 at 23:19
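On the side question of getting every index where the delimiter occurs: str.find returns only one position per call, but re.finditer from the standard library yields one match object per occurrence. A minimal sketch, assuming a plain-text delimiter; the helper name delimiter_indexes is made up for illustration, and the comprehension still iterates under the hood:

```python
import re

def delimiter_indexes(text, delim=" "):
    """Return the start index of every non-overlapping occurrence of delim."""
    # re.escape keeps characters such as "." or "|" from being read as regex syntax.
    return [m.start() for m in re.finditer(re.escape(delim), text)]

s = "The cat jumped over the moon very quickly."
print(delimiter_indexes(s))  # [3, 7, 14, 19, 23, 28, 33]
```

With the indexes in hand, picking the one nearest len(s) // 2 and slicing around it gives the split asked for.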
7 Answers
As Valentino says, the answer depends on whether you want to split the number of characters as evenly as possible or the number of words as evenly as possible: split()-based methods will do the latter.
Here's a way to do the former without looping or list comprehension. delim can be any single character. This method just wouldn't work if you want a longer delimiter, since in that case it needn't be wholly in the first half or wholly in the second half.
    def middlesplit(s, delim=" "):
        if delim not in s:
            return (s,)
        midpoint = (len(s) + 1) // 2
        # Delimiter nearest the middle in the first half (-1 if none)
        left = s[:midpoint].rfind(delim)
        # Same for the reversed second half, so it counts from the end (-1 if none)
        right = s[:midpoint - 1:-1].rfind(delim)
        if right > left:
            cut = len(s) - 1 - right  # convert back to an index from the start
            return (s[:cut], s[cut + 1:])
        else:
            return (s[:left], s[left + 1:])
The reason for using rfind() rather than find() is so that you can choose the larger result, making sure you avoid the -1 if only one side of your string contains delim.
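For delimiters longer than one character, one possible workaround is to compare where each occurrence starts instead of slicing the string into halves first. The following is only a sketch under that assumption: the name middlesplit_multi and the tie-breaking rule (prefer the left occurrence) are illustrative choices, not part of the answer above. It relies on the optional start and end arguments of str.find and str.rfind.

```python
def middlesplit_multi(s, delim=" "):
    """Split s at the occurrence of delim whose start is nearest the middle."""
    if not delim:
        raise ValueError("empty delimiter")
    mid = len(s) // 2
    # Last occurrence that starts before mid (it may extend past mid).
    left = s.rfind(delim, 0, mid + len(delim) - 1)
    # First occurrence that starts at or after mid.
    right = s.find(delim, mid)
    if left == -1 and right == -1:
        return (s,)
    # Prefer the closer start; ties go to the left occurrence.
    if right == -1 or (left != -1 and mid - left <= right - mid):
        cut = left
    else:
        cut = right
    return s[:cut], s[cut + len(delim):]

print(middlesplit_multi("alpha, beta, gamma, delta", ", "))
# ('alpha, beta', 'gamma, delta')
```

As in the answer, a string without the delimiter comes back as a one-element tuple.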
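How about something like this:
    s = "The cat jumped over the moon very quickly"
    words = s.split()        # split on runs of whitespace
    half = len(words) // 2   # halves by word count, not by character count
    s1 = ' '.join(words[:half])
    s2 = ' '.join(words[half:])
    print(s1)  # The cat jumped over
    print(s2)  # the moon very quickly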
answered Jan 21 at 23:10 by LonelyDaoist
This method will collapse consecutive spaces into a single space. "First␣sentence.␣␣And␣then␣the␣second." will be split into "First␣sentence.␣And" and "then␣the␣second.". Notice the collapsed double space after the first sentence. Tabs (\t) and newlines (\n) will also be converted to a single space when joined. Using s.split(" ") will split on every individual space, which would preserve consecutive spaces when joined, but you then have a problem when halving the split string.
– Billy Brown, Jan 22 at 8:54
This also breaks down for "sentence with looooooooooooooooooong word and many very short ones"
– tobias_k, Jan 22 at 9:35
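Both caveats raised in these comments can be reproduced with a few lines. This is a small illustrative sketch; the wrapper name split_by_words is made up here and simply packages the split()/join() approach from the answer:

```python
def split_by_words(text):
    """Split into two parts holding (nearly) equal numbers of words."""
    words = text.split()
    half = len(words) // 2
    return " ".join(words[:half]), " ".join(words[half:])

# The double space after "sentence." is lost: split() discards whitespace runs.
print(split_by_words("First sentence.  And then the second."))
# ('First sentence. And', 'then the second.')

# Word counts are balanced, but the character lengths are not.
left, right = split_by_words(
    "sentence with looooooooooooooooooong word and many very short ones"
)
print(len(left), len(right))
```

The second call puts four words on the left and five on the right, yet the left part is much longer because it contains the long word.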
This should work:
    def split_text(text):
        middle = len(text) // 2
        under = text.rfind(" ", 0, middle)  # last space before the middle, or -1
        over = text.find(" ", middle)       # first space at or after the middle, or -1
        if under == -1 and over == -1:
            raise ValueError("No separator found in text '{}'".format(text))
        # Use whichever space is closer to the middle, or the only one found
        if over == -1 or (under != -1 and middle - under <= over - middle):
            sep = under
        else:
            sep = over
        return (text[:sep], text[sep + 1:])  # the separator itself is dropped
It does not use a for loop, but probably using a for loop would have better performance.
I handle the case where the separator is not found in the whole string by raising an error, but you can replace raise ValueError() with whatever way you want to handle that case.
edited Jan 22 at 1:18 by Olivier Melançon
answered Jan 22 at 0:02 by spaniard
You mean under = text.rfind(" ", 0, middle).
– CristiFati, Jan 22 at 0:05
Right, I will edit it.
– spaniard, Jan 22 at 0:09
Algorithmically speaking, this is as efficient as it can get.
– Olivier Melançon, Jan 22 at 0:53
@spaniard Although, I think you have to handle the case where find and rfind will return -1
– Olivier Melançon, Jan 22 at 0:54
@OlivierMelançon right! thanks I will fix it.
– spaniard, Jan 22 at 1:04
You can use min to find the space closest to the middle and then slice the string.
    s = "The cat jumped over the moon very quickly."
    mid = min((i for i, c in enumerate(s) if c == ' '), key=lambda i: abs(i - len(s) // 2))
    fst, snd = s[:mid], s[mid+1:]
    print(fst)
    print(snd)
Output
    The cat jumped over
    the moon very quickly.
answered Jan 22 at 0:48 by Olivier Melançon
Not to nitpick, but that min call consumes a generator (i.e., a loop)
– sapi, Jan 22 at 9:48
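One practical detail of the min-based approach is that min() raises ValueError when the string contains no delimiter at all. A hedged sketch of how that could be handled with the default keyword of min() (available since Python 3.4); the function name split_near_middle and the choice to return a one-element tuple are assumptions made for illustration:

```python
def split_near_middle(text, delim=" "):
    """Split text at the single-character delim closest to its middle."""
    middle = len(text) // 2
    # The generator is consumed lazily by min(); no list of indexes is built.
    candidates = (i for i, c in enumerate(text) if c == delim)
    # default=None avoids the ValueError min() raises on an empty iterable.
    cut = min(candidates, key=lambda i: abs(i - middle), default=None)
    if cut is None:
        return (text,)
    return text[:cut], text[cut + 1:]

print(split_near_middle("The cat jumped over the moon very quickly."))
# ('The cat jumped over', 'the moon very quickly.')
print(split_near_middle("nodelimiter"))
# ('nodelimiter',)
```

When two delimiters are equally close to the middle, min() keeps the first one it sees, so the split falls on the left candidate.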
I'd just split then rejoin:
    text = "The cat jumped over the moon very quickly"
    words = text.split()
    first_half = " ".join(words[:len(words)//2])
    second_half = " ".join(words[len(words)//2:])  # the remaining words
answered Jan 21 at 23:08 by Joe Halliwell
Depends on whether you want to split by # of words or overall string length.
– Amber, Jan 21 at 23:09
This splits into an equal number of words, not characters.
– Olivier Melançon, Jan 21 at 23:10
I think the solutions using split are good. I tried to solve it without split, and here's what I came up with.
    sOdd = "The cat jumped over the moon very quickly."
    sEven = "The cat jumped over the moon very quickly now."

    def split_on_delim_mid(s, delim=" "):
        # Indexes of every delimiter; for sOdd: [3, 7, 14, 19, 23, 28, 33]
        delim_indexes = [i for i, c in enumerate(s) if c == delim]
        # Select the middle entry of delim_indexes (the upper one if the count is even)
        middle_index = len(delim_indexes) // 2
        # Return the separated sentences; the delimiter stays at the start of the second part
        sep = delim_indexes[middle_index]
        return s[:sep], s[sep:]

    split_on_delim_mid(sOdd)   # ('The cat jumped over', ' the moon very quickly.')
    split_on_delim_mid(sEven)  # ('The cat jumped over the', ' moon very quickly now.')
The idea here is to:
- Find the indexes of the delimiter.
- Find the median of that list of indexes.
- Split on that.
answered Jan 21 at 23:19 by Charles Landau
Solutions with split()
and join()
are fine if you want to get half the words, not half the string (counting the characters and not the words). I think the latter is impossibile without a for
loop or a list comprehension (or an expensive workaround such a recursion to find the indexes of the spaces maybe).
But if you are fine with a list comprehension, you could do:
phrase = "The cat jumped over the moon very quickly."
#indexes of separator, here the ' '
sep_idxs = [i for i, j in enumerate(phrase) if j == ' ']
#getting the separator index closer to half the length of the string
sep = min(sep_idxs, key=lambda x:abs(x-(len(phrase) // 2)))
first_half = phrase[:sep]
last_half = phrase[sep+1:]
print([first_half, last_half])
Here first I look for the indexes of the separator with the list comprehension. Then I find the index of the closer separator to the half of the string using a custom key for the min() built-in function. Then split.
The print
statement prints ['The cat jumped over', 'the moon very quickly.']
answered Jan 21 at 23:57
Valentino
As Valentino says, the answer depends on whether you want to split the number of characters as evenly as possible or the number of words as evenly as possible: split()-based methods will do the latter.
Here's a way to do the former without looping or list comprehension. delim can be any single character. This method won't work with a longer delimiter, since an occurrence of it could straddle the midpoint and so be neither wholly in the first half nor wholly in the second half.
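def middlesplit(s, delim=" "):
    if delim not in s:
        return (s,)
    midpoint = (len(s) + 1) // 2
    # position of the last delim in the first half (-1 if none)
    left = s[:midpoint].rfind(delim)
    # position of the last delim in the reversed second half (-1 if none)
    right = s[:midpoint-1:-1].rfind(delim)
    if right > left:
        # convert the reversed offset to an index into s; slicing with
        # s[-right:] would return the whole string when right == 0
        cut = len(s) - 1 - right
        return (s[:cut], s[cut+1:])
    else:
        return (s[:left], s[left+1:])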
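The reason for using rfind() rather than find() is that you can then simply pick the larger result. The second half is searched in reverse, so on both sides a larger value means a delimiter closer to the middle, and picking the larger one also avoids the -1 you get when only one side of your string contains delim.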
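As a possible variation (not the code from this answer), str.rfind() and str.find() accept optional start and end arguments, so the search can run on the original string without slicing or reversing it. The sketch below assumes a non-empty delim; the name middlesplit_idx is made up for illustration, and measuring closeness by the start index of the delimiter is an arbitrary choice.

```python
def middlesplit_idx(s, delim=" "):
    """Split s at the occurrence of delim closest to the middle, without loops."""
    mid = len(s) // 2
    # Last occurrence that starts before mid; the widened end bound also
    # catches a multi-character delim that straddles the midpoint.
    left = s.rfind(delim, 0, mid + len(delim) - 1)
    # First occurrence that starts at or after mid.
    right = s.find(delim, mid)
    if left == -1 and right == -1:
        return (s,)
    if left == -1:
        cut = right
    elif right == -1:
        cut = left
    else:
        # Ties go to the left candidate.
        cut = left if mid - left <= right - mid else right
    return (s[:cut], s[cut + len(delim):])


print(middlesplit_idx("The cat jumped over the moon very quickly."))
```

Because both searches return indexes into s itself, no conversion from a reversed offset is needed, and a delimiter at the very end of the string yields an empty second part.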
answered Jan 22 at 10:00
Especially Lime