Filter different identical characters in multiple words
I have a very large wordlist. How can I use Unix (or possibly Python) to find instances of multiple words fitting specific character-sharing criteria? For example, I want Words 1 and 2 to have the same fourth and seventh characters, Words 2 and 3 to have the same fourth and ninth characters, and Words 3 and 4 to have the same second, fourth, and ninth characters.
Example:
aaadiigjlf
abcdefghij
aswdofflle
bbbbbbbbbb
bisofmlwpa
fsbdfopkld
gikfkwpspa
hogkellgis
might return
abcdefghij
aaadiigjlf
fsbdfopkld
aswdofflle
EDIT: For clarification, I need the code to return any words that share the same characters in given positions; I don't have specific characters (like "d" and "g" as given in the example) in mind. Also, I'd like it to be able to return words that don't fit ALL of the criteria; e.g. in the example given, Words 1 and 4 share a fourth character, but not necessarily the second, seventh, and ninth. With the program I'm running in its finished form, I'm expecting it to return a very small list of words (probably only ten) based on nine strict character-sharing criteria.
command-line
asked Jan 29 at 22:16 by J.T. (edited Jan 29 at 23:09)
1 Answer
Use grep, which matches lines against regular expressions:
# Find all lines whose fourth and seventh characters are "d" and "g"
grep '^...d..g' somefile
# Find all lines whose fourth and ninth characters are "d" and "l"
grep '^...d....l' somefile
The leading ^ anchors the pattern to the start of the line, so each dot counts one character position from the beginning. Without it, the pattern could match anywhere in the line.
If you want to enforce both rules, chain them together using a pipe:
grep '^...d..g' somefile | grep '^...d....l'
You can shorten a run of dots with the repetition syntax .{N}, for example .{123} instead of 123 dots:
grep -E '^.{3}d.{2}g' somefile
Note that as your regular expression gets more complicated you may need grep -E (egrep is an older name for the same thing) to support some syntax, such as the repetition syntax above.
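The same fixed-character match can also be written in Python, which the question allows. What follows is a minimal sketch rather than part of the original answer: the helper name positional_pattern is hypothetical, and the word list is the sample from the question.

```python
import re

def positional_pattern(constraints):
    """Build an anchored regex from a {1-based position: character} mapping.

    Positions without a constraint become '.', so {4: "d", 7: "g"}
    yields the same pattern as the grep example: ^...d..g
    """
    last = max(constraints)
    parts = [
        re.escape(constraints[i]) if i in constraints else "."
        for i in range(1, last + 1)
    ]
    return re.compile("^" + "".join(parts))

# Sample word list copied from the question.
words = [
    "aaadiigjlf", "abcdefghij", "aswdofflle", "bbbbbbbbbb",
    "bisofmlwpa", "fsbdfopkld", "gikfkwpspa", "hogkellgis",
]

pattern = positional_pattern({4: "d", 7: "g"})
matches = [w for w in words if pattern.match(w)]
print(matches)
```

Like the grep commands, this only helps when the characters are known in advance; it does not find words that merely agree with each other.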
Sorry, I need the code to return any words that share the same characters in given positions; I don't have specific characters (like "d" and "g" as given in the example) in mind. Also, I'd like it to be able to return multiple words that don't fit the same criteria; e.g. in the example given, Words 1 and 4 share a fourth character, but not necessarily the second, seventh, and ninth.
– J.T.
Jan 29 at 23:03
That's more complicated and likely would need to be done with a real programming language such as Python. It may be possible with awk, but overall I can't think of a (clean) "unix" way to do that.
– Kristopher Ives
Jan 29 at 23:10
Hmm...then should I re-ask this in a Python forum, or would I still be able to ask here?
– J.T.
Jan 29 at 23:13
It's possible someone has a wizardly way of doing it that I'm not aware of, so I'm interested if anyone here can solve it. You might also want to post the exact question on Unix Stack Exchange as well as Stack Overflow.
– Kristopher Ives
Jan 29 at 23:14
All right, I'll try crossposting. Thanks!
– J.T.
Jan 29 at 23:19
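For the clarified requirement (words that agree with each other at given positions, with no fixed characters), one possible Python approach is sketched below. It is an illustration under assumptions, not a solution posted in this thread: the names key and find_chains are hypothetical, positions are 1-based as in the question, and it reads the criteria as a chain (Words 1 and 2 share one set of positions, Words 2 and 3 another, and so on). Each link gets an index from the characters at its positions to the words that have them, so a large word list is not compared pair by pair.

```python
from collections import defaultdict

def key(word, positions):
    """Characters of word at the given 1-based positions, or None if too short."""
    if len(word) < max(positions):
        return None
    return tuple(word[p - 1] for p in positions)

def find_chains(words, links):
    """Yield tuples of distinct words in which word i and word i+1
    have identical characters at every position listed in links[i]."""
    # One index per link: characters at the link's positions -> matching words.
    indexes = []
    for positions in links:
        index = defaultdict(list)
        for w in words:
            k = key(w, positions)
            if k is not None:
                index[k].append(w)
        indexes.append(index)

    def extend(chain):
        depth = len(chain) - 1
        if depth == len(links):
            yield tuple(chain)
            return
        k = key(chain[-1], links[depth])
        if k is None:
            return
        for candidate in indexes[depth].get(k, ()):
            if candidate not in chain:
                yield from extend(chain + [candidate])

    for w in words:
        yield from extend([w])

# Sample word list and criteria copied from the question.
words = [
    "aaadiigjlf", "abcdefghij", "aswdofflle", "bbbbbbbbbb",
    "bisofmlwpa", "fsbdfopkld", "gikfkwpspa", "hogkellgis",
]
links = [(4, 7), (4, 9), (2, 4, 9)]

chains = list(find_chains(words, links))
for chain in chains:
    print(" ".join(chain))
```

On the sample list, the results include the ordering given in the question (abcdefghij, aaadiigjlf, fsbdfopkld, aswdofflle). For a real word list, words could be read from a file one per line, and links could be extended to all nine criteria. Partial matches, as mentioned in the EDIT, would need a different stopping rule than this sketch uses.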
answered Jan 29 at 22:57 by Kristopher Ives