Identity column re-seed: when it is necessary?
up vote
9
down vote
favorite
During one of the last lessons at university (I'm a student), lecturer asked us to develop a database (MySQL Server if it matters) and tiny client app that would consume the database as data source.
One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words). That is, when table row is deleted, it's PK must be reused in subsequent inserts. I have average knowledge in RDBMS, PKs and identity columns. From what I understand, that identity column is just a way to let DB to auto-generate PKs when inserting rows and nothing more. And identity column value shall not be related to row attributes in any way (as long as it is not natural key).
This requirement (strictly sequential identity column) was suspicious to me. I tried to ask the lecturer what is wrong if identity is not sequential (with gaps caused by deletions), but got very abstract answer like "it is convenient for users and useful for DB administrators who maintain the database". No specific examples. The argument "convenient for users" sounds silly, because it doesn't have any meaning in business domain.
Therefore I'm curious if these reasons are real? I can think only of one case when identity column reseed is required -- when identity space is exhausted. But this is more design issue when identity column type was chosen incorrectly, say simple int
instead of bigint
or uniqueidentifier
when table contains billion rows. Suppose, an identity column is a clustered index: can gaps in identity column affect index performance? Maybe there are other real-world reasons for automatic identity column re-seed after each delete I'm not aware of?
Thanks in advance!
mysql identity auto-increment
add a comment |
up vote
9
down vote
favorite
During one of the last lessons at university (I'm a student), lecturer asked us to develop a database (MySQL Server if it matters) and tiny client app that would consume the database as data source.
One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words). That is, when table row is deleted, it's PK must be reused in subsequent inserts. I have average knowledge in RDBMS, PKs and identity columns. From what I understand, that identity column is just a way to let DB to auto-generate PKs when inserting rows and nothing more. And identity column value shall not be related to row attributes in any way (as long as it is not natural key).
This requirement (strictly sequential identity column) was suspicious to me. I tried to ask the lecturer what is wrong if identity is not sequential (with gaps caused by deletions), but got very abstract answer like "it is convenient for users and useful for DB administrators who maintain the database". No specific examples. The argument "convenient for users" sounds silly, because it doesn't have any meaning in business domain.
Therefore I'm curious if these reasons are real? I can think only of one case when identity column reseed is required -- when identity space is exhausted. But this is more design issue when identity column type was chosen incorrectly, say simple int
instead of bigint
or uniqueidentifier
when table contains billion rows. Suppose, an identity column is a clustered index: can gaps in identity column affect index performance? Maybe there are other real-world reasons for automatic identity column re-seed after each delete I'm not aware of?
Thanks in advance!
mysql identity auto-increment
2
Gaps don't affect clustered indexes, these are b-trees of a form anyway. Old (still relevant) rant: Shooting yourself in the foot: renumbering auto-inc IDs
– danblack
Nov 25 at 22:56
1
Database clusters, Galera and multiple active-write db servers, also useauto_increment_(increment|offset)
(ref: dev.mysql.com/doc/refman/5.7/en/…) as a mechanism to prevent conflicts which explicitly configures servers to have gaps in namespace.
– danblack
Nov 25 at 23:06
"One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words)." media.tenor.com/images/5442002c8081300e02cdb610e70a5460/… This is a well known as an anti-pattern.
– jpmc26
Nov 27 at 1:06
add a comment |
up vote
9
down vote
favorite
up vote
9
down vote
favorite
During one of the last lessons at university (I'm a student), lecturer asked us to develop a database (MySQL Server if it matters) and tiny client app that would consume the database as data source.
One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words). That is, when table row is deleted, it's PK must be reused in subsequent inserts. I have average knowledge in RDBMS, PKs and identity columns. From what I understand, that identity column is just a way to let DB to auto-generate PKs when inserting rows and nothing more. And identity column value shall not be related to row attributes in any way (as long as it is not natural key).
This requirement (strictly sequential identity column) was suspicious to me. I tried to ask the lecturer what is wrong if identity is not sequential (with gaps caused by deletions), but got very abstract answer like "it is convenient for users and useful for DB administrators who maintain the database". No specific examples. The argument "convenient for users" sounds silly, because it doesn't have any meaning in business domain.
Therefore I'm curious if these reasons are real? I can think only of one case when identity column reseed is required -- when identity space is exhausted. But this is more design issue when identity column type was chosen incorrectly, say simple int
instead of bigint
or uniqueidentifier
when table contains billion rows. Suppose, an identity column is a clustered index: can gaps in identity column affect index performance? Maybe there are other real-world reasons for automatic identity column re-seed after each delete I'm not aware of?
Thanks in advance!
mysql identity auto-increment
During one of the last lessons at university (I'm a student), lecturer asked us to develop a database (MySQL Server if it matters) and tiny client app that would consume the database as data source.
One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words). That is, when table row is deleted, it's PK must be reused in subsequent inserts. I have average knowledge in RDBMS, PKs and identity columns. From what I understand, that identity column is just a way to let DB to auto-generate PKs when inserting rows and nothing more. And identity column value shall not be related to row attributes in any way (as long as it is not natural key).
This requirement (strictly sequential identity column) was suspicious to me. I tried to ask the lecturer what is wrong if identity is not sequential (with gaps caused by deletions), but got very abstract answer like "it is convenient for users and useful for DB administrators who maintain the database". No specific examples. The argument "convenient for users" sounds silly, because it doesn't have any meaning in business domain.
Therefore I'm curious if these reasons are real? I can think only of one case when identity column reseed is required -- when identity space is exhausted. But this is more design issue when identity column type was chosen incorrectly, say simple int
instead of bigint
or uniqueidentifier
when table contains billion rows. Suppose, an identity column is a clustered index: can gaps in identity column affect index performance? Maybe there are other real-world reasons for automatic identity column re-seed after each delete I'm not aware of?
Thanks in advance!
mysql identity auto-increment
mysql identity auto-increment
edited Nov 25 at 23:31
Rick James
40.1k22257
40.1k22257
asked Nov 25 at 22:32
Crypt32
1485
1485
2
Gaps don't affect clustered indexes, these are b-trees of a form anyway. Old (still relevant) rant: Shooting yourself in the foot: renumbering auto-inc IDs
– danblack
Nov 25 at 22:56
1
Database clusters, Galera and multiple active-write db servers, also useauto_increment_(increment|offset)
(ref: dev.mysql.com/doc/refman/5.7/en/…) as a mechanism to prevent conflicts which explicitly configures servers to have gaps in namespace.
– danblack
Nov 25 at 23:06
"One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words)." media.tenor.com/images/5442002c8081300e02cdb610e70a5460/… This is a well known as an anti-pattern.
– jpmc26
Nov 27 at 1:06
add a comment |
2
Gaps don't affect clustered indexes, these are b-trees of a form anyway. Old (still relevant) rant: Shooting yourself in the foot: renumbering auto-inc IDs
– danblack
Nov 25 at 22:56
1
Database clusters, Galera and multiple active-write db servers, also useauto_increment_(increment|offset)
(ref: dev.mysql.com/doc/refman/5.7/en/…) as a mechanism to prevent conflicts which explicitly configures servers to have gaps in namespace.
– danblack
Nov 25 at 23:06
"One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words)." media.tenor.com/images/5442002c8081300e02cdb610e70a5460/… This is a well known as an anti-pattern.
– jpmc26
Nov 27 at 1:06
2
2
Gaps don't affect clustered indexes, these are b-trees of a form anyway. Old (still relevant) rant: Shooting yourself in the foot: renumbering auto-inc IDs
– danblack
Nov 25 at 22:56
Gaps don't affect clustered indexes, these are b-trees of a form anyway. Old (still relevant) rant: Shooting yourself in the foot: renumbering auto-inc IDs
– danblack
Nov 25 at 22:56
1
1
Database clusters, Galera and multiple active-write db servers, also use
auto_increment_(increment|offset)
(ref: dev.mysql.com/doc/refman/5.7/en/…) as a mechanism to prevent conflicts which explicitly configures servers to have gaps in namespace.– danblack
Nov 25 at 23:06
Database clusters, Galera and multiple active-write db servers, also use
auto_increment_(increment|offset)
(ref: dev.mysql.com/doc/refman/5.7/en/…) as a mechanism to prevent conflicts which explicitly configures servers to have gaps in namespace.– danblack
Nov 25 at 23:06
"One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words)." media.tenor.com/images/5442002c8081300e02cdb610e70a5460/… This is a well known as an anti-pattern.
– jpmc26
Nov 27 at 1:06
"One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words)." media.tenor.com/images/5442002c8081300e02cdb610e70a5460/… This is a well known as an anti-pattern.
– jpmc26
Nov 27 at 1:06
add a comment |
4 Answers
4
active
oldest
votes
up vote
13
down vote
accepted
That is, when table row is deleted, it's PK must be reused in subsequent inserts.
What universe is your lecturer from??
That is grossly inefficient. If you try to do that, you will cut your performance prospects down by a factor of 10.
If you need gapless numbers for auditing reasons, build them explicitly, not directly from database tools. And never delete rows, but flag them as "deleted". This will add to the messiness of queries, since they will have to ignore such rows.
In MySQL, InnoDB requires the existence of a unique PRIMARY KEY
for each table. But that is the extent of the requirement. The key can even be a string.
Gaps are a convenience to the users and DBAs, not an inconvenience.
I can think of one case where gapless would be convenient -- chunking into groups of 100 rows at a time. But there is a simple workaround using LIMIT 100,1
.
Gaps have zero impact on performance. That includes non-numeric indexes. And non-unique indexes. And composite indexes.
Sure, you can run out of ids. I think I have seen it happen twice in nearly 2 decades of using MySQL. I may as well worry about being struck by an asteroid. It's low on my things-that-keep-me-awake-at-night list.
Gaps occur from (at least): INSERT IGNORE
, IODKU
, REPLACE
, DELETE
, ROLLBACK
(explicit, or due to crash), Multi-master replication (including Galera and Group Replication). Do you really want to come up with workarounds for those?!
Feel free to have us sanity-check anything else that lecturer says that is suspicious.
Don’t ask. I feel that her statement is not very correct, but don’t have arguments to argue this why I’m asking.
– Crypt32
Nov 25 at 23:12
1
@Crypt32 - I added a bunch more. I think this is a question that needs to be tackled the other way. The lecturer should provide coherent claims; then we can shoot them down or suggest that those claims do not justify the complexity of reusing ids. I suggest that they need to provide sample code on how to identify and reuse the ids; again so we can point out how messy that will be.
– Rick James
Nov 25 at 23:28
I tried to ask again during intermission, but all I got was what I said in my question.
– Crypt32
Nov 25 at 23:30
1
Consider that the lecturer's requirement may not have a grounding on real database reasons, but rather, stems from the lecturer wanting to put the students through hoops so that they go through the experience of being forced to tackle difficult algorithms.
– João Mendes
Nov 26 at 11:42
1
@Vilx- - That is fine for only one of my half-dozen cases where gaps happen.
– Rick James
Nov 26 at 22:17
|
show 10 more comments
up vote
7
down vote
Reusing an identity value, should in general be discouraged. Either the value is used entirely internally, in which case it’s actual value is immaterial, or it is also used externally in which case reusing the value is very likely going to lead to a misidentification.
Take the obvious case of an invoice or purchase order number, these might easily come from an identity column and be exposed externally, but you would never want to reuse them for precisely that reason. Both refer to specific transactions which you would not want to get confused.
Resolving such issues can be a big hassle when companies merge or are acquired. Creating such problems on purpose? Not wise.
@Crypt32 - The "invoice number" argument may be the most compelling one to throw at your lecturer.
– Rick James
Nov 26 at 4:33
Good example, I didn't think in this way. Bookmarking a link to a resource with its exposed ID would point to a totally different resource after reuse.
– Crypt32
Nov 26 at 5:35
add a comment |
up vote
5
down vote
The reuse of PK id values has problems and generally should be avoided.
First, the implementation of auto_increment columns doesn't provide the guarantee of being gapless. Indeed gaps will occur if you rollback an insert on an auto increment column.
Secondly the gap ID may refer to existing data that hasn't been deleted (due to missing FK constrains). If they translate to member numbers communicated outside the system then that poses potential business identity risks.
Thirdly, bigint unsigned
won't run out of IDs for a significant time even given an extremely large insert rate.
The biggest pain with gaps is coming across auditors who insist its an audit flaw. For DBAs they know gaps exist and why.
Interesting sentence about auditors. How they explain gaps as flaw? “Rollback an insert”, you mean when insert fails due to some reasons?
– Crypt32
Nov 25 at 23:01
1
Rollback an insert is either failure or user instigated like:BEGIN; INSERT INTO ... ; {some other insert/update}; ROLLBACK
– danblack
Nov 25 at 23:03
1
Auditors just use similar terms around IDs to DBAs, just fairly different business meanings. Each is logical to their own view on the world. Usually resolvable with good communication.
– danblack
Nov 25 at 23:09
add a comment |
up vote
0
down vote
I won't echo everyone else's comments that that reusing a PK is a bad idea but I have come across times where an identity column needed to be re-seeded.
Corruption of the PK index itself.
Granted this was using MS-SQL and many, many years ago but it still is relevant. Many years ago for the company that I work for, someone thought it would be a good idea to re-use PC's as servers in our 150+ remote locations after they were too old to be used by the clients and then stick them in a closet with no ventilation. When no Because we all know that a pile of junk 10 year old computer in a tiny room with temps of 120+ running mission critical databases could only result in good things. Like 40% failure rates and me rethinking my career choice. We would replicate the data back to the corp headquarters but more often than not, these failures would result in bad things happening to the databases. One of those things were the database having corrupted indexes which would seize up the database and the replication process. Twice in this great environment, the only solution to fix replication was to reseed the indexes and then reestablish replication. We did replace the servers later before ditching them completely.
add a comment |
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
13
down vote
accepted
That is, when table row is deleted, it's PK must be reused in subsequent inserts.
What universe is your lecturer from??
That is grossly inefficient. If you try to do that, you will cut your performance prospects down by a factor of 10.
If you need gapless numbers for auditing reasons, build them explicitly, not directly from database tools. And never delete rows, but flag them as "deleted". This will add to the messiness of queries, since they will have to ignore such rows.
In MySQL, InnoDB requires the existence of a unique PRIMARY KEY
for each table. But that is the extent of the requirement. The key can even be a string.
Gaps are a convenience to the users and DBAs, not an inconvenience.
I can think of one case where gapless would be convenient -- chunking into groups of 100 rows at a time. But there is a simple workaround using LIMIT 100,1
.
Gaps have zero impact on performance. That includes non-numeric indexes. And non-unique indexes. And composite indexes.
Sure, you can run out of ids. I think I have seen it happen twice in nearly 2 decades of using MySQL. I may as well worry about being struck by an asteroid. It's low on my things-that-keep-me-awake-at-night list.
Gaps occur from (at least): INSERT IGNORE
, IODKU
, REPLACE
, DELETE
, ROLLBACK
(explicit, or due to crash), Multi-master replication (including Galera and Group Replication). Do you really want to come up with workarounds for those?!
Feel free to have us sanity-check anything else that lecturer says that is suspicious.
Don’t ask. I feel that her statement is not very correct, but don’t have arguments to argue this why I’m asking.
– Crypt32
Nov 25 at 23:12
1
@Crypt32 - I added a bunch more. I think this is a question that needs to be tackled the other way. The lecturer should provide coherent claims; then we can shoot them down or suggest that those claims do not justify the complexity of reusing ids. I suggest that they need to provide sample code on how to identify and reuse the ids; again so we can point out how messy that will be.
– Rick James
Nov 25 at 23:28
I tried to ask again during intermission, but all I got was what I said in my question.
– Crypt32
Nov 25 at 23:30
1
Consider that the lecturer's requirement may not have a grounding on real database reasons, but rather, stems from the lecturer wanting to put the students through hoops so that they go through the experience of being forced to tackle difficult algorithms.
– João Mendes
Nov 26 at 11:42
1
@Vilx- - That is fine for only one of my half-dozen cases where gaps happen.
– Rick James
Nov 26 at 22:17
|
show 10 more comments
up vote
13
down vote
accepted
That is, when table row is deleted, it's PK must be reused in subsequent inserts.
What universe is your lecturer from??
That is grossly inefficient. If you try to do that, you will cut your performance prospects down by a factor of 10.
If you need gapless numbers for auditing reasons, build them explicitly, not directly from database tools. And never delete rows, but flag them as "deleted". This will add to the messiness of queries, since they will have to ignore such rows.
In MySQL, InnoDB requires the existence of a unique PRIMARY KEY
for each table. But that is the extent of the requirement. The key can even be a string.
Gaps are a convenience to the users and DBAs, not an inconvenience.
I can think of one case where gapless would be convenient -- chunking into groups of 100 rows at a time. But there is a simple workaround using LIMIT 100,1
.
Gaps have zero impact on performance. That includes non-numeric indexes. And non-unique indexes. And composite indexes.
Sure, you can run out of ids. I think I have seen it happen twice in nearly 2 decades of using MySQL. I may as well worry about being struck by an asteroid. It's low on my things-that-keep-me-awake-at-night list.
Gaps occur from (at least): INSERT IGNORE
, IODKU
, REPLACE
, DELETE
, ROLLBACK
(explicit, or due to crash), Multi-master replication (including Galera and Group Replication). Do you really want to come up with workarounds for those?!
Feel free to have us sanity-check anything else that lecturer says that is suspicious.
Don’t ask. I feel that her statement is not very correct, but don’t have arguments to argue this why I’m asking.
– Crypt32
Nov 25 at 23:12
1
@Crypt32 - I added a bunch more. I think this is a question that needs to be tackled the other way. The lecturer should provide coherent claims; then we can shoot them down or suggest that those claims do not justify the complexity of reusing ids. I suggest that they need to provide sample code on how to identify and reuse the ids; again so we can point out how messy that will be.
– Rick James
Nov 25 at 23:28
I tried to ask again during intermission, but all I got was what I said in my question.
– Crypt32
Nov 25 at 23:30
1
Consider that the lecturer's requirement may not have a grounding on real database reasons, but rather, stems from the lecturer wanting to put the students through hoops so that they go through the experience of being forced to tackle difficult algorithms.
– João Mendes
Nov 26 at 11:42
1
@Vilx- - That is fine for only one of my half-dozen cases where gaps happen.
– Rick James
Nov 26 at 22:17
|
show 10 more comments
up vote
13
down vote
accepted
up vote
13
down vote
accepted
That is, when table row is deleted, it's PK must be reused in subsequent inserts.
What universe is your lecturer from??
That is grossly inefficient. If you try to do that, you will cut your performance prospects down by a factor of 10.
If you need gapless numbers for auditing reasons, build them explicitly, not directly from database tools. And never delete rows, but flag them as "deleted". This will add to the messiness of queries, since they will have to ignore such rows.
In MySQL, InnoDB requires the existence of a unique PRIMARY KEY
for each table. But that is the extent of the requirement. The key can even be a string.
Gaps are a convenience to the users and DBAs, not an inconvenience.
I can think of one case where gapless would be convenient -- chunking into groups of 100 rows at a time. But there is a simple workaround using LIMIT 100,1
.
Gaps have zero impact on performance. That includes non-numeric indexes. And non-unique indexes. And composite indexes.
Sure, you can run out of ids. I think I have seen it happen twice in nearly 2 decades of using MySQL. I may as well worry about being struck by an asteroid. It's low on my things-that-keep-me-awake-at-night list.
Gaps occur from (at least): INSERT IGNORE
, IODKU
, REPLACE
, DELETE
, ROLLBACK
(explicit, or due to crash), Multi-master replication (including Galera and Group Replication). Do you really want to come up with workarounds for those?!
Feel free to have us sanity-check anything else that lecturer says that is suspicious.
That is, when table row is deleted, it's PK must be reused in subsequent inserts.
What universe is your lecturer from??
That is grossly inefficient. If you try to do that, you will cut your performance prospects down by a factor of 10.
If you need gapless numbers for auditing reasons, build them explicitly, not directly from database tools. And never delete rows, but flag them as "deleted". This will add to the messiness of queries, since they will have to ignore such rows.
In MySQL, InnoDB requires the existence of a unique PRIMARY KEY
for each table. But that is the extent of the requirement. The key can even be a string.
Gaps are a convenience to the users and DBAs, not an inconvenience.
I can think of one case where gapless would be convenient -- chunking into groups of 100 rows at a time. But there is a simple workaround using LIMIT 100,1
.
Gaps have zero impact on performance. That includes non-numeric indexes. And non-unique indexes. And composite indexes.
Sure, you can run out of ids. I think I have seen it happen twice in nearly 2 decades of using MySQL. I may as well worry about being struck by an asteroid. It's low on my things-that-keep-me-awake-at-night list.
Gaps occur from (at least): INSERT IGNORE
, IODKU
, REPLACE
, DELETE
, ROLLBACK
(explicit, or due to crash), Multi-master replication (including Galera and Group Replication). Do you really want to come up with workarounds for those?!
Feel free to have us sanity-check anything else that lecturer says that is suspicious.
edited Nov 26 at 4:12
answered Nov 25 at 23:09
Rick James
40.1k22257
40.1k22257
Don’t ask. I feel that her statement is not very correct, but don’t have arguments to argue this why I’m asking.
– Crypt32
Nov 25 at 23:12
1
@Crypt32 - I added a bunch more. I think this is a question that needs to be tackled the other way. The lecturer should provide coherent claims; then we can shoot them down or suggest that those claims do not justify the complexity of reusing ids. I suggest that they need to provide sample code on how to identify and reuse the ids; again so we can point out how messy that will be.
– Rick James
Nov 25 at 23:28
I tried to ask again during intermission, but all I got was what I said in my question.
– Crypt32
Nov 25 at 23:30
1
Consider that the lecturer's requirement may not have a grounding on real database reasons, but rather, stems from the lecturer wanting to put the students through hoops so that they go through the experience of being forced to tackle difficult algorithms.
– João Mendes
Nov 26 at 11:42
1
@Vilx- - That is fine for only one of my half-dozen cases where gaps happen.
– Rick James
Nov 26 at 22:17
|
show 10 more comments
Don’t ask. I feel that her statement is not very correct, but don’t have arguments to argue this why I’m asking.
– Crypt32
Nov 25 at 23:12
1
@Crypt32 - I added a bunch more. I think this is a question that needs to be tackled the other way. The lecturer should provide coherent claims; then we can shoot them down or suggest that those claims do not justify the complexity of reusing ids. I suggest that they need to provide sample code on how to identify and reuse the ids; again so we can point out how messy that will be.
– Rick James
Nov 25 at 23:28
I tried to ask again during intermission, but all I got was what I said in my question.
– Crypt32
Nov 25 at 23:30
1
Consider that the lecturer's requirement may not have a grounding on real database reasons, but rather, stems from the lecturer wanting to put the students through hoops so that they go through the experience of being forced to tackle difficult algorithms.
– João Mendes
Nov 26 at 11:42
1
@Vilx- - That is fine for only one of my half-dozen cases where gaps happen.
– Rick James
Nov 26 at 22:17
Don’t ask. I feel that her statement is not very correct, but don’t have arguments to argue this why I’m asking.
– Crypt32
Nov 25 at 23:12
Don’t ask. I feel that her statement is not very correct, but don’t have arguments to argue this why I’m asking.
– Crypt32
Nov 25 at 23:12
1
1
@Crypt32 - I added a bunch more. I think this is a question that needs to be tackled the other way. The lecturer should provide coherent claims; then we can shoot them down or suggest that those claims do not justify the complexity of reusing ids. I suggest that they need to provide sample code on how to identify and reuse the ids; again so we can point out how messy that will be.
– Rick James
Nov 25 at 23:28
@Crypt32 - I added a bunch more. I think this is a question that needs to be tackled the other way. The lecturer should provide coherent claims; then we can shoot them down or suggest that those claims do not justify the complexity of reusing ids. I suggest that they need to provide sample code on how to identify and reuse the ids; again so we can point out how messy that will be.
– Rick James
Nov 25 at 23:28
I tried to ask again during intermission, but all I got was what I said in my question.
– Crypt32
Nov 25 at 23:30
I tried to ask again during intermission, but all I got was what I said in my question.
– Crypt32
Nov 25 at 23:30
1
1
Consider that the lecturer's requirement may not have a grounding on real database reasons, but rather, stems from the lecturer wanting to put the students through hoops so that they go through the experience of being forced to tackle difficult algorithms.
– João Mendes
Nov 26 at 11:42
Consider that the lecturer's requirement may not have a grounding on real database reasons, but rather, stems from the lecturer wanting to put the students through hoops so that they go through the experience of being forced to tackle difficult algorithms.
– João Mendes
Nov 26 at 11:42
1
1
@Vilx- - That is fine for only one of my half-dozen cases where gaps happen.
– Rick James
Nov 26 at 22:17
@Vilx- - That is fine for only one of my half-dozen cases where gaps happen.
– Rick James
Nov 26 at 22:17
|
show 10 more comments
up vote
7
down vote
Reusing an identity value, should in general be discouraged. Either the value is used entirely internally, in which case it’s actual value is immaterial, or it is also used externally in which case reusing the value is very likely going to lead to a misidentification.
Take the obvious case of an invoice or purchase order number, these might easily come from an identity column and be exposed externally, but you would never want to reuse them for precisely that reason. Both refer to specific transactions which you would not want to get confused.
Resolving such issues can be a big hassle when companies merge or are acquired. Creating such problems on purpose? Not wise.
@Crypt32 - The "invoice number" argument may be the most compelling one to throw at your lecturer.
– Rick James
Nov 26 at 4:33
Good example, I didn't think in this way. Bookmarking a link to a resource with its exposed ID would point to a totally different resource after reuse.
– Crypt32
Nov 26 at 5:35
add a comment |
up vote
7
down vote
Reusing an identity value, should in general be discouraged. Either the value is used entirely internally, in which case it’s actual value is immaterial, or it is also used externally in which case reusing the value is very likely going to lead to a misidentification.
Take the obvious case of an invoice or purchase order number, these might easily come from an identity column and be exposed externally, but you would never want to reuse them for precisely that reason. Both refer to specific transactions which you would not want to get confused.
Resolving such issues can be a big hassle when companies merge or are acquired. Creating such problems on purpose? Not wise.
@Crypt32 - The "invoice number" argument may be the most compelling one to throw at your lecturer.
– Rick James
Nov 26 at 4:33
Good example, I didn't think in this way. Bookmarking a link to a resource with its exposed ID would point to a totally different resource after reuse.
– Crypt32
Nov 26 at 5:35
add a comment |
up vote
7
down vote
up vote
7
down vote
Reusing an identity value, should in general be discouraged. Either the value is used entirely internally, in which case it’s actual value is immaterial, or it is also used externally in which case reusing the value is very likely going to lead to a misidentification.
Take the obvious case of an invoice or purchase order number, these might easily come from an identity column and be exposed externally, but you would never want to reuse them for precisely that reason. Both refer to specific transactions which you would not want to get confused.
Resolving such issues can be a big hassle when companies merge or are acquired. Creating such problems on purpose? Not wise.
Reusing an identity value, should in general be discouraged. Either the value is used entirely internally, in which case it’s actual value is immaterial, or it is also used externally in which case reusing the value is very likely going to lead to a misidentification.
Take the obvious case of an invoice or purchase order number, these might easily come from an identity column and be exposed externally, but you would never want to reuse them for precisely that reason. Both refer to specific transactions which you would not want to get confused.
Resolving such issues can be a big hassle when companies merge or are acquired. Creating such problems on purpose? Not wise.
answered Nov 26 at 1:18
jmoreno
549416
549416
@Crypt32 - The "invoice number" argument may be the most compelling one to throw at your lecturer.
– Rick James
Nov 26 at 4:33
Good example, I didn't think in this way. Bookmarking a link to a resource with its exposed ID would point to a totally different resource after reuse.
– Crypt32
Nov 26 at 5:35
add a comment |
@Crypt32 - The "invoice number" argument may be the most compelling one to throw at your lecturer.
– Rick James
Nov 26 at 4:33
Good example, I didn't think in this way. Bookmarking a link to a resource with its exposed ID would point to a totally different resource after reuse.
– Crypt32
Nov 26 at 5:35
@Crypt32 - The "invoice number" argument may be the most compelling one to throw at your lecturer.
– Rick James
Nov 26 at 4:33
@Crypt32 - The "invoice number" argument may be the most compelling one to throw at your lecturer.
– Rick James
Nov 26 at 4:33
Good example, I didn't think in this way. Bookmarking a link to a resource with its exposed ID would point to a totally different resource after reuse.
– Crypt32
Nov 26 at 5:35
Good example, I didn't think in this way. Bookmarking a link to a resource with its exposed ID would point to a totally different resource after reuse.
– Crypt32
Nov 26 at 5:35
add a comment |
up vote
5
down vote
The reuse of PK id values has problems and generally should be avoided.
First, the implementation of auto_increment columns doesn't provide the guarantee of being gapless. Indeed gaps will occur if you rollback an insert on an auto increment column.
Secondly the gap ID may refer to existing data that hasn't been deleted (due to missing FK constrains). If they translate to member numbers communicated outside the system then that poses potential business identity risks.
Thirdly, bigint unsigned
won't run out of IDs for a significant time even given an extremely large insert rate.
The biggest pain with gaps is coming across auditors who insist its an audit flaw. For DBAs they know gaps exist and why.
Interesting sentence about auditors. How they explain gaps as flaw? “Rollback an insert”, you mean when insert fails due to some reasons?
– Crypt32
Nov 25 at 23:01
1
Rollback an insert is either failure or user instigated like:BEGIN; INSERT INTO ... ; {some other insert/update}; ROLLBACK
– danblack
Nov 25 at 23:03
1
Auditors just use similar terms around IDs to DBAs, just fairly different business meanings. Each is logical to their own view on the world. Usually resolvable with good communication.
– danblack
Nov 25 at 23:09
add a comment |
up vote
5
down vote
The reuse of PK id values has problems and generally should be avoided.
First, the implementation of auto_increment columns doesn't provide the guarantee of being gapless. Indeed gaps will occur if you rollback an insert on an auto increment column.
Secondly the gap ID may refer to existing data that hasn't been deleted (due to missing FK constrains). If they translate to member numbers communicated outside the system then that poses potential business identity risks.
Thirdly, bigint unsigned
won't run out of IDs for a significant time even given an extremely large insert rate.
The biggest pain with gaps is coming across auditors who insist its an audit flaw. For DBAs they know gaps exist and why.
Interesting sentence about auditors. How they explain gaps as flaw? “Rollback an insert”, you mean when insert fails due to some reasons?
– Crypt32
Nov 25 at 23:01
1
Rollback an insert is either failure or user instigated like:BEGIN; INSERT INTO ... ; {some other insert/update}; ROLLBACK
– danblack
Nov 25 at 23:03
1
Auditors just use similar terms around IDs to DBAs, just fairly different business meanings. Each is logical to their own view on the world. Usually resolvable with good communication.
– danblack
Nov 25 at 23:09
add a comment |
up vote
5
down vote
up vote
5
down vote
The reuse of PK id values has problems and generally should be avoided.
First, the implementation of auto_increment columns doesn't provide the guarantee of being gapless. Indeed gaps will occur if you rollback an insert on an auto increment column.
Secondly the gap ID may refer to existing data that hasn't been deleted (due to missing FK constrains). If they translate to member numbers communicated outside the system then that poses potential business identity risks.
Thirdly, bigint unsigned
won't run out of IDs for a significant time even given an extremely large insert rate.
The biggest pain with gaps is coming across auditors who insist its an audit flaw. For DBAs they know gaps exist and why.
The reuse of PK id values has problems and generally should be avoided.
First, the implementation of auto_increment columns doesn't provide the guarantee of being gapless. Indeed gaps will occur if you rollback an insert on an auto increment column.
Secondly the gap ID may refer to existing data that hasn't been deleted (due to missing FK constrains). If they translate to member numbers communicated outside the system then that poses potential business identity risks.
Thirdly, bigint unsigned
won't run out of IDs for a significant time even given an extremely large insert rate.
The biggest pain with gaps is coming across auditors who insist its an audit flaw. For DBAs they know gaps exist and why.
answered Nov 25 at 22:52
danblack
1,4511212
1,4511212
Interesting sentence about auditors. How they explain gaps as flaw? “Rollback an insert”, you mean when insert fails due to some reasons?
– Crypt32
Nov 25 at 23:01
1
Rollback an insert is either failure or user instigated like:BEGIN; INSERT INTO ... ; {some other insert/update}; ROLLBACK
– danblack
Nov 25 at 23:03
1
Auditors just use similar terms around IDs to DBAs, just fairly different business meanings. Each is logical to their own view on the world. Usually resolvable with good communication.
– danblack
Nov 25 at 23:09
add a comment |
Interesting sentence about auditors. How they explain gaps as flaw? “Rollback an insert”, you mean when insert fails due to some reasons?
– Crypt32
Nov 25 at 23:01
1
Rollback an insert is either failure or user instigated like:BEGIN; INSERT INTO ... ; {some other insert/update}; ROLLBACK
– danblack
Nov 25 at 23:03
1
Auditors just use similar terms around IDs to DBAs, just fairly different business meanings. Each is logical to their own view on the world. Usually resolvable with good communication.
– danblack
Nov 25 at 23:09
Interesting sentence about auditors. How they explain gaps as flaw? “Rollback an insert”, you mean when insert fails due to some reasons?
– Crypt32
Nov 25 at 23:01
Interesting sentence about auditors. How they explain gaps as flaw? “Rollback an insert”, you mean when insert fails due to some reasons?
– Crypt32
Nov 25 at 23:01
1
1
Rollback an insert is either failure or user instigated like:
BEGIN; INSERT INTO ... ; {some other insert/update}; ROLLBACK
– danblack
Nov 25 at 23:03
Rollback an insert is either failure or user instigated like:
BEGIN; INSERT INTO ... ; {some other insert/update}; ROLLBACK
– danblack
Nov 25 at 23:03
1
1
Auditors just use similar terms around IDs to DBAs, just fairly different business meanings. Each is logical to their own view on the world. Usually resolvable with good communication.
– danblack
Nov 25 at 23:09
Auditors just use similar terms around IDs to DBAs, just fairly different business meanings. Each is logical to their own view on the world. Usually resolvable with good communication.
– danblack
Nov 25 at 23:09
add a comment |
up vote
0
down vote
I won't echo everyone else's comments that that reusing a PK is a bad idea but I have come across times where an identity column needed to be re-seeded.
Corruption of the PK index itself.
Granted this was using MS-SQL and many, many years ago but it still is relevant. Many years ago for the company that I work for, someone thought it would be a good idea to re-use PC's as servers in our 150+ remote locations after they were too old to be used by the clients and then stick them in a closet with no ventilation. When no Because we all know that a pile of junk 10 year old computer in a tiny room with temps of 120+ running mission critical databases could only result in good things. Like 40% failure rates and me rethinking my career choice. We would replicate the data back to the corp headquarters but more often than not, these failures would result in bad things happening to the databases. One of those things were the database having corrupted indexes which would seize up the database and the replication process. Twice in this great environment, the only solution to fix replication was to reseed the indexes and then reestablish replication. We did replace the servers later before ditching them completely.
add a comment |
up vote
0
down vote
I won't echo everyone else's comments that that reusing a PK is a bad idea but I have come across times where an identity column needed to be re-seeded.
Corruption of the PK index itself.
Granted this was using MS-SQL and many, many years ago but it still is relevant. Many years ago for the company that I work for, someone thought it would be a good idea to re-use PC's as servers in our 150+ remote locations after they were too old to be used by the clients and then stick them in a closet with no ventilation. When no Because we all know that a pile of junk 10 year old computer in a tiny room with temps of 120+ running mission critical databases could only result in good things. Like 40% failure rates and me rethinking my career choice. We would replicate the data back to the corp headquarters but more often than not, these failures would result in bad things happening to the databases. One of those things were the database having corrupted indexes which would seize up the database and the replication process. Twice in this great environment, the only solution to fix replication was to reseed the indexes and then reestablish replication. We did replace the servers later before ditching them completely.
add a comment |
up vote
0
down vote
up vote
0
down vote
I won't echo everyone else's comments that that reusing a PK is a bad idea but I have come across times where an identity column needed to be re-seeded.
Corruption of the PK index itself.
Granted this was using MS-SQL and many, many years ago but it still is relevant. Many years ago for the company that I work for, someone thought it would be a good idea to re-use PC's as servers in our 150+ remote locations after they were too old to be used by the clients and then stick them in a closet with no ventilation. When no Because we all know that a pile of junk 10 year old computer in a tiny room with temps of 120+ running mission critical databases could only result in good things. Like 40% failure rates and me rethinking my career choice. We would replicate the data back to the corp headquarters but more often than not, these failures would result in bad things happening to the databases. One of those things were the database having corrupted indexes which would seize up the database and the replication process. Twice in this great environment, the only solution to fix replication was to reseed the indexes and then reestablish replication. We did replace the servers later before ditching them completely.
I won't echo everyone else's comments that that reusing a PK is a bad idea but I have come across times where an identity column needed to be re-seeded.
Corruption of the PK index itself.
Granted this was using MS-SQL and many, many years ago but it still is relevant. Many years ago for the company that I work for, someone thought it would be a good idea to re-use PC's as servers in our 150+ remote locations after they were too old to be used by the clients and then stick them in a closet with no ventilation. When no Because we all know that a pile of junk 10 year old computer in a tiny room with temps of 120+ running mission critical databases could only result in good things. Like 40% failure rates and me rethinking my career choice. We would replicate the data back to the corp headquarters but more often than not, these failures would result in bad things happening to the databases. One of those things were the database having corrupted indexes which would seize up the database and the replication process. Twice in this great environment, the only solution to fix replication was to reseed the indexes and then reestablish replication. We did replace the servers later before ditching them completely.
answered Nov 29 at 20:33
user1207758
3161210
3161210
add a comment |
add a comment |
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f223392%2fidentity-column-re-seed-when-it-is-necessary%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
2
Gaps don't affect clustered indexes, these are b-trees of a form anyway. Old (still relevant) rant: Shooting yourself in the foot: renumbering auto-inc IDs
– danblack
Nov 25 at 22:56
1
Database clusters, Galera and multiple active-write db servers, also use
auto_increment_(increment|offset)
(ref: dev.mysql.com/doc/refman/5.7/en/…) as a mechanism to prevent conflicts which explicitly configures servers to have gaps in namespace.– danblack
Nov 25 at 23:06
"One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words)." media.tenor.com/images/5442002c8081300e02cdb610e70a5460/… This is a well known as an anti-pattern.
– jpmc26
Nov 27 at 1:06