Discussion:
pg_dumpall and restore
Rossi, Maria
2018-10-09 17:18:37 UTC
Permalink
Hi,

I upgraded our postgres database from V9.3 to V10.5. Used pg_dumpall then restore it to the new instance.
After the restore, we notice that 1 table had duplicate rows, such that it was not able to create the primary key.
I checked the old database, it does not have the dups.
Has anyone encountered having dups rows loaded? Any idea what caused this and how to prevent?

Your help would be much appreciated.

Thanks.
Maria
Keith
2018-10-09 17:58:17 UTC
Permalink
Post by Rossi, Maria
Hi,
I upgraded our postgres database from V9.3 to V10.5. Used pg_dumpall
then restore it to the new instance.
After the restore, we notice that 1 table had duplicate rows, such that it
was not able to create the primary key.
I checked the old database, it does not have the dups.
Has anyone encountered having dups rows loaded? Any idea what caused
this and how to prevent?
Your help would be much appreciated.
Thanks.
Maria
Did you use the pg_dumpall binary from 10.5 to create the dump?

Keith
Rossi, Maria
2018-10-09 18:05:51 UTC
Permalink
Yes, I did use the V10.5 pg_dumpall

Maria A Rossi

From: Keith <***@keithf4.com>
Sent: Tuesday, October 9, 2018 1:58 PM
To: Rossi, Maria <***@jackson.com>
Cc: pgsql-***@lists.postgresql.org; pgsql-***@lists.postgresql.org
Subject: Re: pg_dumpall and restore

EXTERNAL EMAIL


On Tue, Oct 9, 2018 at 1:19 PM Rossi, Maria <***@jackson.com<mailto:***@jackson.com>> wrote:
Hi,

I upgraded our postgres database from V9.3 to V10.5. Used pg_dumpall then restore it to the new instance.
After the restore, we notice that 1 table had duplicate rows, such that it was not able to create the primary key.
I checked the old database, it does not have the dups.
Has anyone encountered having dups rows loaded? Any idea what caused this and how to prevent?

Your help would be much appreciated.

Thanks.
Maria




Did you use the pg_dumpall binary from 10.5 to create the dump?

Keith
Laurenz Albe
2018-10-09 18:28:25 UTC
Permalink
Post by Rossi, Maria
I upgraded our postgres database from V9.3 to V10.5. Used pg_dumpall then restore it to the new instance.
After the restore, we notice that 1 table had duplicate rows, such that it was not able to create the primary key.
I checked the old database, it does not have the dups.
Has anyone encountered having dups rows loaded? Any idea what caused this and how to prevent?
Your help would be much appreciated.
I don't believe that pg_dumpall miraculously duplicated the row.

You probably *do* have a duplicate row, and hence table corruption,
but I suspect that one of the rows is not in the index you used
to look for the row.

If you query:

SELECT * FROM tab WHERE id = 42;

the query will likely use the index on "id" and find only one of
the rows.

You should

SET enable_indexscan = off;
SET enable_indexonlyscan = off;

and then repeat the query, so that a sequential scan is used.

To fix, delete one of the rows and reindex.

You can identify a row by its tuple id:

SELECT ctid, * FROM tab WHERE id = 42;

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Tom Lane
2018-10-09 18:32:02 UTC
Permalink
Post by Laurenz Albe
Post by Rossi, Maria
Has anyone encountered having dups rows loaded? Any idea what caused this and how to prevent?
I don't believe that pg_dumpall miraculously duplicated the row.
What seems much more likely is that the dump script was loaded twice.
The script expects to be fed into an empty database ...

regards, tom lane
Rossi, Maria
2018-10-09 20:05:59 UTC
Permalink
The table has only 2 columns, name and value.
Select count(*) from table1 at the old database returned 115, on the new database, it returned 117. This a simple select without any WHERE clause.
Thanks.

Maria



-----Original Message-----
From: Laurenz Albe <***@cybertec.at>
Sent: Tuesday, October 9, 2018 2:28 PM
To: Rossi, Maria <***@jackson.com>; 'pgsql-***@lists.postgresql.org' <pgsql-***@lists.postgresql.org>; 'pgsql-***@lists.postgresql.org' <pgsql-***@lists.postgresql.org>
Subject: Re: pg_dumpall and restore

EXTERNAL EMAIL
Post by Rossi, Maria
I upgraded our postgres database from V9.3 to V10.5. Used pg_dumpall then restore it to the new instance.
After the restore, we notice that 1 table had duplicate rows, such that it was not able to create the primary key.
I checked the old database, it does not have the dups.
Has anyone encountered having dups rows loaded? Any idea what caused this and how to prevent?
Your help would be much appreciated.
I don't believe that pg_dumpall miraculously duplicated the row.

You probably *do* have a duplicate row, and hence table corruption, but I suspect that one of the rows is not in the index you used to look for the row.

If you query:

SELECT * FROM tab WHERE id = 42;

the query will likely use the index on "id" and find only one of the rows.

You should

SET enable_indexscan = off;
SET enable_indexonlyscan = off;

and then repeat the query, so that a sequential scan is used.

To fix, delete one of the rows and reindex.

You can identify a row by its tuple id:

SELECT ctid, * FROM tab WHERE id = 42;

Yours,
Laurenz Albe
--
Cy
Stanton Schmidt
2018-10-10 11:51:30 UTC
Permalink
When this happened to me the only explanation I could find was that the original table/index had been corrupted.
The only way to "fix" things was to create a copy of the table and insert select distinct * from orig_table.

Hope this helps.

Stanton


From: "Rossi, Maria" <***@jackson.com>
To: "Laurenz Albe" <***@cybertec.at>, "pgsql-sql" <pgsql-***@lists.postgresql.org>, "pgsql-***@lists.postgresql.org" <pgsql-***@lists.postgresql.org>
Sent: Tuesday, October 9, 2018 3:05:59 PM
Subject: RE: pg_dumpall and restore

The table has only 2 columns, name and value.
Select count(*) from table1 at the old database returned 115, on the new database, it returned 117. This a simple select without any WHERE clause.
Thanks.

Maria



-----Original Message-----
From: Laurenz Albe <***@cybertec.at>
Sent: Tuesday, October 9, 2018 2:28 PM
To: Rossi, Maria <***@jackson.com>; 'pgsql-***@lists.postgresql.org' <pgsql-***@lists.postgresql.org>; 'pgsql-***@lists.postgresql.org' <pgsql-***@lists.postgresql.org>
Subject: Re: pg_dumpall and restore

EXTERNAL EMAIL
I upgraded our postgres database from V9.3 to V10.5. Used pg_dumpall then restore it to the new instance.
After the restore, we notice that 1 table had duplicate rows, such that it was not able to create the primary key.
I checked the old database, it does not have the dups.
Has anyone encountered having dups rows loaded? Any idea what caused this and how to prevent?
Your help would be much appreciated.
I don't believe that pg_dumpall miraculously duplicated the row.

You probably *do* have a duplicate row, and hence table corruption, but I suspect that one of the rows is not in the index you used to look for the row.

If you query:

SELECT * FROM tab WHERE id = 42;

the query will likely use the index on "id" and find only one of the rows.

You should

SET enable_indexscan = off;
SET enable_indexonlyscan = off;

and then repeat the query, so that a sequential scan is used.

To fix, delete one of the rows and reindex.

You can identify a row by its tuple id:

SELECT ctid, * FROM tab WHERE id = 42;

Yours,
Laurenz Albe
--
Cybertec | https://urldefense.proofpoint.com/v2/url?u=https-3A__www.cybertec-2Dpostgresql.com&d=DwIGaQ&c=eLkx2stMcShI0L6xvxICXHnFB9zmDvZmvnhsCd8Gf8M&r=EWbtHQpXIg2XbQduIylzyXrAFdbjWaMyy-p_LkyfjhQ&m=Oyq62lylPJc2GMn32y1LP8A7tG0QlTVCZtZ4jE4zwIc&s=wTd0yL8L_97S1EgPJeIdgkZwFsc_6lWGn7qd3XrBoNY&e=
Rossi, Maria
2018-10-10 12:01:55 UTC
Permalink
Thanks to all the responses. To fix:

1. At the source, pg_dump the 1 table only
2. Drop table at the target
3. Restore at the target

Also saw this: https://www.postgresql-archive.org/Duplicate-rows-during-pg-dump-td5871316.html
But I did not have to drop/recreate index at the source.
Thanks.

Maria A Rossi

From: Stanton Schmidt <***@rgllogistics.com>
Sent: Wednesday, October 10, 2018 7:52 AM
To: Rossi, Maria <***@jackson.com>
Cc: Laurenz Albe <***@cybertec.at>; pgsql-sql <pgsql-***@lists.postgresql.org>; pgsql-***@lists.postgresql.org
Subject: Re: pg_dumpall and restore

EXTERNAL EMAIL

When this happened to me the only explanation I could find was that the original table/index had been corrupted.
The only way to "fix" things was to create a copy of the table and insert select distinct * from orig_table.

Hope this helps.

Stanton

________________________________
From: "Rossi, Maria" <***@jackson.com<mailto:***@jackson.com>>
To: "Laurenz Albe" <***@cybertec.at<mailto:***@cybertec.at>>, "pgsql-sql" <pgsql-***@lists.postgresql.org<mailto:pgsql-***@lists.postgresql.org>>, "pgsql-***@lists.postgresql.org<mailto:pgsql-***@lists.postgresql.org>" <pgsql-***@lists.postgresql.org<mailto:pgsql-***@lists.postgresql.org>>
Sent: Tuesday, October 9, 2018 3:05:59 PM
Subject: RE: pg_dumpall and restore

The table has only 2 columns, name and value.
Select count(*) from table1 at the old database returned 115, on the new database, it returned 117. This a simple select without any WHERE clause.
Thanks.

Maria



-----Original Message-----
From: Laurenz Albe <***@cybertec.at<mailto:***@cybertec.at>>
Sent: Tuesday, October 9, 2018 2:28 PM
To: Rossi, Maria <***@jackson.com<mailto:***@jackson.com>>; 'pgsql-***@lists.postgresql.org' <pgsql-***@lists.postgresql.org<mailto:pgsql-***@lists.postgresql.org>>; 'pgsql-***@lists.postgresql.org' <pgsql-***@lists.postgresql.org<mailto:pgsql-***@lists.postgresql.org>>
Subject: Re: pg_dumpall and restore

EXTERNAL EMAIL
Post by Rossi, Maria
I upgraded our postgres database from V9.3 to V10.5. Used pg_dumpall then restore it to the new instance.
After the restore, we notice that 1 table had duplicate rows, such that it was not able to create the primary key.
I checked the old database, it does not have the dups.
Has anyone encountered having dups rows loaded? Any idea what caused this and how to prevent?
Your help would be much appreciated.
I don't believe that pg_dumpall miraculously duplicated the row.

You probably *do* have a duplicate row, and hence table corruption, but I suspect that one of the rows is not in the index you used to look for the row.

If you query:

SELECT * FROM tab WHERE id = 42;

the query will likely use the index on "id" and find only one of the rows.

You should

SET enable_indexscan = off;
SET enable_indexonlyscan = off;

and then repeat the query, so that a sequential scan is used.

To fix, delete one of the rows and reindex.

You can identify a row by its tuple id:

SELECT ctid, * FROM tab WHERE id = 42;

Yours,
Laurenz Albe
--
Cybertec | https://urldefense.proofpoint.com/v2/url?u=https-3A__www.cybertec-2Dpostgresql.com&d=DwIGaQ&c=eLkx2stMcShI0L6xvxICXHnFB9zmDvZmvnhsCd8Gf8M&r=EWbtHQpXIg2XbQduIylzyXrAFdbjWaMyy-p_LkyfjhQ&m=Oyq62lylPJc2GMn32y1LP8A7tG0QlTVCZtZ4jE4zwIc&s=wTd0yL8L_97S1EgPJeIdgkZwFsc_6lWGn7qd3XrBoNY&e=
Loading...