If you want to analyse the housing market in England and Wales you can use the Prices Paid data from https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads
Once you have the big file downloaded you need to upload it to your database. Put the file into the shared folder where you have allowed the postgres user and your current user to have access to the files. Mine is a folder in root called Database_Files
. CD into the data directory and check out the file to see if it contains column names:
head pp-complete.txt
A cursory inspection reveals no headers.
Check out https://www.gov.uk/guidance/about-the-price-paid-data#explanations-of-column-headers-in-the-ppd for the headers
You’ll have to create the table before hand.
CREATE DATABASE prices_paid;
CREATE TABLE prices_paid.public.price_paid(
Transaction_unique_identifier CHAR(250) PRIMARY KEY NOT NULL,
PRICE INT NOT NULL,
Date_of_transfer timestamp NOT NULL,
postcode CHAR(50),
Property_type CHAR(1),
old_new CHAR(1),
duration CHAR(1),
paon text,
saon text,
street text,
locality text,
town_city text,
district text,
county text,
ppd_category char(1),
record_status char(1)
);
I use DBeaver COMMUNITY for a nice SSMS like experience. Log into your database using a client and run the following SQL
COPY prices_paid.public.price_paid
FROM '/Database_Files/pp-complete.csv'
quote '"'
CSV
;
On my laptop it took 42 minutes. Which I think is too long. I then indexed the postcode field as that’s the one I would probably use most.
create index idxpostcode on prices_paid.public.price_paid (postcode);