This will be a fairly straightforward blog post covering different ways of copying data from a shortcut delta table into a delta table created automatically via a notebook.
Select statement with a join
%%sqlSELECTparty.recid AS PartyId,party.name AS Name,COALESCE(party.namealias, '') AS ShortName,COALESCE(postal.countryregionid, '') AS Country,COALESCE(postal.state, '') AS State,COALESCE(postal.city, '') AS City,COALESCE(postal.street, '') AS Street,COALESCE(postal.zipcode, '') AS PostCode,COALESCE(phone.locator, '') AS PhoneNumber,COALESCE(email.locator, '') AS EmailFROM dirpartytable partyLEFT OUTER JOIN logisticspostaladdress postal ON postal.location = party.primaryaddresslocationAND postal.validto > current_date() -- filters only valid(effective) addressesLEFT OUTER JOIN logisticselectronicaddress phone ON phone.recid = party.primarycontactphoneLEFT OUTER JOIN logisticselectronicaddress email ON email.recid = party.primarycontactemail
You should see a table result showing below your query.
Create table if not exists
This is a one of copy and will not copy data if the table exists already.
%%sqlCREATE TABLE IF NOT EXISTS fact_dirpartytableUSING DELTA ASSELECTparty.recid AS PartyId,party.name AS Name,COALESCE(party.namealias, '') AS ShortName,COALESCE(postal.countryregionid, '') AS Country,COALESCE(postal.state, '') AS State,COALESCE(postal.city, '') AS City,COALESCE(postal.street, '') AS Street,COALESCE(postal.zipcode, '') AS PostCode,COALESCE(phone.locator, '') AS PhoneNumber,COALESCE(email.locator, '') AS EmailFROM dirpartytable partyLEFT OUTER JOIN logisticspostaladdress postal ON postal.location = party.primaryaddresslocationAND postal.validto > current_date() -- filters only valid(effective) addressesLEFT OUTER JOIN logisticselectronicaddress phone ON phone.recid = party.primarycontactphoneLEFT OUTER JOIN logisticselectronicaddress email ON email.recid = party.primarycontactemail
Create table if not exists - use merge
This does a copy similar to the above but uses a merge to match the records
%%sql-- Step 1: Create Delta tableCREATE TABLE IF NOT EXISTS fact3_dirpartytable (PartyId LONG,Name STRING,ShortName STRING,Country STRING,State STRING,City STRING,Street STRING,PostCode STRING,PhoneNumber STRING,Email STRING) USING delta;-- Step 2: Create temporary viewCREATE OR REPLACE TEMPORARY VIEW temp_dirpartytable ASSELECTparty.recid AS PartyId,party.name AS Name,COALESCE(party.namealias, '') AS ShortName,COALESCE(postal.countryregionid, '') AS Country,COALESCE(postal.state, '') AS State,COALESCE(postal.city, '') AS City,COALESCE(postal.street, '') AS Street,COALESCE(postal.zipcode, '') AS PostCode,COALESCE(phone.locator, '') AS PhoneNumber,COALESCE(email.locator, '') AS EmailFROM dirpartytable partyLEFT OUTER JOIN logisticspostaladdress postal ON postal.location = party.primaryaddresslocationAND postal.validto > current_date() -- filters only valid(effective) addressesLEFT OUTER JOIN logisticselectronicaddress phone ON phone.recid = party.primarycontactphoneLEFT OUTER JOIN logisticselectronicaddress email ON email.recid = party.primarycontactemail;-- Step 3: Merge into delta tableMERGE INTO fact3_dirpartytable AS targetUSING temp_dirpartytable AS sourceON target.PartyId = source.PartyIdWHEN MATCHED THENUPDATE SET target.Name = source.Name, target.ShortName = source.ShortNameWHEN NOT MATCHED THENINSERT (PartyId, Name, ShortName) VALUES (source.PartyId, source.Name, source.ShortName);
This will do an update, insert but will not handle deletes.
Create table, Delete and Insert data
This creates the table, then deletes the data in full and inserts it all again.
%%sql-- Step 1: Create Delta tableCREATE TABLE IF NOT EXISTS fact4_dirpartytable (PartyId LONG,Name STRING,ShortName STRING,Country STRING,State STRING,City STRING,Street STRING,PostCode STRING,PhoneNumber STRING,Email STRING) USING delta;-- Step 2: Delete data from the Delta tableDELETE FROM fact4_dirpartytable;-- Step 3: Create temporary viewINSERT INTO fact4_dirpartytableSELECTparty.recid AS PartyId,party.name AS Name,COALESCE(party.namealias, '') AS ShortName,COALESCE(postal.countryregionid, '') AS Country,COALESCE(postal.state, '') AS State,COALESCE(postal.city, '') AS City,COALESCE(postal.street, '') AS Street,COALESCE(postal.zipcode, '') AS PostCode,COALESCE(phone.locator, '') AS PhoneNumber,COALESCE(email.locator, '') AS EmailFROM dirpartytable partyLEFT OUTER JOIN logisticspostaladdress postal ON postal.location = party.primaryaddresslocationAND postal.validto > current_date() -- filters only valid(effective) addressesLEFT OUTER JOIN logisticselectronicaddress phone ON phone.recid = party.primarycontactphoneLEFT OUTER JOIN logisticselectronicaddress email ON email.recid = party.primarycontactemail;
This runs fast but however not what you may want to do on a regular basis.
Create a temporary view and use SinkModifiedOn
Create a temporary view within your notebook to use as part of complex queries.
This query joins 4 tables together and each table has its own SinkModifiedOn field. I wanted to create a view that gave me the greatest (max) SingModifiedOn date time. This is to later allow me to do an incremental update.
CREATE OR REPLACE TEMPORARY VIEW temp_dirpartytable ASSELECTparty.SinkModifiedOn AS party_SinkModifiedOn,postal.SinkModifiedOn AS postal_SinkModifiedOn,phone.SinkModifiedOn AS phone_SinkModifiedOn,email.SinkModifiedOn AS email_SinkModifiedOn,GREATEST(party.SinkModifiedOn, postal.SinkModifiedOn, phone.SinkModifiedOn, email.SinkModifiedOn) AS SinkModifiedOn,party.recid AS PartyId,party.name AS Name,COALESCE(party.namealias, '') AS SearchName,COALESCE(postal.countryregionid, '') AS Country,COALESCE(postal.state, '') AS State,COALESCE(postal.city, '') AS City,COALESCE(postal.street, '') AS Street,COALESCE(postal.zipcode, '') AS PostCode,COALESCE(phone.locator, '') AS PhoneNumber,COALESCE(email.locator, '') AS EmailFROM dirpartytable partyLEFT OUTER JOIN logisticspostaladdress postal ON postal.location = party.primaryaddresslocationAND postal.validto > current_date() -- filters only valid(effective) addressesLEFT OUTER JOIN logisticselectronicaddress phone ON phone.recid = party.primarycontactphoneLEFT OUTER JOIN logisticselectronicaddress email ON email.recid = party.primarycontactemail;SELECT * FROM temp_dirpartytableWHERE SinkModifiedOn >= '2024-09-03T02:39:16Z';
This would be good for a transactional table where there are no deletes. You get the last SinkModifiedOn date time field for all the related table. Then filter based on the last run you have in your destination table. You could then do incremental updates.