[Avg. reading time: 21 minutes]
Load CSV into Neo4J
APOC
APOC stands for "Awesome Procedures On Cypher". It is a popular library of user-defined procedures and functions for Neo4j, extending the functionality of the core database. APOC provides many utilities for data manipulation, graph algorithms, integration with external systems, and various helper functions that make working with Neo4j more powerful and convenient.
Key Features of APOC
Data Conversion:
Convert between different data types (e.g., dates, numbers, JSON). Parse and format dates easily.
Data Integration:
Import data from CSV, JSON, or external APIs. Export data to JSON, CSV, or other formats.
Graph Algorithms:
Perform advanced graph queries and algorithms not available in Cypher by default.
Utilities for Cypher Queries:
Run parallel queries, manage transactions, and work with lists, maps, or collections.
Improved Indexing and Search:
Help with text searches, schema indexing, and optimized lookups.
WITH Statement
The WITH clause in Neo4j serves multiple purposes. It allows you to chain parts of a query, pass results between stages, perform filtering and aggregation, and ensure that certain operations (like pagination or filtering) are done after aggregation.
Using WITH, you pass intermediate results from one stage to the next. Its like a pipeline manager.
Example
match(t) detach delete t;
CREATE (p1:Person {name: "Rachel"})
CREATE (p2:Person {name: "Ross"})
CREATE (p3:Person {name: "Chandler"})
CREATE (p4:Person {name: "Joey"})
CREATE (p5:Person {name: "Monica"})
CREATE (p6:Person {name: "Phoebe"})
CREATE (m1:Movie {title: "Inception", releaseDate: 2010, rating: 8.8})
CREATE (m2:Movie {title: "The Matrix", releaseDate: 1999, rating: 8.7})
CREATE (m3:Movie {title: "Interstellar", releaseDate: 2014, rating: 8.6})
CREATE (p1)-[:FRIEND]->(p2)
CREATE (p1)-[:FRIEND]->(p3)
CREATE (p2)-[:FRIEND]->(p4)
CREATE (p5)-[:FRIEND]->(p6)
CREATE (p1)-[:ACTED_IN]->(m1)
CREATE (p2)-[:ACTED_IN]->(m2)
CREATE (p3)-[:ACTED_IN]->(m3)
CREATE (p5)-[:ACTED_IN]->(m3)
CREATE (p1)-[:LIKES]->(m1)
CREATE (p2)-[:LIKES]->(m1)
CREATE (p3)-[:LIKES]->(m2)
CREATE (p4)-[:LIKES]->(m3)
CREATE (p4)-[:DISLIKES]->(m2)
CREATE (p6)-[:DISLIKES]->(m1)
View All
match(t)-[r]->(y) return t,r,y;
Count how many friends each person has and return only those with more than 1 friend.
MATCH (p:Person)-[:FRIEND]->(f:Person)
WITH p, count(f) AS friendCount
WHERE friendCount > 1
RETURN distinct p.name AS person, friendCount
Group people by the movie they both like
MATCH (m:Movie)<-[:LIKES]-(p:Person)
WITH m, collect(DISTINCT p.name) AS people
RETURN m.title AS movie, people
ORDER BY movie;
Convert String Date to Neo4J Date Format
WITH '10/30/2024' AS dateString
RETURN
apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy') AS timestamp,
apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd') AS dataFormatInString,
date(apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')) AS formattedDate
Talk about Epoch
Also print the datatype
WITH '10/30/2024' AS dateString
RETURN
apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy') AS timestamp,
apoc.meta.cypher.type(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy')) AS timestampType,
apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd') AS dataFormatInString,
apoc.meta.cypher.type(apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')) AS dataFormatType,
date(apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')) AS formattedDate,
apoc.meta.cypher.type(date(apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd'))) AS formattedDateType
Load CSV into Neo4J
match(t) detach delete t;
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row
return row
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row
return row.Region,row.Country,row.`Order ID` as OrderID
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row
return row.Region,row.Country,toInteger(row.`Order ID`) as OrderID
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row
RETURN
row.`Order ID` AS orderId,
row.Country AS country,
row.Region AS region,
row.`Order Date` AS orderDate,
toInteger(row.UnitsSold) AS unitsSold,
toFloat(row.UnitPrice) AS unitPrice,
toFloat(row.TotalCost) AS totalCost,
toFloat(row.TotalProfit) AS totalProfit
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row
RETURN
row.`Order ID` AS orderId,
row.Country AS country,
row.Region AS region,
date(apoc.date.format(apoc.date.parse(row.`Order Date`, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')) AS OrderDate,
toInteger(row.UnitsSold) AS unitsSold,
toFloat(row.UnitPrice) AS unitPrice,
toFloat(row.TotalCost) AS totalCost,
toFloat(row.TotalProfit) AS totalProfit
How to ingest this with Neo4J relations
-
One node per Order with properties for unitsSold, unitPrice, totalCost, totalProfit, OrderDate, country, region
- Fastest to ingest, lowest complexity
- weaker for joins, hierarchies, and reuse of country/region
-
Using top level Relations
- (Order)-[:IN_COUNTRY]->(Country), (Country)-[:IN_REGION]->(Region), (Order)-[:ON_DAY]->(Day)
- Best general-purpose pattern; supports drill-downs, reuse, and constraints
- Add uniqueness on orderId, country name, region name, date
-
Hierarchy-first model
- Region, Country; build the geography tree first (Country)-[:IN_REGION]->(Region)
- Attach Order nodes only to Country; roll-ups traverse Country→Region
Putting it all together
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row
MERGE (region:Region {name: row.Region})
MERGE (country:Country {name: row.Country})
MERGE (itemType:ItemType {name: row.`Item Type`})
MERGE (sale:Sale {
salesChannel: toString(row.`Sales Channel`),
orderPriority: toString(row.`Order Priority`),
orderDate: date(apoc.date.format(apoc.date.parse(row.`Order Date`, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')),
orderId: toInteger(row.`Order ID`),
shipDate: date(apoc.date.format(apoc.date.parse(row.`Ship Date`, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')),
unitsSold: toInteger(row.UnitsSold),
unitPrice: toFloat(row.UnitPrice),
unitCost: toFloat(row.UnitCost),
totalRevenue: toFloat(row.TotalRevenue),
totalCost: toFloat(row.TotalCost),
totalProfit: toFloat(row.TotalProfit)
})
MERGE (region)-[:CONTAINS]->(country)
MERGE (country)-[:SOLD_ITEM]->(sale)
MERGE (sale)-[:OF_TYPE {order_priority: sale.orderPriority}]->(itemType)
Avg Profit per Region
MATCH (r:Region)-[:CONTAINS]->(:Country)-[:SOLD_ITEM]->(s:Sale)
WITH r, sum(s.totalProfit) AS profit, sum(s.totalRevenue) AS revenue
RETURN r.name AS region, round(profit / revenue * 100,2) AS marginPercent
ORDER BY marginPercent DESC;
Return number of units sold by Region, Country
match (r:Region)-[:CONTAINS]->(c:Country)-[:SOLD_ITEM]->(s:Sale) return r.name, c.name, count(s.unitsSold) as cnt
Return number of units sold by Region, Country where count > 1
MATCH (r:Region)-[:CONTAINS]->(c:Country)-[:SOLD_ITEM]->(s:Sale)
WITH r.name AS regionName, c.name AS countryName, count(s) AS cnt
WHERE cnt > 1
RETURN regionName, countryName, cnt
Order By
match (r:Region)-[:CONTAINS]->(c:Country)-[:SOLD_ITEM]->(s:Sale) return r.name, c.name, sum(s.unitsSold) as totalUnits order by 1,2,3