[Avg. reading time: 17 minutes]
Load CSV into Neo4J
APOC
APOC stands for "Awesome Procedures On Cypher". It is a popular library of user-defined procedures and functions for Neo4j, extending the functionality of the core database. APOC provides many utilities for data manipulation, graph algorithms, integration with external systems, and various helper functions that make working with Neo4j more powerful and convenient.
Key Features of APOC
Data Conversion:
Convert between different data types (e.g., dates, numbers, JSON). Parse and format dates easily.
Data Integration:
Import data from CSV, JSON, or external APIs. Export data to JSON, CSV, or other formats.
Graph Algorithms:
Perform advanced graph queries and algorithms not available in Cypher by default.
Utilities for Cypher Queries:
Run parallel queries, manage transactions, and work with lists, maps, or collections.
Improved Indexing and Search:
Help with text searches, schema indexing, and optimized lookups.
WITH Statement
The WITH clause in Neo4j serves multiple purposes. It allows you to chain parts of a query, pass results between stages, perform filtering and aggregation, and ensure that certain operations (like pagination or filtering) are done after aggregation.
Using WITH, you pass intermediate results from one stage to the next.
Example
match(t) detach delete t
CREATE (p1:Person {name: "Rachel"})
CREATE (p2:Person {name: "Ross"})
CREATE (p3:Person {name: "Chandler"})
CREATE (p4:Person {name: "Joey"})
CREATE (p5:Person {name: "Monica"})
CREATE (p6:Person {name: "Phoebe"})
CREATE (m1:Movie {title: "Inception", releaseDate: 2010, rating: 8.8})
CREATE (m2:Movie {title: "The Matrix", releaseDate: 1999, rating: 8.7})
CREATE (m3:Movie {title: "Interstellar", releaseDate: 2014, rating: 8.6})
CREATE (p1)-[:FRIEND]->(p2)
CREATE (p1)-[:FRIEND]->(p3)
CREATE (p2)-[:FRIEND]->(p4)
CREATE (p5)-[:FRIEND]->(p6)
CREATE (p1)-[:ACTED_IN]->(m1)
CREATE (p2)-[:ACTED_IN]->(m2)
CREATE (p3)-[:ACTED_IN]->(m3)
CREATE (p5)-[:ACTED_IN]->(m3)
CREATE (p1)-[:LIKES]->(m1)
CREATE (p2)-[:LIKES]->(m1)
CREATE (p3)-[:LIKES]->(m2)
CREATE (p4)-[:LIKES]->(m3)
CREATE (p4)-[:DISLIKES]->(m2)
CREATE (p6)-[:DISLIKES]->(m1)
Count how many friends each person has and return only those with more than 1 friend.
MATCH (p:Person)-[:FRIEND]->(f:Person)
WITH p, count(f) AS friendCount
WHERE friendCount > 1
RETURN distinct p.name AS person, friendCount
Convert String Date to Neo4J Date Format
WITH '10/30/2024' AS dateString
RETURN apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy') AS timestamp,
apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd') AS dataFormatInString,
date(apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')) AS formattedDate
Also print the datatype
WITH '10/30/2024' AS dateString
RETURN
apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy') AS timestamp,
apoc.meta.cypher.type(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy')) AS timestampType,
apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd') AS dataFormatInString,
apoc.meta.cypher.type(apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')) AS dataFormatType,
date(apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')) AS formattedDate,
apoc.meta.cypher.type(date(apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd'))) AS formattedDateType
Load CSV into Neo4J
match(t) detach delete t
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row
return row
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row
return row.Region,row.Country,row.`Order ID` as OrderID
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row
return row.Region,row.Country,toInteger(row.`Order ID`) as OrderID
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row
RETURN
row.`Order ID` AS orderId,
row.Country AS country,
row.Region AS region,
row.`Order Date` AS orderDate,
toInteger(row.UnitsSold) AS unitsSold,
toFloat(row.UnitPrice) AS unitPrice,
toFloat(row.TotalCost) AS totalCost,
toFloat(row.TotalProfit) AS totalProfit
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row
RETURN
row.`Order ID` AS orderId,
row.Country AS country,
row.Region AS region,
date(apoc.date.format(apoc.date.parse(row.`Order Date`, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')) AS OrderDate,
toInteger(row.UnitsSold) AS unitsSold,
toFloat(row.UnitPrice) AS unitPrice,
toFloat(row.TotalCost) AS totalCost,
toFloat(row.TotalProfit) AS totalProfit
Putting it all together
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row
MERGE (region:Region {name: row.Region})
MERGE (country:Country {name: row.Country})
MERGE (itemType:ItemType {name: row.`Item Type`})
MERGE (sale:Sale {
salesChannel: toString(row.`Sales Channel`),
orderPriority: toString(row.`Order Priority`),
orderDate: date(apoc.date.format(apoc.date.parse(row.`Order Date`, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')),
orderId: toInteger(row.`Order ID`),
shipDate: date(apoc.date.format(apoc.date.parse(row.`Ship Date`, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')),
unitsSold: toInteger(row.UnitsSold),
unitPrice: toFloat(row.UnitPrice),
unitCost: toFloat(row.UnitCost),
totalRevenue: toFloat(row.TotalRevenue),
totalCost: toFloat(row.TotalCost),
totalProfit: toFloat(row.TotalProfit)
})
MERGE (region)-[:CONTAINS]->(country)
MERGE (country)-[:SOLD_ITEM]->(sale)
MERGE (sale)-[:OF_TYPE {order_priority: sale.orderPriority}]->(itemType)
Return number of units sold by Region, Country
match (r:Region)-[:CONTAINS]->(c:Country)-[:SOLD_ITEM]->(s:Sale) return r.name, c.name, count(s.unitsSold) as cnt
Return number of units sold by Region, Country where count > 1
MATCH (r:Region)-[:CONTAINS]->(c:Country)-[:SOLD_ITEM]->(s:Sale)
WITH r.name AS regionName, c.name AS countryName, count(s) AS cnt
WHERE cnt > 1
RETURN regionName, countryName, cnt
Order By
match (r:Region)-[:CONTAINS]->(c:Country)-[:SOLD_ITEM]->(s:Sale) return r.name, c.name, sum(s.unitsSold) as totalUnits order by 1,2,3