[Avg. reading time: 21 minutes]

Load CSV into Neo4J

APOC

APOC stands for "Awesome Procedures On Cypher". It is a popular library of user-defined procedures and functions for Neo4j, extending the functionality of the core database. APOC provides many utilities for data manipulation, graph algorithms, integration with external systems, and various helper functions that make working with Neo4j more powerful and convenient.

Key Features of APOC

Data Conversion:

Convert between different data types (e.g., dates, numbers, JSON). Parse and format dates easily.

Data Integration:

Import data from CSV, JSON, or external APIs. Export data to JSON, CSV, or other formats.

Graph Algorithms:

Perform advanced graph queries and algorithms not available in Cypher by default.

Utilities for Cypher Queries:

Run parallel queries, manage transactions, and work with lists, maps, or collections.

Improved Indexing and Search:

Help with text searches, schema indexing, and optimized lookups.

WITH Statement

The WITH clause in Neo4j serves multiple purposes. It allows you to chain parts of a query, pass results between stages, perform filtering and aggregation, and ensure that certain operations (like pagination or filtering) are done after aggregation.

Using WITH, you pass intermediate results from one stage to the next. Its like a pipeline manager.

Example


match(t) detach delete t;

CREATE (p1:Person {name: "Rachel"})
CREATE (p2:Person {name: "Ross"})
CREATE (p3:Person {name: "Chandler"})
CREATE (p4:Person {name: "Joey"})
CREATE (p5:Person {name: "Monica"})
CREATE (p6:Person {name: "Phoebe"})

CREATE (m1:Movie {title: "Inception", releaseDate: 2010, rating: 8.8})
CREATE (m2:Movie {title: "The Matrix", releaseDate: 1999, rating: 8.7})
CREATE (m3:Movie {title: "Interstellar", releaseDate: 2014, rating: 8.6})

CREATE (p1)-[:FRIEND]->(p2)
CREATE (p1)-[:FRIEND]->(p3)
CREATE (p2)-[:FRIEND]->(p4)
CREATE (p5)-[:FRIEND]->(p6)

CREATE (p1)-[:ACTED_IN]->(m1)
CREATE (p2)-[:ACTED_IN]->(m2)
CREATE (p3)-[:ACTED_IN]->(m3)
CREATE (p5)-[:ACTED_IN]->(m3)

CREATE (p1)-[:LIKES]->(m1)
CREATE (p2)-[:LIKES]->(m1)
CREATE (p3)-[:LIKES]->(m2)
CREATE (p4)-[:LIKES]->(m3)

CREATE (p4)-[:DISLIKES]->(m2)
CREATE (p6)-[:DISLIKES]->(m1)

View All

match(t)-[r]->(y) return t,r,y;

Count how many friends each person has and return only those with more than 1 friend.

MATCH (p:Person)-[:FRIEND]->(f:Person)
WITH p, count(f) AS friendCount
WHERE friendCount > 1
RETURN distinct p.name AS person, friendCount

Group people by the movie they both like

MATCH (m:Movie)<-[:LIKES]-(p:Person)
WITH m, collect(DISTINCT p.name) AS people
RETURN m.title AS movie, people
ORDER BY movie;

Convert String Date to Neo4J Date Format

WITH '10/30/2024' AS dateString
RETURN 
apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy') AS timestamp,
apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd') AS dataFormatInString,
date(apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')) AS formattedDate

Talk about Epoch

Also print the datatype

WITH '10/30/2024' AS dateString
RETURN 
    apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy') AS timestamp,
    apoc.meta.cypher.type(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy')) AS timestampType,
    
    apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd') AS dataFormatInString,
    apoc.meta.cypher.type(apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')) AS dataFormatType,
    
    date(apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')) AS formattedDate,

    apoc.meta.cypher.type(date(apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd'))) AS formattedDateType

Load CSV into Neo4J

match(t) detach delete t;

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row
return row


LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row
return row.Region,row.Country,row.`Order ID` as OrderID


LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row
return row.Region,row.Country,toInteger(row.`Order ID`) as OrderID


LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row
RETURN 
    row.`Order ID` AS orderId, 
    row.Country AS country, 
    row.Region AS region, 
    row.`Order Date` AS orderDate, 
    toInteger(row.UnitsSold) AS unitsSold,
    toFloat(row.UnitPrice) AS unitPrice, 
    toFloat(row.TotalCost) AS totalCost, 
    toFloat(row.TotalProfit) AS totalProfit


LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row
RETURN 
    row.`Order ID` AS orderId, 
    row.Country AS country, 
    row.Region AS region, 
    date(apoc.date.format(apoc.date.parse(row.`Order Date`, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')) AS OrderDate, 
    toInteger(row.UnitsSold) AS unitsSold,
    toFloat(row.UnitPrice) AS unitPrice, 
    toFloat(row.TotalCost) AS totalCost, 
    toFloat(row.TotalProfit) AS totalProfit

How to ingest this with Neo4J relations

  • One node per Order with properties for unitsSold, unitPrice, totalCost, totalProfit, OrderDate, country, region

    • Fastest to ingest, lowest complexity
    • weaker for joins, hierarchies, and reuse of country/region
  • Using top level Relations

    • (Order)-[:IN_COUNTRY]->(Country), (Country)-[:IN_REGION]->(Region), (Order)-[:ON_DAY]->(Day)
    • Best general-purpose pattern; supports drill-downs, reuse, and constraints
    • Add uniqueness on orderId, country name, region name, date
  • Hierarchy-first model

    • Region, Country; build the geography tree first (Country)-[:IN_REGION]->(Region)
    • Attach Order nodes only to Country; roll-ups traverse Country→Region

Putting it all together

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row

MERGE (region:Region {name: row.Region})
MERGE (country:Country {name: row.Country})
MERGE (itemType:ItemType {name: row.`Item Type`})

MERGE (sale:Sale {
    salesChannel: toString(row.`Sales Channel`),
    orderPriority: toString(row.`Order Priority`),
    orderDate: date(apoc.date.format(apoc.date.parse(row.`Order Date`, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')),
    orderId: toInteger(row.`Order ID`),
    shipDate: date(apoc.date.format(apoc.date.parse(row.`Ship Date`, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')),
    unitsSold: toInteger(row.UnitsSold),
    unitPrice: toFloat(row.UnitPrice),
    unitCost: toFloat(row.UnitCost),
    totalRevenue: toFloat(row.TotalRevenue),
    totalCost: toFloat(row.TotalCost),
    totalProfit: toFloat(row.TotalProfit)
})

MERGE (region)-[:CONTAINS]->(country)
MERGE (country)-[:SOLD_ITEM]->(sale)
MERGE (sale)-[:OF_TYPE {order_priority: sale.orderPriority}]->(itemType)

Avg Profit per Region

MATCH (r:Region)-[:CONTAINS]->(:Country)-[:SOLD_ITEM]->(s:Sale)
WITH r, sum(s.totalProfit) AS profit, sum(s.totalRevenue) AS revenue
RETURN r.name AS region, round(profit / revenue * 100,2) AS marginPercent
ORDER BY marginPercent DESC;

Return number of units sold by Region, Country

match (r:Region)-[:CONTAINS]->(c:Country)-[:SOLD_ITEM]->(s:Sale) return r.name, c.name, count(s.unitsSold) as cnt

Return number of units sold by Region, Country where count > 1

MATCH (r:Region)-[:CONTAINS]->(c:Country)-[:SOLD_ITEM]->(s:Sale)
WITH r.name AS regionName, c.name AS countryName, count(s) AS cnt
WHERE cnt > 1
RETURN regionName, countryName, cnt

Order By

match (r:Region)-[:CONTAINS]->(c:Country)-[:SOLD_ITEM]->(s:Sale) return r.name, c.name, sum(s.unitsSold) as totalUnits order by 1,2,3

#neo4j #csv #salesdataVer 5.5.9

Last change: 2025-12-03