[Avg. reading time: 17 minutes]

Load CSV into Neo4J

APOC

APOC stands for "Awesome Procedures On Cypher". It is a popular library of user-defined procedures and functions for Neo4j, extending the functionality of the core database. APOC provides many utilities for data manipulation, graph algorithms, integration with external systems, and various helper functions that make working with Neo4j more powerful and convenient.

Key Features of APOC

Data Conversion:

Convert between different data types (e.g., dates, numbers, JSON). Parse and format dates easily.

Data Integration:

Import data from CSV, JSON, or external APIs. Export data to JSON, CSV, or other formats.

Graph Algorithms:

Perform advanced graph queries and algorithms not available in Cypher by default.

Utilities for Cypher Queries:

Run parallel queries, manage transactions, and work with lists, maps, or collections.

Improved Indexing and Search:

Help with text searches, schema indexing, and optimized lookups.

WITH Statement

The WITH clause in Neo4j serves multiple purposes. It allows you to chain parts of a query, pass results between stages, perform filtering and aggregation, and ensure that certain operations (like pagination or filtering) are done after aggregation.

Using WITH, you pass intermediate results from one stage to the next.

Example


match(t) detach delete t

CREATE (p1:Person {name: "Rachel"})
CREATE (p2:Person {name: "Ross"})
CREATE (p3:Person {name: "Chandler"})
CREATE (p4:Person {name: "Joey"})
CREATE (p5:Person {name: "Monica"})
CREATE (p6:Person {name: "Phoebe"})

CREATE (m1:Movie {title: "Inception", releaseDate: 2010, rating: 8.8})
CREATE (m2:Movie {title: "The Matrix", releaseDate: 1999, rating: 8.7})
CREATE (m3:Movie {title: "Interstellar", releaseDate: 2014, rating: 8.6})

CREATE (p1)-[:FRIEND]->(p2)
CREATE (p1)-[:FRIEND]->(p3)
CREATE (p2)-[:FRIEND]->(p4)
CREATE (p5)-[:FRIEND]->(p6)

CREATE (p1)-[:ACTED_IN]->(m1)
CREATE (p2)-[:ACTED_IN]->(m2)
CREATE (p3)-[:ACTED_IN]->(m3)
CREATE (p5)-[:ACTED_IN]->(m3)

CREATE (p1)-[:LIKES]->(m1)
CREATE (p2)-[:LIKES]->(m1)
CREATE (p3)-[:LIKES]->(m2)
CREATE (p4)-[:LIKES]->(m3)

CREATE (p4)-[:DISLIKES]->(m2)
CREATE (p6)-[:DISLIKES]->(m1)

Count how many friends each person has and return only those with more than 1 friend.

MATCH (p:Person)-[:FRIEND]->(f:Person)
WITH p, count(f) AS friendCount
WHERE friendCount > 1
RETURN distinct p.name AS person, friendCount

Convert String Date to Neo4J Date Format

WITH '10/30/2024' AS dateString
RETURN apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy') AS timestamp,
apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd') AS dataFormatInString,
date(apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')) AS formattedDate

Also print the datatype

WITH '10/30/2024' AS dateString
RETURN 
    apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy') AS timestamp,
    apoc.meta.cypher.type(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy')) AS timestampType,
    
    apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd') AS dataFormatInString,
    apoc.meta.cypher.type(apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')) AS dataFormatType,
    
    date(apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')) AS formattedDate,

    apoc.meta.cypher.type(date(apoc.date.format(apoc.date.parse(dateString, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd'))) AS formattedDateType

Load CSV into Neo4J

match(t) detach delete t

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row
return row


LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row
return row.Region,row.Country,row.`Order ID` as OrderID


LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row
return row.Region,row.Country,toInteger(row.`Order ID`) as OrderID


LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row
RETURN 
    row.`Order ID` AS orderId, 
    row.Country AS country, 
    row.Region AS region, 
    row.`Order Date` AS orderDate, 
    toInteger(row.UnitsSold) AS unitsSold,
    toFloat(row.UnitPrice) AS unitPrice, 
    toFloat(row.TotalCost) AS totalCost, 
    toFloat(row.TotalProfit) AS totalProfit


LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row
RETURN 
    row.`Order ID` AS orderId, 
    row.Country AS country, 
    row.Region AS region, 
    date(apoc.date.format(apoc.date.parse(row.`Order Date`, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')) AS OrderDate, 
    toInteger(row.UnitsSold) AS unitsSold,
    toFloat(row.UnitPrice) AS unitPrice, 
    toFloat(row.TotalCost) AS totalCost, 
    toFloat(row.TotalProfit) AS totalProfit

Putting it all together

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/gchandra10/filestorage/main/sales_100.csv' AS row

MERGE (region:Region {name: row.Region})
MERGE (country:Country {name: row.Country})
MERGE (itemType:ItemType {name: row.`Item Type`})

MERGE (sale:Sale {
    salesChannel: toString(row.`Sales Channel`),
    orderPriority: toString(row.`Order Priority`),
    orderDate: date(apoc.date.format(apoc.date.parse(row.`Order Date`, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')),
    orderId: toInteger(row.`Order ID`),
    shipDate: date(apoc.date.format(apoc.date.parse(row.`Ship Date`, 'ms', 'MM/dd/yyyy'), 'ms', 'yyyy-MM-dd')),
    unitsSold: toInteger(row.UnitsSold),
    unitPrice: toFloat(row.UnitPrice),
    unitCost: toFloat(row.UnitCost),
    totalRevenue: toFloat(row.TotalRevenue),
    totalCost: toFloat(row.TotalCost),
    totalProfit: toFloat(row.TotalProfit)
})

MERGE (region)-[:CONTAINS]->(country)
MERGE (country)-[:SOLD_ITEM]->(sale)
MERGE (sale)-[:OF_TYPE {order_priority: sale.orderPriority}]->(itemType)

Return number of units sold by Region, Country

match (r:Region)-[:CONTAINS]->(c:Country)-[:SOLD_ITEM]->(s:Sale) return r.name, c.name, count(s.unitsSold) as cnt

Return number of units sold by Region, Country where count > 1

MATCH (r:Region)-[:CONTAINS]->(c:Country)-[:SOLD_ITEM]->(s:Sale)
WITH r.name AS regionName, c.name AS countryName, count(s) AS cnt
WHERE cnt > 1
RETURN regionName, countryName, cnt

Order By

match (r:Region)-[:CONTAINS]->(c:Country)-[:SOLD_ITEM]->(s:Sale) return r.name, c.name, sum(s.unitsSold) as totalUnits order by 1,2,3

#neo4j #csvVer 5.5.3

Last change: 2025-10-15