Variety

Variety refers to the different types, formats, and sources of data collected — one of the 5 Vs of Big Data.

Types of Data: By Source

  • Social Media: YouTube, Facebook, LinkedIn, Twitter, Instagram
  • IoT Devices: Sensors, Cameras, Smart Meters, Wearables
  • Finance/Markets: Stock Market, Cryptocurrency, Financial APIs
  • Smart Systems: Smart Cars, Smart TVs, Home Automation
  • Enterprise Systems: ERP, CRM, SCM Logs
  • Public Data: Government Open Data, Weather Stations

Types of Data: By Data Format

  • Structured Data – Organized in rows and columns (e.g., CSV, Excel, RDBMS)
  • Semi-Structured Data – Self-describing but irregular (e.g., JSON, XML, Avro, YAML)
  • Unstructured Data – No fixed schema (e.g., images, audio, video, emails)
  • Binary Data – Encoded, compressed, or serialized data (e.g., Parquet, Protocol Buffers, images, MP3)

Unstructured data files are generally stored in a binary format. Examples: images, video, audio.

However, not all binary files contain unstructured data. Examples: Parquet files and executables. Parquet, for instance, stores tabular data in a binary columnar layout.
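
To make the distinction concrete, here is a minimal Python sketch of structured rows stored as raw bytes. The fixed-width record layout (a 4-byte id followed by a 20-byte name) and the names `RECORD` and `rows` are assumptions chosen for illustration; this is not how Parquet or any particular format encodes data.

```python
import struct

# Hypothetical layout: little-endian 4-byte unsigned id, then a 20-byte name field.
RECORD = struct.Struct("<I20s")

rows = [(101, "Rachel Green"), (301, "Monica Geller")]

# Serialize the rows to raw bytes; short names are padded with null bytes.
blob = b"".join(RECORD.pack(emp_id, name.encode("utf-8")) for emp_id, name in rows)

# Because the layout is known, the bytes decode back into the same rows.
decoded = [
    (emp_id, raw_name.rstrip(b"\x00").decode("utf-8"))
    for emp_id, raw_name in RECORD.iter_unpack(blob)
]

print(len(blob))        # 48 bytes: two 24-byte records
print(decoded == rows)  # True
```

The bytes are not human-readable, yet every record follows the same schema, so the file is binary and structured at the same time.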

Structured Data

Tabular data from databases, spreadsheets.

Example:

  • Relational Table
  • Excel

| ID  | Name           | Join Date  |
|-----|----------------|------------|
| 101 | Rachel Green   | 2020-05-01 |
| 201 | Joey Tribbiani | 1998-07-05 |
| 301 | Monica Geller  | 1999-12-14 |
| 401 | Cosmo Kramer   | 2001-06-05 |
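
Because every row shares the same columns, structured data can be loaded and queried with very little code. The sketch below assumes the table above has been exported as CSV text; `CSV_TEXT` and `load_rows` are illustrative names, not part of any particular tool.

```python
import csv
import io
from datetime import date

# The sample table as CSV text (column names taken from the table header).
CSV_TEXT = """ID,Name,Join Date
101,Rachel Green,2020-05-01
201,Joey Tribbiani,1998-07-05
301,Monica Geller,1999-12-14
401,Cosmo Kramer,2001-06-05
"""

def load_rows(text):
    """Parse CSV text into dicts with a typed ID and Join Date."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "ID": int(row["ID"]),
            "Name": row["Name"],
            "Join Date": date.fromisoformat(row["Join Date"]),
        }
        for row in reader
    ]

rows = load_rows(CSV_TEXT)

# A fixed schema makes column-wise queries straightforward.
earliest = min(rows, key=lambda r: r["Join Date"])
print(earliest["ID"])  # 201
```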

Semi-Structured Data

Data with tags or markers but not strictly tabular.

JSON

[
   {
      "id":1,
      "name":"Rachel Green",
      "gender":"F",
      "series":"Friends"
   },
   {
      "id":"2",
      "name":"Sheldon Cooper",
      "gender":"M",
      "series":"BBT"
   }
]
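
In the sample above, the first `id` is a number and the second is a string, which is the kind of irregularity semi-structured data allows. A minimal Python sketch of how a consumer might load and normalize such records follows; the variable names are illustrative, and converting every `id` to an integer is an assumption about what downstream code wants.

```python
import json

# The sample records, with the mixed id types preserved.
JSON_TEXT = """
[
  {"id": 1, "name": "Rachel Green", "gender": "F", "series": "Friends"},
  {"id": "2", "name": "Sheldon Cooper", "gender": "M", "series": "BBT"}
]
"""

records = json.loads(JSON_TEXT)

# The parser keeps each value's type as written, so the ids differ in type.
print([type(r["id"]).__name__ for r in records])  # ['int', 'str']

# Normalize the irregular field before treating the records as a table.
normalized = [{**r, "id": int(r["id"])} for r in records]
print([r["id"] for r in normalized])  # [1, 2]
```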

XML

<?xml version="1.0" encoding="UTF-8"?>
<actors>
   <actor>
      <id>1</id>
      <name>Rachel Green</name>
      <gender>F</gender>
      <series>Friends</series>
   </actor>

   <actor>
      <id>2</id>
      <name>Sheldon Cooper</name>
      <gender>M</gender>
      <series>BBT</series>
   </actor>
</actors>
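
The same records can be read from XML with Python's standard library. This is a sketch under the assumption that each `<actor>` element contains only simple child elements; `XML_TEXT` and `actors` are illustrative names. Note that XML element text is always a string, so `id` comes back as `'1'` rather than `1`.

```python
import xml.etree.ElementTree as ET

# The sample document (XML declaration omitted for brevity).
XML_TEXT = """<actors>
   <actor>
      <id>1</id>
      <name>Rachel Green</name>
      <gender>F</gender>
      <series>Friends</series>
   </actor>
   <actor>
      <id>2</id>
      <name>Sheldon Cooper</name>
      <gender>M</gender>
      <series>BBT</series>
   </actor>
</actors>"""

root = ET.fromstring(XML_TEXT)

# Turn each <actor> element into a dict keyed by its child tag names.
actors = [
    {child.tag: child.text for child in actor}
    for actor in root.findall("actor")
]

print(actors[0]["id"], actors[0]["name"])  # 1 Rachel Green
```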

Unstructured Data

Media files, free text, documents, logs – no predefined structure.

Rachel Green acted in the Friends series. Her role is very popular.
Similarly, Sheldon Cooper acted in BBT. He played a nerdy physicist.
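
Free text like this carries no schema, so any structure has to be inferred. As a rough illustration, the Python sketch below pulls name and series pairs out of sentences modeled on the sample above with a hand-written regular expression. The pattern is a hypothetical rule that only works for this exact phrasing, which is why such extraction tends to be brittle.

```python
import re

# Sentences modeled on the sample text above.
TEXT = (
    "Rachel Green acted in the Friends series. Her role is very popular. "
    "Similarly, Sheldon Cooper acted in BBT."
)

# Hypothetical rule: two capitalized words, "acted in", an optional "the", then one word.
PATTERN = re.compile(r"([A-Z][a-z]+ [A-Z][a-z]+) acted in (?:the )?(\w+)")

rows = [
    {"name": name, "series": series}
    for name, series in PATTERN.findall(TEXT)
]

for row in rows:
    print(row["name"], "|", row["series"])
```

A sentence phrased differently (for example, "BBT starred Sheldon Cooper") would be missed entirely, which is the core difficulty of working with unstructured data.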

Types:

  • Images (JPG, PNG)
  • Video (MP4, AVI)
  • Audio (MP3, WAV)
  • Documents (PDF, DOCX)
  • Emails
  • Logs (system logs, server logs)
  • Web scraping content (HTML, raw text)

Note: Many LLM-based AI tools can now help parse unstructured data into tabular form quickly.

#structured #unstructured #semistructured #binary #json #xml #image #bigdata #bigv

Ver 5.5.3

Last change: 2025-10-15