The Global Biodiversity Information Facility (GBIF) is the world's largest open-access biodiversity data infrastructure, providing free access to more than two billion species occurrence records from around the globe. For researchers in phylogenetics, biogeography, and conservation, GBIF is a key resource for understanding species distributions.
What is GBIF?
GBIF is an international network and research infrastructure funded by governments worldwide. Established in 2001, it aggregates biodiversity data from:
- Natural history museums: Specimen collections with detailed locality data
- Herbaria: Plant specimen records with geographic coordinates
- Citizen science platforms: iNaturalist, eBird, and other observation networks
- Research institutions: Field surveys and monitoring programs
- Government agencies: National biodiversity inventories
Types of Data in GBIF
Occurrence Records
The primary data type: a record of a species being observed or collected at a specific location and time. Each record typically includes:
- Taxonomic information: Scientific name, higher taxonomy
- Geographic coordinates: Latitude and longitude
- Date: When the observation/collection occurred
- Basis of record: Preserved specimen, human observation, machine observation
- Data source: Institution and dataset
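The fields above can be pictured as a small record type. This is a minimal sketch, not a GBIF class: `OccurrenceRecord` and its attribute names are hypothetical, chosen to mirror the list, and the example values are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class OccurrenceRecord:
    """Hypothetical container mirroring the fields listed above."""
    scientific_name: str                 # taxonomic information
    decimal_latitude: Optional[float]    # None when no coordinates were supplied
    decimal_longitude: Optional[float]
    event_date: Optional[date]           # when the observation/collection occurred
    basis_of_record: str                 # e.g. "PRESERVED_SPECIMEN", "HUMAN_OBSERVATION"
    dataset_key: str                     # identifies the source dataset

    def has_coordinate(self) -> bool:
        """True only when both latitude and longitude are present."""
        return self.decimal_latitude is not None and self.decimal_longitude is not None


# Invented example values, for illustration only.
record = OccurrenceRecord(
    scientific_name="Panthera tigris",
    decimal_latitude=27.5,
    decimal_longitude=84.3,
    event_date=date(2019, 3, 14),
    basis_of_record="HUMAN_OBSERVATION",
    dataset_key="example-dataset",  # placeholder, not a real dataset key
)
```

Keeping coordinates and dates optional reflects that many records lack one or the other.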
Species Checklists
Lists of species known from specific regions or taxonomic groups, useful for biodiversity assessments.
Sampling Events
Structured survey data with documented sampling effort, which records both presences and absences and so supports more sophisticated analyses than presence-only records.
Accessing GBIF Data
GBIF.org Portal
The easiest way to explore GBIF data is through the GBIF.org website, which offers:
- Species search with maps and statistics
- Occurrence search with filters
- Download functionality for large datasets
- API documentation for programmatic access
GBIF API
For automated data access, GBIF provides RESTful APIs:
# Search for species occurrences (limit is capped at 300 per request)
GET https://api.gbif.org/v1/occurrence/search?scientificName=Panthera%20tigris&hasCoordinate=true&limit=300
# Match a name against the GBIF backbone taxonomy
GET https://api.gbif.org/v1/species/match?name=Panthera%20tigris
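The same occurrence request can be built and paged from a script. The sketch below uses only the Python standard library. The helper names (`build_search_url`, `fetch_json`, `iter_occurrences`) are hypothetical, and it assumes the response carries `results` and `endOfRecords` fields and that `offset` moves through the result set.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.gbif.org/v1"
PAGE_SIZE = 300  # per-request cap on the occurrence search endpoint


def build_search_url(scientific_name, limit=PAGE_SIZE, offset=0):
    """Build an occurrence search URL for georeferenced records of one taxon."""
    query = urlencode({
        "scientificName": scientific_name,
        "hasCoordinate": "true",
        "limit": limit,
        "offset": offset,
    })
    return f"{API_ROOT}/occurrence/search?{query}"


def fetch_json(url):
    """Fetch a URL and decode the JSON body (makes a network request)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_occurrences(scientific_name, max_records=900, fetch=fetch_json):
    """Yield up to max_records occurrence dicts, one page at a time."""
    offset = 0
    while offset < max_records:
        limit = min(PAGE_SIZE, max_records - offset)
        page = fetch(build_search_url(scientific_name, limit=limit, offset=offset))
        results = page.get("results", [])
        yield from results
        # Stop when the API reports the last page or returns nothing.
        if page.get("endOfRecords", True) or not results:
            break
        offset += len(results)


# Usage (requires network access):
# for occurrence in iter_occurrences("Panthera tigris", max_records=600):
#     print(occurrence.get("decimalLatitude"), occurrence.get("decimalLongitude"))
```

Passing `fetch` as a parameter keeps the paging logic testable without network access. For large result sets, the asynchronous download service is generally a better fit than paging through search.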
R Package (rgbif)
The rgbif package provides convenient R functions:
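library(rgbif)
# Search for tiger occurrences (rgbif pages through the API to reach the limit)
tigers <- occ_search(
  scientificName = "Panthera tigris",
  hasCoordinate = TRUE,
  limit = 5000
)
# View results
head(tigers$data)
# For analyses you intend to publish, occ_download() gives a citable download DOI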
Data Quality Considerations
Not all GBIF records are equally reliable. Common quality issues include:
Coordinate Issues
- Coordinate precision: Some records have imprecise coordinates (e.g., country centroids)
- Transposed coordinates: Latitude and longitude swapped
- Zero coordinates: Records at 0,0 (often errors)
- Ocean records: Terrestrial species plotted in water
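Some of these coordinate problems can be caught with simple arithmetic checks. The following is a minimal sketch with a hypothetical function name and labels. It only detects transposition when the swap pushes latitude out of range, and it does not attempt centroid or land/ocean checks, which need external reference data.

```python
def coordinate_problems(lat, lon):
    """Return a list of labels for basic problems with a lat/lon pair."""
    if lat is None or lon is None:
        return ["missing"]
    problems = []
    if lat == 0 and lon == 0:
        problems.append("zero_coordinate")
    if not (-90 <= lat <= 90) or not (-180 <= lon <= 180):
        problems.append("out_of_range")
        # If swapping the values yields a valid pair, they may be transposed.
        if -90 <= lon <= 90 and -180 <= lat <= 180:
            problems.append("possibly_transposed")
    return problems
```

A swapped pair such as (30, 120) entered as (120, 30) is caught, but a swap that stays within valid ranges is not. Those cases need a comparison against the stated country or known range.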
Taxonomic Issues
- Outdated names: Synonyms that haven't been updated
- Misidentifications: Incorrect species determinations
- Spelling errors: Typos in scientific names
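A name-matching response can help triage these taxonomic issues. The sketch below interprets a response dictionary from the species match endpoint. It assumes the response carries `matchType`, `status`, and `acceptedUsageKey` fields. The function name is hypothetical, and it cannot detect misidentifications, which require expert review of the record itself.

```python
def summarize_match(match):
    """Turn a name-match response dict into a short human-readable note."""
    match_type = match.get("matchType", "NONE")
    if match_type == "NONE":
        return "no match: check the spelling or supply higher taxonomy"
    notes = []
    if match_type == "FUZZY":
        notes.append("fuzzy match: possible spelling error")
    elif match_type == "HIGHERRANK":
        notes.append("matched only to a higher rank")
    if match.get("status") == "SYNONYM":
        notes.append(f"synonym of usage {match.get('acceptedUsageKey')}")
    return "; ".join(notes) or "exact match, not flagged as a synonym"
```

Running every name through a check like this before analysis makes it easier to see which records rest on uncertain or outdated names.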
Quality Filtering Best Practices
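Always filter GBIF data before analysis. Useful search filters include hasCoordinate=true, hasGeospatialIssue=false, and a coordinateUncertaintyInMeters range (for example 0,10000 to keep records located to within 10 km). Then check for outliers that fall outside the known range of the species.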
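The same filters can be applied locally to records that have already been downloaded. This is a rough sketch, assuming each record is a dictionary with `decimalLatitude`, `decimalLongitude`, and `coordinateUncertaintyInMeters` keys. The function name and the bounding-box range check are illustrative choices, and the box does not handle ranges that cross the antimeridian.

```python
def passes_basic_filters(record, max_uncertainty_m=10_000, bbox=None):
    """Return True if a record has usable coordinates within the given limits.

    bbox is an optional (min_lon, min_lat, max_lon, max_lat) tuple standing in
    for the known range of the species.
    """
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if lat is None or lon is None:
        return False
    uncertainty = record.get("coordinateUncertaintyInMeters")
    # Records with no stated uncertainty are kept; tighten this if needed.
    if uncertainty is not None and uncertainty > max_uncertainty_m:
        return False
    if bbox is not None:
        min_lon, min_lat, max_lon, max_lat = bbox
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            return False
    return True
```

Keeping records without a stated uncertainty is a deliberate trade-off: many records omit the field, so dropping them can remove a large share of the data.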
GBIF Data Quality Flags
GBIF automatically flags potential issues in records. Key flags include:
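- COORDINATE_INVALID: Coordinates were supplied but could not be interpreted
- COUNTRY_COORDINATE_MISMATCH: Coordinates fall outside the stated country
- ZERO_COORDINATE: Coordinates are exactly 0°, 0°, which usually indicates a missing value
- TAXON_MATCH_FUZZY: Name matched to the backbone taxonomy only by a fuzzy (non-exact) match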
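Flags like these can be used to drop records after download. The sketch below assumes each record dictionary has an `issues` list of flag names, as in the occurrence search response. The set of flags treated as blocking is an illustrative choice, not a GBIF recommendation, and the function names are hypothetical.

```python
# Flags treated as disqualifying in this sketch; adjust to the analysis.
BLOCKING_ISSUES = {
    "COORDINATE_INVALID",
    "COUNTRY_COORDINATE_MISMATCH",
    "ZERO_COORDINATE",
    "TAXON_MATCH_FUZZY",
}


def blocking_issues(record):
    """Return the sorted blocking flags present on a record."""
    return sorted(BLOCKING_ISSUES.intersection(record.get("issues", [])))


def drop_flagged(records):
    """Keep only records that carry none of the blocking flags."""
    return [rec for rec in records if not blocking_issues(rec)]
```

Not every flag signals an error. Some only report that a value was rounded or inferred, so it is usually better to block a chosen subset than to drop every flagged record.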
Using GBIF Data for Research
Species Distribution Modeling
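Once cleaned, GBIF occurrence data is widely used to build species distribution models (SDMs) with tools such as MaxEnt and Bioclim, alongside packages such as ENMeval for model tuning and evaluation. Combine occurrences with environmental layers to predict suitable habitat.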
Biogeographic Analysis
Map species distributions onto phylogenetic trees to infer ancestral ranges, detect dispersal events, and test biogeographic hypotheses.
Conservation Prioritization
Identify areas of high species richness, locate populations of threatened species, and assess habitat connectivity.
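A first pass at locating species-rich areas is to count distinct species per grid cell. This is a minimal sketch, assuming records are dictionaries with `decimalLatitude`, `decimalLongitude`, and `species` keys. The function name and the one-degree default cell size are illustrative.

```python
import math
from collections import defaultdict


def richness_by_cell(records, cell_size_deg=1.0):
    """Count distinct species per lat/lon grid cell.

    Cells are keyed by (lat_index, lon_index), the floor of each coordinate
    divided by the cell size.
    """
    species_in_cell = defaultdict(set)
    for rec in records:
        lat = rec.get("decimalLatitude")
        lon = rec.get("decimalLongitude")
        name = rec.get("species")
        if lat is None or lon is None or not name:
            continue  # skip records that cannot be placed or named
        cell = (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))
        species_in_cell[cell].add(name)
    return {cell: len(names) for cell, names in species_in_cell.items()}
```

Raw counts like these track sampling effort as much as true richness, and cells of equal size in degrees do not cover equal areas, so treat the result as exploratory.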
Climate Change Research
Use historical occurrence records to detect range shifts over time and project future distributions under climate scenarios.
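One crude way to look for a latitudinal range shift is to compare the mean latitude of records before and after a chosen year. The sketch below assumes records are dictionaries with `year` and `decimalLatitude` keys. The function name and the single split year are illustrative simplifications, not an established method.

```python
from statistics import mean


def mean_latitude_shift(records, split_year):
    """Return mean latitude from split_year on minus the mean before it.

    Returns None when either period has no usable records.
    """
    usable = [
        r for r in records
        if r.get("year") is not None and r.get("decimalLatitude") is not None
    ]
    early = [r["decimalLatitude"] for r in usable if r["year"] < split_year]
    late = [r["decimalLatitude"] for r in usable if r["year"] >= split_year]
    if not early or not late:
        return None
    return mean(late) - mean(early)
```

A positive value means later records sit further north on average. Changes in where and how often people collect can produce the same signal, so any apparent shift needs to be checked against sampling effort.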
Citing GBIF Data
When using GBIF data in publications, proper citation is essential:
- Cite the GBIF.org download DOI provided with your data
- Cite individual datasets when using specific collections
- Follow GBIF's citation guidelines for your specific use case
GBIF.org (09 March 2026) GBIF Occurrence Download https://doi.org/10.15468/dl.xxxxx
Search GBIF with PhyloVerse
Access GBIF occurrence data directly within PhyloVerse. Search by taxon name, visualize distributions on interactive maps, and integrate with your phylogenetic analyses.
Beyond GBIF: Other Data Sources
While GBIF is the largest aggregator, other valuable biodiversity data sources include:
- iNaturalist: Citizen science observations with photo verification
- eBird: Bird occurrence data from birders worldwide
- BOLD Systems: DNA barcode sequences with occurrence data
- OBIS: Ocean Biogeographic Information System for marine species
- VertNet: Vertebrate natural history collections
Conclusion
GBIF has transformed biodiversity research by making occurrence data freely available to anyone. Whether you're modeling species distributions, testing biogeographic hypotheses, or planning conservation actions, GBIF provides the foundational data you need. By understanding data quality issues and applying appropriate filters, you can leverage this remarkable resource for rigorous scientific research.