The Big Picture

The PNW Tree Identifier uses a convolutional neural network (CNN) to classify photos of trees into one of 40 species native to the Pacific Northwest. It was trained entirely on community-contributed, research-grade observations from iNaturalist.

Data: Crowdsourced from Nature Enthusiasts

What is iNaturalist?

iNaturalist is a citizen science platform where people around the world upload photos of organisms they encounter in nature. Other users help identify them, and once enough people agree on a species, the observation reaches "research grade": a reliable, community-verified label.

This makes iNaturalist an exceptional source of labeled training data. Each photo comes with a verified species ID, GPS coordinates, a date, and context about the observation. Best of all, the platform represents how trees actually appear "in the wild," with all the natural variation of lighting, angles, seasons, and camera quality.

How the Data Were Collected

For each of the 40 target species, we queried the iNaturalist API (/v1/observations) with these filters:

  • Taxon ID: species-specific (e.g., 48256 for Douglas Fir), for precise taxonomic targeting
  • Place IDs: Oregon (10), Washington (46), British Columbia (7085), to focus on PNW populations
  • Quality grade: research, so only community-verified IDs are included
  • Target count: ~400 images per species, to keep classes balanced

Photos were extracted from observations and upgraded from thumbnail to medium resolution (~500px). The pipeline downloads with a 1-second rate limit to respect iNaturalist's API guidelines, and supports resuming if interrupted.
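A minimal sketch of that collection loop, using only the standard library. The query parameters match the table above; the function and variable names are mine, not the project's, and the real pipeline also resumes after interruption:

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://api.inaturalist.org/v1/observations"

def fetch_observations(taxon_id, place_ids=(10, 46, 7085), pages=2, per_page=200):
    """Pull research-grade observations for one species from the PNW place IDs."""
    results = []
    for page in range(1, pages + 1):
        params = urlencode({
            "taxon_id": taxon_id,             # e.g., 48256 for Douglas Fir
            "place_id": ",".join(str(p) for p in place_ids),
            "quality_grade": "research",      # community-verified IDs only
            "photos": "true",                 # skip photo-less observations
            "per_page": per_page,
            "page": page,
        })
        with urlopen(f"{API_URL}?{params}", timeout=30) as resp:
            results.extend(json.load(resp)["results"])
        time.sleep(1.0)                       # 1-second rate limit per API guidelines
    return results

def medium_url(square_url):
    """iNaturalist photo URLs encode the size in the filename; swap thumbnail for ~500px."""
    return square_url.replace("square", "medium")
```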

The result: ~15,500 images total spanning all 40 species, representing a rich variety of bark textures, leaf shapes, growth stages, seasons, and photographic conditions.

Preprocessing Pipeline

Before training, every image passes through several quality checks:

  1. Validation: corrupt or un-openable images are discarded
  2. Resize: longest edge scaled to 384px (preserving aspect ratio)
  3. Deduplication: SHA256 hashes catch exact-duplicate images
  4. RGB conversion: all images standardized to 3-channel RGB, JPEG quality 95
  5. Stratified split: 70% training / 15% validation / 15% test, maintaining class proportions
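Steps 1-4 can be sketched per image with Pillow and hashlib (the function name and return shape are my own; the stratified split is omitted here):

```python
import hashlib
from io import BytesIO
from PIL import Image  # Pillow

def preprocess_image(raw_bytes, max_edge=384, jpeg_quality=95):
    """Validate, resize, and re-encode one downloaded image.

    Returns (jpeg_bytes, sha256_of_original), or None for corrupt files.
    """
    digest = hashlib.sha256(raw_bytes).hexdigest()   # step 3: exact-duplicate key
    try:
        img = Image.open(BytesIO(raw_bytes))
        img.load()                                   # step 1: full decode catches corruption
    except Exception:
        return None
    img = img.convert("RGB")                         # step 4: force 3-channel RGB
    scale = max_edge / max(img.size)
    if scale < 1.0:                                  # step 2: longest edge -> 384px
        img = img.resize((round(img.width * scale), round(img.height * scale)))
    out = BytesIO()
    img.save(out, "JPEG", quality=jpeg_quality)
    return out.getvalue(), digest
```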
At a glance: ~15,500 total images, ~400 per species, 3 regions (OR, WA, BC), 100% research grade.

Model: EfficientNetV2-S

Why This Architecture?

EfficientNetV2 is a family of image classifiers designed by Google using neural architecture search – an automated process that explores thousands of possible network designs to find the most efficient ones. The "S" (small) variant has 21.5 million parameters: powerful enough for fine-grained species classification, but small enough to run on a laptop.

Architecture Overview

The network has two main parts: a pretrained convolutional backbone that distills each image into a 1,280-dimensional feature vector, and a custom classification head that maps that vector to scores for the 40 species.

What is Transfer Learning?

Training a deep neural network from scratch requires millions of images and significant compute. Transfer learning is a shortcut: we start with a model already trained on a large, general dataset (ImageNet) and adapt it for our specific task (PNW trees).

ImageNet training teaches the model to "see," recognizing edges, textures, shapes, and spatial relationships. Our custom classification head teaches it to use those skills specifically for tree identification.

Training: Two Phases

Training happens in two distinct phases, each with a different strategy:

Phase 1: Warm Up the Head

The backbone is frozen - its weights don't change. Only the new classifier head trains, learning to map ImageNet features to our 40 species.

Epochs: 5
Learning rate: 1e-3
What trains: Classifier head only

Phase 2: Fine-Tune Everything

Now the entire network is unfrozen and trains together, but with differential learning rates -- the backbone gets gentle updates while the head keeps adapting more aggressively.

Epochs: Up to 30 (early stopping)
Backbone LR: 1e-5 (gentle)
Head LR: 1e-4 (10x higher)

Training Techniques

  • Label smoothing (0.1): softens the target from "100% this species" to "90% this, ~0.26% each other," preventing overconfident predictions and yielding better-calibrated scores
  • Cosine annealing LR: the learning rate follows a smooth cosine curve toward zero, giving gradual convergence without abrupt LR drops
  • Early stopping: halts training if validation loss doesn't improve for 7 epochs, preventing overfitting and wasted compute
  • AdamW optimizer: Adam with decoupled weight decay (0.01), which generalizes better than standard Adam
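Wired together in PyTorch, the four techniques amount to roughly the following (a stand-in linear model replaces the real network, and the early-stopping helper is my own naming):

```python
from torch import nn, optim

model = nn.Linear(1280, 40)  # stand-in for the real network

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)   # soften one-hot targets
optimizer = optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=30)  # LR -> 0 over 30 epochs

class EarlyStopping:
    """Stop when validation loss hasn't improved for `patience` epochs."""

    def __init__(self, patience=7):
        self.patience, self.best, self.bad_epochs = patience, float("inf"), 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```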

Data Augmentation

During training, each image is randomly transformed to simulate the variety of real-world conditions. This effectively multiplies the dataset size and teaches the model to be robust:

  • Random resized crop (384px, scale 0.6-1.0): different distances and framings
  • Horizontal flip (p = 0.5): trees look the same both ways
  • Color jitter (brightness/contrast/saturation ±0.3): varying sunlight and seasons
  • Rotation (±15 degrees): tilted camera angles
  • Random erasing (p = 0.2, 2-15% of image): occlusion by branches, signs, etc.

At inference time, augmentation is disabled. Images are deterministically resized to 422px and center-cropped to 384px for consistent, reproducible predictions.

How Predictions Work

When you upload a photo, here's what happens behind the scenes:

Image Preparation

Your photo is validated, resized to 422px on the long edge, then center-cropped to 384×384 pixels. Pixel values are normalized to match ImageNet statistics (the same preprocessing the model was trained with).

Forward Pass

The processed image tensor passes through all layers of EfficientNetV2-S. The backbone extracts a 1,280-dimensional feature vector; the classifier head maps this to 40 raw scores (logits), one per species.

Softmax & Ranking

The softmax function converts the 40 raw scores into probabilities that sum to 1.0. The top-5 species (by probability) are returned as the prediction, each with a confidence percentage.
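The forward pass and ranking steps in miniature, with random logits standing in for the classifier head's real output:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(40)               # stand-in for the 40 raw scores
probs = F.softmax(logits, dim=0)       # non-negative, sums to 1.0
top5 = torch.topk(probs, k=5)          # highest-probability species first

for prob, idx in zip(top5.values, top5.indices):
    print(f"species #{idx.item()}: {prob.item():.1%} confidence")
```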

Cleanup

Your uploaded image is deleted from the server immediately after prediction. Nothing is stored.

Limitations & Tips

  • Geographic focus: The model was trained on trees from Oregon, Washington, and British Columbia. It may be less accurate for trees outside this region, or for planted/ornamental specimens.
  • 40 species only: If the tree isn't one of the 40 recognized species, the model will still pick its best guess from those 40. Check the confidence score -- low confidence suggests the species may not be in the training set.
  • Photo quality matters: Clear, well-lit photos of distinctive features (bark, leaves, needles, cones) work best. Blurry or distant shots may produce less reliable results.
  • Not a replacement for field guides: This is a tool to help you narrow down possibilities. For confident identification, cross-reference with field guides, herbarium specimens, or local botanists.

Acknowledgments

iNaturalist and its global community of citizen scientists made this project possible. Every one of the ~15,500 training images was contributed and verified by real people exploring nature - hikers, botanists, students, and enthusiasts who took the time to photograph, upload, and identify the trees they encountered.

The pretrained backbone comes from the EfficientNetV2 paper by Tan & Le (2021), built on the ImageNet dataset. The web framework uses Flask, and the deep learning stack runs on PyTorch. Model and app development were assisted via Claude Code.