Humback Whale Identification
Some best soluton
rank | solution | github | author | keyword |
---|---|---|---|---|
1th | 1th Place Solution | Github code | earhian | classification |
3rd | 3rd Place Solution | Github | pudae | ArcFace |
4th | 4th Place Solution | Github code | David | SIFT+Siamese |
7th | 7th Place Solution | Github code | old-ufo | classification |
9th | 9th Place Solution | Github code | lvan Sosin | GapNet |
25th | 25th Place Solution | Github code | Bartek | CosFace+ProtoNets |
31st | 31st Place Solution | Github code | Khoi Nguyen | RGB |
57th | 57th Place Solution | Github code | Miguel Pinto | SoftTripletLoss |
My solution
Heavily based on Whale Recognition Model with score 0.78563
Training
- Framework:
Keras(backend: tensorflow)
- Model:
Siamese(CNN+Metric Learning)
- Augmentation:
slight(otation, shear, height_zoom, width_zoom, height_shift, width_shift)
- Preprocess:
rotate some special images, convert grayscale,get bounding boxs, affine tranformation
- Optimizer: Adam
- Learning rate:
start at 64e-5, and 4 times less training per epoch group
- Image size:
512*512
- Epochs:
400 or more
- Batch size:
32
Prediction
- Threshold:
0.99 and 0.94 with bootstrapping
- TTA number:
4
- TTA augmentaion:
random slight: (rotation, shear, height_zoom, width_zoom, height_shift, width_shift)
Result
- Training takes about more than 80 hours on GTX 1080TI without pretrained state-of-art model
- Public LB:
0.92248
- Private LB:
0.92761
Mode result ensemble:
- Ensemble of ensemble is not feasible, but ensemble is very effective
- If single model is selected as far as possible for fusion, the effect is better, but the model difference is large, so the fusion effect is better. The fusion effect of models with similar Epochs is not as good as that with large difference
- The ensemble of tta*4 + original result is effective
ensemble code
# coding:utf-8
# filename:ensemble.py
# function:模型识别结果融合程序,融合4个最好的结果
import csv
sub_files = [
'./submissions/submission_Simaese_Epochs220_multithreads_lapjv_512size_0.883.csv',
'./submissions/submission_Simaese_Epochs210_multithreads_lapjv_384size_0.884.csv',
'./submissions/submission_ensemble_(Epoch250_tta*4+original)_0.901.csv',
'./submissions/submission_Simaese_Epochs390_multithreads_lapjv_512size_0.905.csv',
'./submissions/submission_ensemble_(Boot_Epoch350_tta*4+original)_0.908.csv',
'./submissions/submission_ensemble_(Epoch400_tta*4+original)_0.912.csv']
print(len(sub_files))
# Weights of the individual subs
sub_weight = [
0.883 ** 2,
0.884 ** 2,
0.901 ** 2,
0.905 ** 2,
0.908 ** 2,
0.912 ** 2]
Hlabel = 'Image'
Htarget = 'Id'
npt = 5 # number of places in target
place_weights = {}
for i in range(npt):
place_weights[i] = (1 / (i + 1))
print(place_weights)
lg = len(sub_files)
sub = [None] * lg
for i, file in enumerate(sub_files):
## input files ##
print("Reading {}: w={} - {}".format(i, sub_weight[i], file))
reader = csv.DictReader(open(file, "r")) # 将csv文件数据读入到字典中
sub[i] = sorted(reader, key=lambda d: str(d[Hlabel]))
## output file ##
out = open("./submissions/submission_ensemble_zh.csv", "w", newline='')
writer = csv.writer(out)
writer.writerow([Hlabel, Htarget])
for p, row in enumerate(sub[0]):
target_weight = {}
for s in range(lg):
row1 = sub[s][p]
for ind, trgt in enumerate(row1[Htarget].split(' ')):
target_weight[trgt] = target_weight.get(trgt, 0) + (place_weights[ind] * sub_weight[s])
tops_trgt = sorted(target_weight, key=target_weight.get, reverse=True)[:npt]
writer.writerow([row1[Hlabel], " ".join(tops_trgt)])
out.close()
My conclusion
Work
- Large image size helps a lot
- ensemble is useful, but correct ensemble strategy is more useful
- TTA maybe help, but ensemble of tta must be help
- Put all images into SSD faster than HDD in training
- training more epochs helps a lot
- bootstrapping helps, but it need more time to train
Don't work
- pure classition don't work, but if you do some extra works,classition maybe very useful, such as this 1thsolution
- n-fold CV: my parteners have tried 5-fold CV, but it dont't work, maybe our ways have some problem, but i dont see n-fold CV as solution in Kaggle Dissussion
Uncertain
- Grayscale images are not necessarily more effective than RGB
Usage
Environments
Hardware requirements
- GTX1060, GTX1080TI better
- 32GB Memory
- SSD
Software requirments
- Ubuntu 18.04
- Anaconda3/Python3
- Keras(backend: tensorflow
Steps for usage
- 1.clone the repository
git https://github.com/HarleysZhang/kaggle_humpback_whale_identification.git
cd kaggle_humpback_whale_identification
- 2.install requirements
pip3 install -r requirements.txt
- 3.download data and copy it to data folder
kaggle competitions download -c humpback-whale-identification
cp train ./data/
cp test ./data/
cp train.csv ./data/
cp sample_submission.csv ./data/
- 4.train your model without bootstrapping
python3 main_all.py
with bootstrapping
python3 main_with_bootstrapping.py
- 5.ensemble submission file
python test.py
# python test_tta.py # with tta
Some Code Interpretation
Build a transformation matrix with the specified characteristics.
def build_transform(rotation, shear, height_zoom, width_zoom, height_shift, width_shift):
"""
Build a transformation matrix with the specified characteristics.
"""
rotation = np.deg2rad(rotation)
shear = np.deg2rad(shear)
rotation_matrix = np.array(
[[np.cos(rotation), np.sin(rotation), 0], [-np.sin(rotation), np.cos(rotation), 0], [0, 0, 1]])
shift_matrix = np.array(
[[1, 0, height_shift], [0, 1, width_shift], [0, 0, 1]])
shear_matrix = np.array(
[[1, np.sin(shear), 0], [0, np.cos(shear), 0], [0, 0, 1]])
zoom_matrix = np.array(
[[1.0 / height_zoom, 0, 0], [0, 1.0 / width_zoom, 0], [0, 0, 1]])
shift_matrix = np.array(
[[1, 0, -height_shift], [0, 1, -width_shift], [0, 0, 1]])
return np.dot(np.dot(rotation_matrix, shear_matrix), np.dot(zoom_matrix, shift_matrix))
Compute the score matrix by scoring every pictures from the training set against every other picture O(n^2) with multithreads.
def compute_score(verbose=1):
"""
Compute the score matrix by scoring every pictures from the training set against every other picture O(n^2).
"""
features = branch_model.predict_generator(
FeatureGen(train, batch_size=64, verbose=verbose),
max_queue_size=12, workers=6, verbose=0)
num_threads = 6
batch = features.shape[0] // (num_threads - 1)
if features.shape[0] % batch <= 3:
num_threads = 5
if features.shape[0] % batch is not 0:
batch += 1
all_score = []
for start in range(0, features.shape[0], batch):
end = min(features.shape[0], start + batch)
temp_features = features[start:end, :]
temp_score = head_model.predict_generator(
ScoreGen(temp_features, batch_size=2048, verbose=verbose),
max_queue_size=12, workers=6, verbose=0)
temp_score = score_reshape(temp_score, temp_features)
all_score.append(temp_score)
score = np.zeros((features.shape[0], features.shape[0]), dtype=K.floatx())
for i, start in enumerate(range(0, features.shape[0], batch)):
end = min(features.shape[0], start + batch)
score[start:end, start:end] = all_score[i]
return features, score
sompute Linear programming problem with multithreads
def my_lapjv(score):
num_threads = 6
batch = score.shape[0] // (num_threads - 1)
if score.shape[0] % batch <= 3:
num_threads = 5
if score.shape[0] % batch is not 0:
batch += 1
# print(batch)
tmp = num_threads * [None]
threads = []
thread_input = num_threads * [None]
thread_idx = 0
for start in range(0, score.shape[0], batch):
end = min(score.shape[0], start + batch)
# print('%d %d' % (start, end))
thread_input[thread_idx] = score[start:end, start:end]
thread_idx += 1
def worker(data_idx):
x, _, _ = lapjv(thread_input[data_idx])
tmp[data_idx] = x + data_idx * batch
# print("Start worker threads")
for i in range(num_threads):
t = threading.Thread(target=worker, args=(i,), daemon=True)
t.start()
threads.append(t)
for t in threads:
if t is not None:
t.join()
x = np.concatenate(tmp)
# print("LAP completed")
return x