17 August 2022

Week nine | PCA, visualization, and automatic data extraction

by Yoshitaka Inoue

GSoC Project

Mentor: Augustin Luna (@cannin)

Tasks

Re-train DrugCell model.
Done: hyperparameter tuning on the small dataset, excluding the Other label. Score is 0.5.
Weight visualization: working on calculating the importance of each hidden layer.
Train Benchmark AutoML Model

Tuned with AutoKeras. Score was 0.22.
Understand RLIPP and implement (or use) it to describe the results in terms of their biological background

RLIPP takes 30 months, so I won't use the original one and have implemented a substitute instead.
Document Project Dependencies

Working on this one right now. I organized some files for the Pipfile and r-requirement, and also set up GitHub Actions for data extraction.
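As an illustration of the label filtering used for the small-data runs above, here is a minimal Python sketch. The record layout, the field name `label`, and the class names other than Other are hypothetical; the actual project data format is not shown in this post.

```python
from collections import Counter


def drop_label(samples, label_key="label", excluded="Other"):
    """Return only the samples whose class label is not `excluded`."""
    return [s for s in samples if s[label_key] != excluded]


# Toy records: cell lines, drugs, and class names are made up for illustration.
samples = [
    {"cell_line": "A", "drug": "d1", "label": "class_a"},
    {"cell_line": "B", "drug": "d1", "label": "Other"},
    {"cell_line": "C", "drug": "d2", "label": "class_b"},
    {"cell_line": "D", "drug": "d2", "label": "Other"},
]

filtered = drop_label(samples)
label_counts = Counter(s["label"] for s in filtered)

# Report how many rows remain and which labels are left.
print(len(samples), "->", len(filtered), dict(label_counts))
```

Filtering before tuning means every trial trains on fewer rows, which is where the shorter run time comes from.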

Comments

Using the class labels, I removed the Other data and used the rest for AutoKeras and hyperparameter tuning.  
This reduces the run time.
I also tried to calculate the importance of each GO term.
I talked with the author of RLIPP, and the importance calculation is highly affected by the number of cell lines for each drug.
In our case, the data is too small for that, and I got the same score for all GO terms.
To prevent this, I am running a new model with a larger number of hidden layers.
I hope to show better results and find some explanation for them.
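To make the GO-term importance idea concrete, here is a simplified Python sketch of an RLIPP-style score. It assumes the commonly described definition: the relative improvement in predictive power (here, Spearman correlation with the observed drug response) of a term's own neurons over those of its child terms. It is not the original RLIPP code and not the substitute mentioned above. The step that fits a linear model on neuron states is omitted, so the predictions are passed in directly, and all numbers are toy values.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rlipp(parent_power, children_power):
    """Relative improvement of the parent term over its children."""
    if children_power == 0:
        return float("nan")
    return (parent_power - children_power) / children_power


# Toy values for one drug across five cell lines (made up).
observed = [0.1, 0.4, 0.35, 0.8, 0.6]
parent_pred = [0.2, 0.3, 0.4, 0.9, 0.7]    # from the GO term's own neurons
children_pred = [0.3, 0.2, 0.5, 0.6, 0.9]  # from its child terms' neurons

parent_power = spearman(observed, parent_pred)
children_power = spearman(observed, children_pred)
score = rlipp(parent_power, children_power)
print(parent_power, children_power, score)
```

Each correlation is computed per drug over its cell lines, so a drug with only a handful of cell lines gives a rank correlation that can take only a few distinct values. That may be one reason scores coincide on small data, though it has not been verified for this project.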

Next Step

Explain the results in terms of biological phenomena.
Documentation

tags: gsoc