diff --git a/.gitignore b/.gitignore
index 67451556fc6fdcbd87a449af788133e3a6036f87..6ab1839a0b5cdc4ab8a57f133e4b6ebde8a3eb38 100644
--- a/.gitignore
+++ b/.gitignore
@@ -131,4 +131,6 @@ dmypy.json
 # etc
 *.png
 *.jpg
-*.pth
\ No newline at end of file
+*.pth
+
+./doc/*
\ No newline at end of file
diff --git a/README.md b/README.md
index 9523661d34939b6a75a14180118f8692dd611d47..8cea2113ace0f5cf2ef67a0ba92be2b4e0938a28 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,112 @@
-# SceneTextRemover-pytorch
\ No newline at end of file
+# Scene Text Remover PyTorch Implementation
+
+This is a minimal implementation of [Scene text removal via cascaded text stroke detection and erasing](https://arxiv.org/pdf/2011.09768.pdf). This GitHub repository is for studying image inpainting for scene text erasing. Thank you :)
+
+
+
+## Requirements
+
+Python 3.7 or later with all [requirements.txt](./requirements.txt) dependencies installed, including `torch>=1.6`. To install, run:
+
+```
+$ pip install -r requirements.txt
+```
+
+
+
+## Model Summary
+
+
+
+This model has U-Net-like submodules.
+`Gd` detects the text stroke image `Ms` from `I` and `M`. `G'd` detects a more precise text stroke `M's`.
+Similarly, `Gr` generates the text-erased image `Ite`, and `G'r` generates a more precise output `I'te`.
+
+
+
+## Custom Dictionary
+
+To avoid confusion, I renamed the symbols as follows.
+
+`I` : Input image (with text)
+`Mm` : Text area mask (`M` in the model)
+`Ms` : Text stroke mask; output of `Gd`
+`Ms_` : Text stroke mask; output of `G'd`
+`Msgt` : Text stroke mask; ground truth
+`Ite` : Text-erased image; output of `Gr`
+`Ite_` : Text-erased image; output of `G'r`
+`Itegt`: Text-erased image; ground truth
+
+
+
+## Prepare Dataset
+
+You need to prepare background images in the `backs` directory and text binary images in the `font_mask` directory.
+
+
+[part of background image sample, text binary image sample]
+
+Running `python create_dataset.py` automatically generates the `I`, `Itegt`, `Mm`, and `Msgt` data.
+(If you already have `I`, `Itegt`, `Mm`, and `Msgt`, you can skip this section.)
+
+```
+├─dataset
+│  ├─backs
+│  │    # background images
+│  ├─font_mask
+│  │    # text binary images
+│  ├─train
+│  │  ├─I
+│  │  ├─Itegt
+│  │  ├─Mm
+│  │  └─Msgt
+│  └─val
+│     ├─I
+│     ├─Itegt
+│     ├─Mm
+│     └─Msgt
+```
+
+I generated my dataset with 709 background images and 2410 font masks.
+I used 17040 pairs for training and 4260 pairs for validation.
+
+
+
+Thanks to [sina-Kim](https://github.com/sina-Kim) for helping me gather background images.
+
+
+
+## Train
+
+All you need to do is:
+
+```
+$ python train.py
+```
+
+
+
+## Result
+
+From left to right:
+`I`, `Itegt`, `Ite`, `Ite_`, `Msgt`, `Ms`, `Ms_`
+
+* Epoch 2
+
+
+* Epoch 5
+
+* Epoch 10
+
+* Epoch 30
+
+
+These results are not good enough for real tasks. I think the reasons are the limited dataset and simplicity.
+Still, implementing the paper was a good experience for me.
+
+
+
+## Issue
+
+If you have trouble running this code, please use the Issues tab. Thank you.
+
diff --git a/doc/back.png b/doc/back.png
new file mode 100644
index 0000000000000000000000000000000000000000..e67ad3b01b752bdd22557d66a57d67f53080c9a9
Binary files /dev/null and b/doc/back.png differ
diff --git a/doc/dataset_example.png b/doc/dataset_example.png
new file mode 100644
index 0000000000000000000000000000000000000000..b3f461c48b468ec13f9467c5735c00e2278d20b8
Binary files /dev/null and b/doc/dataset_example.png differ
diff --git a/doc/epoch1.PNG b/doc/epoch1.PNG
new file mode 100644
index 0000000000000000000000000000000000000000..64fc635fcdee68f2a8b896aa8ddf2a9aa10c60ef
Binary files /dev/null and b/doc/epoch1.PNG differ
diff --git a/doc/epoch10.PNG b/doc/epoch10.PNG
new file mode 100644
index 0000000000000000000000000000000000000000..06e7337ba3a1b88e576853256154d3aab0b7eba1
Binary files /dev/null and b/doc/epoch10.PNG differ
diff --git a/doc/epoch30.PNG b/doc/epoch30.PNG
new file mode 100644
index 0000000000000000000000000000000000000000..f83838178c004bd0a19046bdaf2c2657d585e6ae
Binary files /dev/null and b/doc/epoch30.PNG differ
diff --git a/doc/epoch5.PNG b/doc/epoch5.PNG
new file mode 100644
index 0000000000000000000000000000000000000000..8e39f21181cd8a7d8e11c82fca2c0e6494e29f14
Binary files /dev/null and b/doc/epoch5.PNG differ
diff --git a/doc/model.png b/doc/model.png
new file mode 100644
index 0000000000000000000000000000000000000000..0bb6e61af1e9fbac376c710b4ee11159ebaa96d3
Binary files /dev/null and b/doc/model.png differ
diff --git a/doc/text.png b/doc/text.png
new file mode 100644
index 0000000000000000000000000000000000000000..1f61b487bc0a92c520969bc45a7d463a740a696e
Binary files /dev/null and b/doc/text.png differ
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..389861cf23937387f4480973dcc9fe93e433287a
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,4 @@
+opencv-python
+matplotlib
+numpy
+tqdm
\ No newline at end of file
diff --git a/results/show/Thumbs.db b/results/show/Thumbs.db
new file mode 100644
index 0000000000000000000000000000000000000000..ed921fd65f0440cd9166d807ef7a769e648017ad
Binary files /dev/null and b/results/show/Thumbs.db differ