Enrich georeferenced items with information from nominatim server #1487

Open
patrickdalla opened this issue Jan 23, 2023 · 3 comments · May be fixed by #1890

@patrickdalla
Collaborator

patrickdalla commented Jan 23, 2023

Nominatim maps a geolocation to textual address information (reverse geocoding). This would make it possible to search these items by their city name, for instance. The information could also be used to cluster the plotted items by their administrative address (country, city, etc.).
A server URL should be configurable; if one is set, processing would look up an address for every geolocated item.
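
To make the lookup concrete, here is a minimal sketch of one reverse-geocoding step, assuming the configured server exposes Nominatim's `/reverse` endpoint with `format=jsonv2`. The server URL, the sample response, and the helper names `build_reverse_url` and `extract_address` are illustrative assumptions, not code from the IPED branch. No network request is made; the response is a hand-written sample.

```python
import json
from urllib.parse import urlencode

# Address keys this sketch keeps; a real response may omit any of them.
ADDRESS_FIELDS = ("country", "state", "city", "suburb")


def build_reverse_url(base_url, lat, lon):
    """Build a reverse-geocoding URL for a Nominatim server (hypothetical helper)."""
    query = urlencode({"lat": lat, "lon": lon, "format": "jsonv2"})
    return f"{base_url.rstrip('/')}/reverse?{query}"


def extract_address(response_text):
    """Return the known address fields from a response body, or {} on error."""
    data = json.loads(response_text)
    if "error" in data:  # assumption: failed lookups carry an "error" field
        return {}
    address = data.get("address", {})
    return {key: address[key] for key in ADDRESS_FIELDS if key in address}


# Hand-written sample body, not real server output.
sample = '{"address": {"city": "Recife", "state": "Pernambuco", "country": "Brazil"}}'

# Assumed local server URL, for illustration only.
url = build_reverse_url("http://localhost:8080/", -8.05, -34.9)
fields = extract_address(sample)
```

Fields missing from the response (here `suburb`) are skipped rather than stored as empty values, so a search by city only matches items that actually got one.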

patrickdalla added a commit that referenced this issue Jul 10, 2023
nominatim
server for a related address of a georeferenced item.
@patrickdalla
Collaborator Author

Hi nassif,

I've made a stub of this in the Nominatim branch, but I have some doubts about the design.

First, the task makes one HTTP request per item with geolocation. This can lead to many HTTP requests. This kind of request, which complements items with information from other sources, may become more common and be made from other tasks as well (maybe it already exists). Should there be some centralized/normalized subsystem to control this kind of enrichment via HTTP requests?

Second, this complementary external information is what some tools call enrichment. Should we visually differentiate enrichment information from original artifacts?

@lfcnassif
Member

lfcnassif commented Jul 11, 2023

Sorry for my delay.

"First, the task makes one HTTP request per item with geolocation. This can lead to many HTTP requests."

You can group items and make requests in batches by overriding the AbstractTask.sendToNextTask() method. Be careful to send all batched items to the next task after they are processed together; otherwise items can go missing from the case. We have this example:
https://github.com/sepinf-inc/IPED/blob/master/iped-app/resources/scripts/tasks/NSFWNudityDetectTask.py
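
The batching idea can be sketched independently of IPED. The class below is a generic illustration, not the IPED `AbstractTask` API: `BatchingTask`, `process_batch` and `send_to_next` are hypothetical names. It queues items, processes each full batch in one call, and forwards every item afterwards, including the final partial batch.

```python
class BatchingTask:
    """Hold items until a batch is full, process them together, then forward all."""

    def __init__(self, batch_size, process_batch, send_to_next):
        self.batch_size = batch_size
        self.process_batch = process_batch  # called once per batch (e.g. one request)
        self.send_to_next = send_to_next    # called once per item, after processing
        self.pending = []

    def process(self, item):
        """Queue an item instead of forwarding it immediately."""
        self.pending.append(item)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Process whatever is queued and forward every queued item."""
        if not self.pending:
            return
        batch, self.pending = self.pending, []
        self.process_batch(batch)
        for item in batch:
            self.send_to_next(item)

    def finish(self):
        """Flush the last partial batch so no item is left behind."""
        self.flush()


# Usage with plain lists standing in for the batch request and the next task.
batches = []
forwarded = []
task = BatchingTask(2, batches.append, forwarded.append)
for name in ["a", "b", "c"]:
    task.process(name)
task.finish()
```

The call to `finish()` is the step that guards against the missing-items problem: without it, item `"c"` would stay queued and never reach the next task.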

"This kind of request, which complements items with information from other sources, may become more common and be made from other tasks as well (maybe it already exists)."

We have the --downloadInternetData command-line parameter to enable this behavior. Currently it is used only by WhatsAppParser, to download media attachments still present on WhatsApp servers.

"Second, this complementary external information is what some tools call enrichment. Should we visually differentiate enrichment information from original artifacts?"

Yes, WhatsAppParser adds the ExtraProperties.DOWNLOADED_DATA property to flag downloaded media, and it also puts a notice about that in the chat HTML report.

patrickdalla added a commit that referenced this issue Sep 19, 2023
patrickdalla added a commit that referenced this issue Sep 19, 2023
reenqueue the item till a response is available. Add metadata info for
country, state, city and suburb.
patrickdalla added a commit that referenced this issue Sep 19, 2023
@patrickdalla
Collaborator Author

patrickdalla commented Sep 19, 2023

"Yes, WhatsAppParser adds the ExtraProperties.DOWNLOADED_DATA property to flag downloaded media, and it also puts a notice about that in the chat HTML report."

Maybe not this flag, as it seems to indicate that the entire item was downloaded. In this case, only the metadata would contain enriched/downloaded information. We could create a config file listing the metadata names that hold enriched/downloaded information.
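
One possible shape for such a config, sketched under assumptions: a plain-text file with one metadata name per line and `#` comments. The file format, the `nominatim:*` names and both helper functions are hypothetical and only illustrate how a viewer could separate enriched metadata from original metadata.

```python
def load_enrichment_keys(config_text):
    """Parse one metadata name per line, ignoring blank lines and '#' comments."""
    keys = set()
    for line in config_text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            keys.add(name)
    return keys


def split_metadata(metadata, enrichment_keys):
    """Split an item's metadata into (original, enriched) dictionaries."""
    original = {k: v for k, v in metadata.items() if k not in enrichment_keys}
    enriched = {k: v for k, v in metadata.items() if k in enrichment_keys}
    return original, enriched


# Hypothetical config content and metadata names, for illustration only.
config_text = """
# metadata filled from external servers
nominatim:country
nominatim:city   # reverse geocoding result
"""

keys = load_enrichment_keys(config_text)
original, enriched = split_metadata(
    {"geo:lat": "-8.05", "nominatim:city": "Recife"}, keys
)
```

With a split like this, the item itself needs no download flag; only the listed metadata names would be rendered differently.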

patrickdalla added a commit that referenced this issue Sep 19, 2023
patrickdalla added a commit that referenced this issue Sep 20, 2023
patrickdalla added a commit that referenced this issue Sep 21, 2023
nominatim server was unable to GEOCODE (error).
patrickdalla added a commit that referenced this issue Sep 21, 2023
metadata, so it can be better processed in analysis gui.
patrickdalla added a commit that referenced this issue Sep 21, 2023
nominatim responses that do not add certain nominatim fields.
patrickdalla added a commit that referenced this issue Sep 21, 2023
exception, and also to inform error only for georeferenced file.
patrickdalla added a commit that referenced this issue Sep 21, 2023
NominatimTask.NOMINATIM_METADATA metadata in the case.
lfcnassif linked a pull request Apr 28, 2024 that will close this issue
2 participants