This repository has been archived by the owner on Jan 8, 2021. It is now read-only.

Cleanup of old user accounts - Issue#218 #230

Open
wants to merge 4 commits into
base: master
from

Conversation

ghost

@ghost ghost commented Jun 9, 2012

Added a Celery task that checks for inactive lazy users once a day and removes them from the database.


from piplmesh.account import models as account_models
from piplmesh.api import base, models as api_models

Member

base is not used?

@ghost
Author

ghost commented Jun 9, 2012

Fixed.

from piplmesh.api import models as api_models

@task.task
def clean_inactive_lazy_users():
Member

OK, but this we REALLY have to improve. Right now you read ALL users from the database into Python just to find out which users not to process. ;-)

This is why databases support queries: so that you can limit what is transferred between the database and Python to only what you are actually interested in.

So please create a MongoEngine query which returns only users who have no content and who have been inactive for longer than the timeout. Then run is_anonymous over them and delete them.
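
A minimal sketch of the inactivity half of such a filter, written as a plain MongoDB query document so that it runs without a database. The field name `last_activity` and the 30-day timeout are assumptions for illustration; the thread does not say how piplmesh stores either. The "no content" half is not covered here, since content lives in the posts collection.

```python
from datetime import datetime, timedelta

# Hypothetical timeout; the real value is not given in this thread.
INACTIVE_TIMEOUT = timedelta(days=30)


def inactive_users_filter(now, timeout=INACTIVE_TIMEOUT):
    """Build a MongoDB filter matching users last active before the cutoff.

    `last_activity` is an assumed field name on the user document.
    """
    cutoff = now - timeout
    return {'last_activity': {'$lt': cutoff}}


query = inactive_users_filter(datetime(2012, 6, 9))
print(query)
```

With MongoEngine, a dictionary like this can be passed through the `__raw__` keyword of a queryset, or the same condition can be written as a field lookup such as `last_activity__lt=cutoff`. Either way, only the matching users leave the database.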

Author

I've been trying for a few days now to find a way to build such a query, but I've had no luck. I checked every page I could find on Google about this topic, but I'm still stuck on the query. Any help would be much appreciated. I assume I have to use something like this:
api_models.Post.objects(comments__author=user) - this returns the posts which have comments written by a given user. But this is just one part of the query, and I don't know how to combine posts, comments and users together in one query.

Member

With this you can get all posts by the user at the same time as all posts with comments by the user: http://mongoengine-odm.readthedocs.org/en/latest/guide/querying.html#advanced-queries
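
As a rough illustration of what such a combined query expresses, here is the equivalent raw MongoDB filter built in plain Python. The field paths `author` and `comments.author` are inferred from lookups used elsewhere in this thread (`post.author`, `comments__author`); how the author reference is actually stored is an assumption.

```python
def posts_touched_by(user_id):
    """Filter for posts written by the user OR containing a comment by the user."""
    return {
        '$or': [
            {'author': user_id},
            {'comments.author': user_id},
        ],
    }


print(posts_touched_by('some-user-id'))
```

In MongoEngine the same condition is normally spelled with `Q` objects combined by `|`, for example `Q(author=user) | Q(comments__author=user)`.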

Member

(But it will not necessarily help you much here.)

Member

But you will probably have to do a server-side query. Have you tried asking the question on Stack Overflow?

@mitar
Member

mitar commented Jun 14, 2012

OK. So I suggest you do this in two steps. First, define a periodic Celery task which computes statistics for each user, which for now means the number of posts and comments. You can define a new field in the User document, statistics_count_all. Then you use MongoDB map-reduce.

For the map-reduce, you use an incremental map-reduce over the posts collection with the user collection as the output collection: in map you compute the per-user counts for each post, then in reduce you add these counts to a temporary field in the user (_temp_statistics_count_all). In finalize, you rename this temporary field to statistics_count_all. This last step is necessary so that the current statistics are maintained while the map-reduce runs, and only at the end are the values replaced.

Then you define another Celery task, which selects all inactive users with statistics_count_all equal to zero, and then you check in code whether they are authenticated. If not, you delete them.

This could in theory work. In practice - you will see. ;-)
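
To make the two steps concrete, here is a plain-Python simulation of the counting logic and the follow-up selection. It is only a sketch: real MongoDB map-reduce functions are written in JavaScript and run on the server, and this version leaves out the incremental update, the temporary-field swap in finalize, and the authentication check. The sample data and the `name` and `last_activity` fields are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def map_post(post):
    """Map step: emit (user, 1) for the post author and each comment author."""
    yield post['author'], 1
    for comment in post.get('comments', []):
        yield comment['author'], 1


def reduce_counts(values):
    """Reduce step: add up all counts emitted for one user."""
    return sum(values)


def compute_statistics(posts):
    """Run map and reduce over all posts; returns {user: statistics_count_all}."""
    emitted = defaultdict(list)
    for post in posts:
        for user, count in map_post(post):
            emitted[user].append(count)
    return {user: reduce_counts(values) for user, values in emitted.items()}


def users_to_check(users, statistics, now, timeout):
    """Second task: names of inactive users whose statistics_count_all is zero."""
    cutoff = now - timeout
    return [
        user['name'] for user in users
        if statistics.get(user['name'], 0) == 0 and user['last_activity'] < cutoff
    ]


# Hypothetical sample data.
posts = [
    {'author': 'alice', 'comments': [{'author': 'bob'}, {'author': 'alice'}]},
    {'author': 'bob', 'comments': []},
]
users = [
    {'name': 'alice', 'last_activity': datetime(2012, 1, 1)},
    {'name': 'carol', 'last_activity': datetime(2012, 1, 1)},
    {'name': 'dave', 'last_activity': datetime(2012, 6, 13)},
]

statistics = compute_statistics(posts)
candidates = users_to_check(users, statistics, datetime(2012, 6, 14), timedelta(days=30))
print(statistics)
print(candidates)
```

Here only `carol` is selected: `alice` is inactive but has content, and `dave` has no content but was active recently.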

@mitar
Member

mitar commented Aug 1, 2012

Ping.

@mitar
Member

mitar commented Aug 14, 2012

Please update from the main repository.

@ghost
Author

ghost commented Aug 14, 2012

Ok.

@mitar
Member

mitar commented Aug 14, 2012

And of course, finish this. :-)

@ghost
Author

ghost commented Aug 18, 2012

How do I even call map_reduce()?

I tried this:
api_models.Post.map_reduce()

Result:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
AttributeError: type object 'Post' has no attribute 'map_reduce'

I read that I have to import MongoDBManager, but I don't know where to import it from. In the docs it is done like this: from django_mongodb_engine.contrib import MongoDBManager. But this doesn't work in our application.

Also, I asked for help with this issue on Stack Overflow, but there was no accurate answer. I only managed to transform this Python code ...

for post in api_models.Post.objects:
    users_with_content.append(post.author)

... into a query. But there is still no nice way to iterate over the posts' comments.

@mitar
Member

mitar commented Aug 21, 2012

django_mongodb_engine is not the same as mongoengine. Don't mix libraries!

@mitar
Member

mitar commented Aug 21, 2012

You could provide a link to your question on Stack Overflow. Maybe then I would better understand what the question is.

@mitar
Member

mitar commented Aug 21, 2012

So, we are using MongoDB as a database. For it, there are multiple libraries. One is a low-level library for direct access, and on top of that there are some abstractions. One of those high-level libraries is mongoengine, and that is the one we are using.

@mitar
Member

mitar commented Aug 21, 2012

Here is the documentation of the map-reduce interface exposed by MongoEngine. So it seems to be api_models.Post.objects.[some other queryset specs, like filter].map_reduce(...).
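
A hedged sketch of what such a call could look like. The JavaScript map and reduce functions are passed as strings. The field names `author` and `comments` are assumptions based on lookups used elsewhere in this thread, and `count_content_per_author` is a hypothetical helper. It relies on MongoEngine's `QuerySet.map_reduce(map_f, reduce_f, output, ...)`, whose results expose `.key` and `.value`.

```python
# JavaScript executed by MongoDB; field names `author` and `comments` are assumed.
MAP_F = """
function () {
    emit(this.author, 1);
    (this.comments || []).forEach(function (comment) {
        emit(comment.author, 1);
    });
}
"""

REDUCE_F = """
function (key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i];
    }
    return total;
}
"""


def count_content_per_author(queryset):
    """Run the map-reduce on a MongoEngine-style queryset and collect the results.

    Each result is expected to expose `.key` (the author) and `.value` (the count).
    """
    results = queryset.map_reduce(MAP_F, REDUCE_F, output='inline')
    return {result.key: result.value for result in results}
```

With the project's models this would presumably be called as `count_content_per_author(api_models.Post.objects)`. Whether `output='inline'` is accepted depends on the installed MongoEngine version.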

@ghost
Author

ghost commented Aug 21, 2012

Here is my question on SO.

@mitar
Member

mitar commented Aug 21, 2012

You do understand why we want map-reduce? Because we do not want to iterate over all posts every time, loading every post and transforming it into a document. One approach is to use scalar, which returns just the ids, but this is still too much: you would still be building one long query with exclude. I don't think this can really scale.

@mitar
Member

mitar commented Aug 21, 2012

So the proposed approach with map-reduce is not easy. :-) I must admit that I didn't get to it myself, but @kostko did. :-) But now you just have to implement it. This should be easy. :-)

@ghost
Author

ghost commented Aug 21, 2012

Yeah, I do understand why we want map-reduce. But at the time I asked the question on SO, I was wondering whether there was any other way to do this nicely.
I hope so. :-)

@mitar
Member

mitar commented Aug 25, 2012

How is this going?

@mitar
Member

mitar commented Sep 5, 2012

TODO.

@mitar
Member

mitar commented Oct 7, 2012

Ping? Do you want to work more on this?

@mitar
Member

mitar commented Nov 6, 2012

I moved the account code to a separate Python package, so you will have to continue there:

https://github.com/mitar/django-mongo-auth
