Project Domain / Category
Abstract / Introduction
The rapid growth in the number of research articles is creating an information
overload problem for researchers. As a result, both novice and expert researchers
find it very difficult to download the research articles of a specific journal or
conference, or those linked from a web page. There is therefore a need for an
application that can download all research articles from a specific journal,
conference or web page. To overcome this problem, we will develop a research
articles crawler application that downloads freely available scientific articles
of interest to the user in PDF or Doc/Docx format and processes those documents.
Create a Signup module. Users will be required to register themselves in the
Create a Sign-in module. Only registered users will be able to use the
3. Articles Scraping and Downloading with Creation of Web Pages:
Create a webpage that takes the URL of a conference, a journal or any other
webpage and downloads all related scientific articles.
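
As a rough illustration of the scraping step, the sketch below collects links to PDF, Doc and Docx files from the HTML of a page. It uses only the Python standard library. The names `extract_article_links` and `ARTICLE_EXTENSIONS` are hypothetical and not part of this specification, and it assumes articles are exposed as plain `<a href>` links, which will not hold for every journal or conference site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed set of file types, taken from the formats named in the abstract.
ARTICLE_EXTENSIONS = (".pdf", ".doc", ".docx")


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_article_links(html, base_url):
    """Return absolute, de-duplicated links that point to article files."""
    collector = _LinkCollector()
    collector.feed(html)
    links = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        path = urlparse(absolute).path.lower()  # ignore query strings and case
        if path.endswith(ARTICLE_EXTENSIONS) and absolute not in links:
            links.append(absolute)
    return links
```

Each returned link could then be fetched and saved to disk, for example with `urllib.request.urlretrieve`. Sites that load their article lists through JavaScript would need a different approach.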
4. Download Status:
Show the titles of all downloaded articles as a list on the webpage at run
time, below the input URL text box and the download button.
5. Maintain Articles History:
Show downloaded articles history on a separate webpage.
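
One possible way to keep the download history is a small database table with one row per downloaded article. The sketch below uses SQLite from the Python standard library purely for illustration. The table name `download_history` and the helper functions are assumptions, and the same idea carries over to MySQL or MongoDB.

```python
import sqlite3
from datetime import datetime, timezone


def open_history(path=":memory:"):
    """Open (or create) the history database; in-memory by default."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS download_history ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "title TEXT NOT NULL, url TEXT NOT NULL, downloaded_at TEXT NOT NULL)"
    )
    return conn


def record_download(conn, title, url):
    """Store one downloaded article with a UTC timestamp."""
    conn.execute(
        "INSERT INTO download_history (title, url, downloaded_at) VALUES (?, ?, ?)",
        (title, url, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def list_history(conn):
    """Return (title, url, downloaded_at) rows, most recent first."""
    rows = conn.execute(
        "SELECT title, url, downloaded_at FROM download_history ORDER BY id DESC"
    )
    return rows.fetchall()
```

The history webpage would then render the rows returned by `list_history`.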
6. Browse Downloaded Scientific Articles for Processing:
Create another webpage through which the user can browse and select one or
more PDFs from the downloaded PDFs.
7. Process and Store Data:
Extract the different sections of the downloaded PDFs, e.g. Title, Authors,
Keywords, Abstract and References. Save them column-wise in an Excel or CSV file,
e.g. the first column is named “Title”, the second “Authors”, and so on.
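
The column-wise storage can be sketched as follows, assuming the sections have already been extracted into one dictionary per article. The text extraction itself would need a PDF library and is not shown here. `write_sections_csv` and `COLUMNS` are hypothetical names, and the column order simply follows the example above.

```python
import csv

# Column order follows the sections listed in the requirement.
COLUMNS = ["Title", "Authors", "Keywords", "Abstract", "References"]


def write_sections_csv(records, out_file):
    """Write one row per article; sections that were not found stay empty."""
    writer = csv.DictWriter(out_file, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow({col: record.get(col, "") for col in COLUMNS})
```

A typical call would be `with open("articles.csv", "w", newline="", encoding="utf-8") as f: write_sections_csv(records, f)`, where `articles.csv` is only an example file name.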
8. Convert CSV/Excel to JSON File Format:
Create another page which will convert this CSV or Excel file to a JSON file.
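
For the CSV case, the conversion can be done with the standard library alone. The minimal sketch below turns each CSV row into one JSON object keyed by the column names. `csv_to_json` is a hypothetical helper, and Excel input would need an additional library to read the workbook first.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader), indent=2, ensure_ascii=False)
```

The conversion page could read the uploaded file as text, pass it to `csv_to_json`, and offer the result as a download.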
Programming Language: Python
Framework: Django or Flask
IDE: PyCharm, Visual Studio or any other
Database: MySQL, MongoDB or any other