## Usage:

* 	Clone the repository by running this in a terminal:

	```shell
	git clone https://gitlab.eecs.umich.edu/junhao/ioe437scraper
	```

*	Before running, make sure Firefox, Python 2.7, Jupyter Notebook, and pip are installed on your computer. Then install selenium, pandas, and bs4 by running this in a terminal:
	
	```shell
	pip install selenium
	pip install pandas
	pip install bs4
	```
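
The installation can be checked from Python before running the scraper. This is a minimal sketch, assuming the three packages import under the names `selenium`, `pandas`, and `bs4`; `REQUIRED_MODULES` and `find_missing` are hypothetical names introduced here and are not part of the repository.

```python
import importlib

# Module names taken from the install step; bs4 is the import name
# of the Beautiful Soup package.
REQUIRED_MODULES = ["selenium", "pandas", "bs4"]


def find_missing(module_names):
    """Return the names from module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Any name it reports can be installed with `pip install` as shown in the step it follows.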

* 	Generate two pandas DataFrame pickle files, `AAPdata` and `TRPdata`, for further analysis of the journals by running this in a terminal:
* 	IMPORTANT: enter your unique name and password when the program prompts for them.
	
	```shell
	python parsejournal.py 
	```

* View the pandas DataFrames by running this in a Python interpreter or Jupyter notebook:
   
   ```python
   import pandas

   pandas.read_pickle('AAPdata')  # view the Accident Analysis and Prevention data

   pandas.read_pickle('TRPdata')  # view the Transportation Research Part F data
   ```

* Generate CSV files from the pandas DataFrames by running this in a Python interpreter or Jupyter notebook:
	
	```python
	import pandas
	pandas.read_pickle('AAPdata').to_csv('AAP.csv', encoding='utf-8')  # create the CSV file for the Accident Analysis and Prevention data

	pandas.read_pickle('TRPdata').to_csv('TRP.csv', encoding='utf-8')  # create the CSV file for the Transportation Research Part F data
	```
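
The export step can also be written as a small helper that gives each pickle its own output file, so one export cannot overwrite the other. This is a sketch only: `PICKLE_TO_CSV` and `export_all` are hypothetical names, and the CSV names are assumed to match the `AAP.csv` and `TRP.csv` files listed under Content.

```python
# Pickle file names come from the usage steps; the CSV names follow the
# AAP.csv / TRP.csv files listed under "Content" (an assumption).
PICKLE_TO_CSV = {
    "AAPdata": "AAP.csv",  # Accident Analysis and Prevention
    "TRPdata": "TRP.csv",  # Transportation Research Part F
}


def export_all(pickle_to_csv=PICKLE_TO_CSV):
    """Write each pickled DataFrame to its own CSV file (hypothetical helper)."""
    # Imported here so the mapping can be inspected without pandas installed.
    import pandas

    for pickle_name, csv_name in pickle_to_csv.items():
        frame = pandas.read_pickle(pickle_name)
        frame.to_csv(csv_name, encoding="utf-8")
```

Calling `export_all()` from the directory that holds the two pickle files would write both CSV files in one step.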

## Content:

* `AAP.csv` and `TRP.csv` are already partially tagged with topics and include links to the related PDF files for further reading

## Future Development:

* Develop an API that turns class literature review / research into a one-click task using data mining and machine learning

* Apply a word2vec model to paper titles for classification / clustering of topics

* Run data mining on author names to link research topics with each author's university / nationality

* Create visualizations of how topics change across years / nationalities / universities

* Create predictive models of future topics from possible inputs

* Predict the "next big thing" research topic and the related research-rich universities for different countries
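
As a starting point for the topic-over-time items above, topics could first be tallied per year. This is a rough sketch under stated assumptions: the `(year, topic)` row shape, the `topic_counts_by_year` helper, and the sample rows are all made up for illustration and do not reflect the actual layout of `AAP.csv` or `TRP.csv`.

```python
from collections import Counter, defaultdict


def topic_counts_by_year(rows):
    """Count how often each topic appears in each year.

    rows is an iterable of (year, topic) pairs; this shape is an assumption,
    not the actual layout of AAP.csv or TRP.csv.
    """
    counts = defaultdict(Counter)
    for year, topic in rows:
        counts[year][topic] += 1
    return counts


# Made-up sample rows, for illustration only.
sample = [
    (2015, "driver distraction"),
    (2015, "driver distraction"),
    (2016, "pedestrian safety"),
]
counts = topic_counts_by_year(sample)
```

The resulting per-year counters could then feed a plot or a trend model.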


## Interested?

* Please contact junhao@umich.edu