Parallelizing a loop over file reads
I am using PyCharm 2016.3.2 with Python 3.6 as the interpreter to convert PDF files to .txt. The code I have (see below) works fine, but it converts the files one at a time, which is slow.
from os import listdir
from tika import parser

for filename in listdir("C:\\Dropbox\\Data"):
    # The trailing separator is needed; without it the path becomes C:\Dropbox\Data<filename>
    text = parser.from_file('C:\\Dropbox\\Data\\' + filename)
    with open('C:\\Dropbox\\Data\\textoutput\\' + filename + '.txt', 'w+') as outfile:
        outfile.write(text["content"])
I am very new to Python, so any help parallelizing this block of code would be appreciated: I'm dealing with more than 100,000 files (65 GB+).
http://chriskiehl.com/article/parallelism-in-one-line/
Here is a simple way to use multithreading to improve speed.
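As a rough sketch of what that could look like for the loop in the question: the version below uses the thread-backed Pool from multiprocessing.dummy and its map method. The helper names convert_one, run_in_threads and main, the worker count of 8, the ".pdf" filter and the UTF-8 output encoding are my own assumptions, not something from the question. It also assumes most of the time is spent waiting on Tika rather than in Python itself, which is the case where threads tend to help.

```python
import os
from multiprocessing.dummy import Pool as ThreadPool  # Pool API backed by threads

SRC_DIR = "C:\\Dropbox\\Data"                   # paths taken from the question
OUT_DIR = os.path.join(SRC_DIR, "textoutput")


def convert_one(filename):
    """Convert a single PDF to a .txt file and return its name (hypothetical helper)."""
    from tika import parser  # imported here so run_in_threads can be tried without tika
    parsed = parser.from_file(os.path.join(SRC_DIR, filename))
    out_path = os.path.join(OUT_DIR, filename + ".txt")
    with open(out_path, "w", encoding="utf-8") as outfile:
        # Guard in case no text was extracted from the file
        outfile.write(parsed["content"] or "")
    return filename


def run_in_threads(func, items, workers=8):
    """Apply func to every item using a pool of worker threads (hypothetical helper)."""
    pool = ThreadPool(workers)
    try:
        return pool.map(func, items)  # results come back in the order of items
    finally:
        pool.close()
        pool.join()


def main():
    # Only pick up PDFs, which also skips the textoutput folder inside SRC_DIR
    pdfs = [f for f in os.listdir(SRC_DIR) if f.lower().endswith(".pdf")]
    run_in_threads(convert_one, pdfs)
```

Call main() to run the conversion. With 100,000+ files it is probably worth trying a few values for workers on a small sample first, since the best number depends on the machine and on how much load Tika can take.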
Since you are trying to learn, why aren't you asking this question on Stack Overflow?