HTML to Word

Background

Business requirement: export notes (rich text) from the platform as Word documents.

1. Converting with POI

Dependency jars
poi-3.17.jar
poi-excelant-3.17.jar
poi-ooxml-3.17.jar
poi-ooxml-schemas-3.17.jar
jsoup-1.11.3.jar

1.1 Approach

a). Get the HTML
b). Normalize the HTML
c). Convert it
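In the code example below, step b) is done by Jsoup.parse(...), which wraps a fragment in <html>, <head> and <body>. As a JDK-only illustration of the same idea, the sketch below wraps the fragment by hand. The class and the helper name wrapAsHtmlDocument are hypothetical, and declaring the charset in a <meta> tag is an added assumption (the original code only encodes the bytes as GBK), meant to keep the declared encoding and the actual bytes in agreement.

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class HtmlNormalizeSketch {

    /**
     * Hypothetical helper: wraps a rich-text body fragment in a complete HTML document
     * and declares the charset that will later be used to encode it.
     * Input that already contains an <html> element is returned unchanged.
     */
    public static String wrapAsHtmlDocument(String fragment, Charset charset) {
        if (fragment.toLowerCase(Locale.ROOT).contains("<html")) {
            return fragment;
        }
        return "<html><head>"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=" + charset.name() + "\">"
                + "</head><body>" + fragment + "</body></html>";
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String doc = wrapAsHtmlDocument("<p>\u4e2d\u6587</p>", gbk);
        // Encode with the same charset that the <meta> tag declares
        byte[] bytes = doc.getBytes(gbk);
        System.out.println(doc);
        System.out.println(bytes.length + " bytes");
    }
}
```

The resulting bytes are what would be handed to the POI step as the input stream.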

1.2 Code example

/**
 * Convert HTML to Word
 *
 * @param noteName         name of the exported file
 * @param reportDirName    directory of the exported file
 * @param researchNoteInfo HTML of the note
 * @author Solitary
 * @date 2019/1/11 9:21
 */
public static void htmlToWord(String noteName, String reportDirName, String researchNoteInfo) throws Exception {
    // Parse with Jsoup to get a standard, complete HTML document
    Document document = Jsoup.parse(researchNoteInfo);
    FileUtils.fileIsExist(reportDirName);
    // try-with-resources closes both streams even if the conversion fails
    try (InputStream is = new ByteArrayInputStream(document.html().getBytes("GBK"));
         OutputStream os = new FileOutputStream(reportDirName + noteName)) {
        inputStreamToWord(is, os);
    }
}

/**
 * Write the content of is into the Word output stream os: the HTML is stored
 * as the "WordDocument" entry of an OLE2 container. The caller closes both streams.
 *
 * @param is input stream holding the HTML
 * @param os output stream of the .doc file
 * @throws IOException
 */
private static void inputStreamToWord(InputStream is, OutputStream os) throws IOException {
    POIFSFileSystem fs = new POIFSFileSystem();
    DirectoryNode root = fs.getRoot();
    root.createDocument("WordDocument", is);
    fs.writeFilesystem(os);
}

1.3 Discussion

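Converting HTML to Word with POI is indeed simple, but one thorny problem remains: the generated file still references the images by their original URLs, so when an image link goes dead or the machine is offline, the images in the Word document cannot be displayed. That makes this approach of limited use.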

2. Converting with Jacob

Dependency jars
jacob.jar
jsoup-1.11.3.jar

2.1 Approach

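a). Normalize the HTML
b). Download the image resources to the local machine
c). Replace every img tag with <p>${img_imgName}</p>
d). Write the modified HTML into a blank .doc document
e). Copy the content written into that document in the previous step into a new document, and replace every ${img_imgName} with the image at its local path
f). Save the result as a .doc file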
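The original code performs step c) with Jsoup in sysElementText. As a swapped-in, JDK-only illustration, the sketch below does the same with a regular expression over the raw HTML. The class and method names are hypothetical, the regex assumes reasonably well-formed <img> tags that carry a src attribute, and unlike the Jsoup version it does not copy the img's style onto the <p>. The map it fills has the shape the later Jacob steps need: placeholder ${fileName} mapped to the local image path.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgPlaceholderSketch {

    // A whole <img ...> tag; group 1 is its src value (double-quoted, single-quoted or unquoted)
    private static final Pattern IMG = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*[\"']?([^\"'\\s>]+)[^>]*>",
            Pattern.CASE_INSENSITIVE);

    /**
     * Hypothetical helper: replaces every <img> with <p>${fileName}</p> and records
     * "${fileName}" -> localDir/fileName in imgMap.
     */
    public static String replaceImgsWithPlaceholders(String html, String localDir,
                                                     Map<String, String> imgMap) {
        Matcher m = IMG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String src = m.group(1);
            String fileName = src.substring(src.lastIndexOf('/') + 1);
            String key = "${" + fileName + "}";
            imgMap.put(key, localDir + "/" + fileName);
            // quoteReplacement stops "${...}" from being read as a named-group reference
            m.appendReplacement(out, Matcher.quoteReplacement("<p>" + key + "</p>"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> imgMap = new LinkedHashMap<>();
        String html = "<p>before</p>"
                + "<img src=\"http://example.com/img/a.png\" style=\"width:100px\">"
                + "<img src='b.jpg'/>";
        System.out.println(replaceImgsWithPlaceholders(html, "C:/tmp/note", imgMap));
        System.out.println(imgMap);
    }
}
```

Matcher.quoteReplacement matters here: without it, appendReplacement would interpret ${a.png} as a reference to a named capture group and throw.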

2.2 Code example

/**
 * Create a blank document, write the HTML into it, resolve the image placeholders,
 * and copy the blank document into the final document
 *
 * @param html       HTML of the note
 * @param localpath  local working directory (downloaded images and generated .doc files)
 * @param researchId used as the name of the generated file
 * @return name of the generated file, without the .doc extension
 */
public static String getWord(String html, String localpath, long researchId) {
    // Downloading the images to the local directory is omitted here
    // Key: image placeholder ${name} in the document; value: absolute path of the image,
    // e.g. imgMap.put("${ABC}", localpath + "\\ABC.png");
    Map<String, String> imgMap = new HashMap<String, String>();
    // Parse the HTML, replace the images with placeholders, then write it into the blank document
    Document document = Jsoup.parse(html);
    for (Element element : document.body().select("body > *")) {
        sysElementText(element, localpath, imgMap);
    }
    createWord(localpath, "blank");
    File doc = new File(localpath + File.separator + "blank.doc");
    // try-with-resources flushes and closes the writer even if write() fails
    try (FileWriter fw = new FileWriter(doc)) {
        fw.write(document.html());
    } catch (IOException e) {
        e.printStackTrace();
    }
    String complete = String.valueOf(researchId);
    // Copy the blank document and paste it into a new document (like a manual copy/paste)
    MSOfficeGeneratorUtils officeUtils = new MSOfficeGeneratorUtils(false);
    officeUtils.openDocument(localpath + File.separator + "blank.doc");
    officeUtils.copy(); // copy the whole document
    officeUtils.close();
    officeUtils.createNewDocument();
    officeUtils.paste(); // paste the whole document
    // Replace each ${image_name} placeholder with the actual image
    for (Entry<String, String> entry : imgMap.entrySet()) {
        officeUtils.replaceText2Image(entry.getKey(), entry.getValue());
    }
    officeUtils.setFont(true, false, false, "0,0,0", "20", "宋体"); // set the font; parameter meanings are defined by MSOfficeGeneratorUtils
    officeUtils.saveAs(localpath + File.separator + complete + ".doc"); // a UUID could be used as the file name to avoid name clashes
    officeUtils.close(); // close the document created by Word
    officeUtils.quit(); // quit the Word application
    imgMap.clear();
    return complete;
}
 
/**
 * Replace img tags with p tags containing a ${fileName} placeholder
 *
 * @param node    node to process (its children are processed recursively)
 * @param imgPath local directory where the images are stored
 * @param imgMap  key: ${imgName}; value: local path of the image
 */
public static void sysElementText(Node node, String imgPath, Map<String, String> imgMap) {
    if (node.nodeName().equals("img")) {
        String src = node.attr("src");
        String fileName = src.substring(src.lastIndexOf("/") + 1);
        String key = "${" + fileName + "}";
        Element element = new Element("p");
        element.append(key);
        element.attr("style", node.attr("style"));
        node.replaceWith(element);
        imgMap.put(key, imgPath + File.separator + fileName);
        return;
    }
    // Recurse into all children so that images nested at any depth are replaced;
    // iterate over a copy because replaceWith() modifies the child list
    for (Node child : new ArrayList<Node>(node.childNodes())) {
        sysElementText(child, imgPath, imgMap);
    }
}

/**
 * Create an empty Word document
 *
 * @param localpath directory to save the document in
 * @param name      file name without the .doc extension
 */
public static void createWord(String localpath, String name) {
    // false: run Word in the background without showing its window (assuming the flag controls visibility)
    MSOfficeGeneratorUtils msOfficeUtils = new MSOfficeGeneratorUtils(false);
    msOfficeUtils.createNewDocument();
    msOfficeUtils.saveAs(localpath + File.separator + name + ".doc");
    msOfficeUtils.close();
    msOfficeUtils.quit();
}

For the MSOfficeGeneratorUtils class, see: http://www.cnblogs.com/liudaihuablogs/p/9761297.html

2.3 Discussion

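With this approach the images display correctly. The only drawback is that Jacob drives Office through its COM API, so the server must run Windows with Office installed; it cannot run on Linux, which was quite frustrating.
Our workaround was to request a Windows server: before the HTML-to-Word conversion is invoked, a message is sent to the Windows server, which generates the Word file. The file is then fetched from the remote machine over SMB, using SmbFile in Java.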

3. Converting with iText

Dependency jars
itext-2.1.7.jar
itext-rtf-2.1.7.jar

3.1 Approach

a). Change the src of each img tag to the local image path
b). Export the document as Word in RTF format
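For step a), the code example below takes the last /-separated segment of each src as the local file name. If an image URL carries a query string (for example a.png?x=1), that segment includes it, and ? is not a legal character in Windows file names. A minimal JDK-only sketch that strips the query and fragment before building the local path; the class and both helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class LocalSrcSketch {

    /** Hypothetical helper: last path segment of an image URL, without ?query or #fragment. */
    public static String fileNameOf(String src) {
        int end = src.length();
        int q = src.indexOf('?');
        if (q >= 0) {
            end = q;
        }
        int h = src.indexOf('#');
        if (h >= 0 && h < end) {
            end = h;
        }
        String path = src.substring(0, end);
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** Hypothetical helper: maps each src to dir/fileName and rewrites the HTML to use it. */
    public static String localizeSrcs(String html, Iterable<String> srcs, String dir,
                                      Map<String, String> srcToLocal) {
        for (String src : srcs) {
            String local = dir + "/" + fileNameOf(src);
            srcToLocal.put(src, local); // the caller downloads src to this local path
            html = html.replace(src, local);
        }
        return html;
    }

    public static void main(String[] args) {
        Map<String, String> srcToLocal = new LinkedHashMap<>();
        String src = "http://example.com/img/a.png?x=1";
        String html = "<p>note</p><img src=\"" + src + "\">";
        System.out.println(localizeSrcs(html, Arrays.asList(src), "tmp/researchNote/42", srcToLocal));
        System.out.println(srcToLocal);
    }
}
```

Two different URLs that end in the same file name still map to the same local file, and because the download helper skips files that already exist, the second image would silently reuse the first; prefixing the name with a hash of the full URL is one possible way around that.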

3.2 Code example

private static void html2WordIText(String html, String noteName, String reportDirName, long researchId) {
    // Temporary directory for the downloaded images
    String pwd = "tmp/researchNote";
    String imgDir = pwd + "/" + researchId;
    // Download every image referenced in the HTML and point its src at the local copy
    Set<String> srcSet = ImageUtils.getImgStr(html);
    for (String src : srcSet) {
        String srcName = src.substring(src.lastIndexOf('/') + 1);
        ImageUtils.download(src, srcName, imgDir);
        html = html.replace(src, imgDir + "/" + srcName);
    }
    FileUtils.fileIsExist(reportDirName);
    try (OutputStream out = new FileOutputStream(reportDirName + noteName)) {
        Document document = new Document(PageSize.A4);
        // Write the document as RTF, which Word can open
        RtfWriter2.getInstance(document, out);
        document.open();
        Paragraph context = new Paragraph();
        StyleSheet ss = new StyleSheet();
        HashMap<String, String> interfaceProps = new HashMap<String, String>();
        // The src values are already local paths, so no base URL is prepended
        interfaceProps.put("img_baseurl", "");
        // Parse the HTML into iText elements and add them to the document
        List<?> htmlList = HTMLWorker.parseToList(new StringReader(html), ss, interfaceProps);
        for (Object e : htmlList) {
            context.add((com.lowagie.text.Element) e);
        }
        document.add(context);
        document.close();
        // Remove the temporary images
        FileUtils.deletefile(pwd);
    } catch (Exception e) {
        // Report the failure instead of swallowing it silently
        e.printStackTrace();
    }
}

ImageUtils.java :
/**
 * Collect the src values of all img tags in the HTML
 */
public static Set<String> getImgStr(String htmlStr) {
    Set<String> pics = new HashSet<String>();
    // Match each <img ...> tag on its own (no greedy .* that could span several tags)
    Pattern pImage = Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // Extract the src value: double-quoted, single-quoted or unquoted
    Pattern pSrc = Pattern.compile("\\ssrc\\s*=\\s*[\"']?([^\"'\\s>]+)", Pattern.CASE_INSENSITIVE);
    Matcher mImage = pImage.matcher(htmlStr);
    while (mImage.find()) {
        // The whole <img /> tag
        Matcher m = pSrc.matcher(mImage.group());
        if (m.find()) {
            pics.add(m.group(1));
        }
    }
    return pics;
}

/**
 * Download an image
 *
 * @param urlString image URL
 * @param filename  file name to save as
 * @param savePath  directory to save into
 */
public static void download(String urlString, String filename, String savePath) {
    File target = new File(savePath, filename);
    // Skip images that have already been downloaded
    if (target.exists()) {
        return;
    }
    try {
        URL url = new URL(urlString);
        // Open the connection
        URLConnection con = url.openConnection();
        // Connect timeout of 5 s
        con.setConnectTimeout(5 * 1000);
        con.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)");
        File dir = new File(savePath);
        if (!dir.exists()) {
            dir.mkdirs();
        }
        // try-with-resources closes both streams even if reading or writing fails
        try (InputStream is = con.getInputStream();
             OutputStream os = new FileOutputStream(target)) {
            byte[] bs = new byte[8 * 1024]; // 8 KB buffer
            int len; // number of bytes read
            while ((len = is.read(bs)) != -1) {
                os.write(bs, 0, len);
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

4. Summary

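Method 1 cannot display images without network access, method 2 cannot generate documents on Linux, and with method 3 image sizes are hard to adjust. Overall, however, method 3 is better than the other two. If you spot any mistakes, corrections are welcome. Thanks.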

Publisher: 全栈程序员-站长. When reposting, please credit the source: https://javaforall.net/182704.html (original link: https://javaforall.net)
