I am a final-year graduate student, and my thesis (about 350 pages) is in Microsoft Word format. I would like to convert it into a camera-ready PDF produced with LaTeX. Is there an easy way to do this? I am very new to LaTeX.

  • This topic can help you: How can I import an exam or assignment from Word into LaTeX?
    – Spike
    Commented Sep 7, 2011 at 6:41
  • If you made proper use of styles in Word (which I assume you did for 350 pages – no one should be insane enough to do that without styles) it should be a fairly straightforward conversion you could do with VBA from within Word.
    – Joey
    Commented Sep 7, 2011 at 11:14
  • Just a comment on how long your thesis is :D (that's almost twice the page count of most PhD theses I've read).
    – Jim Raynor
    Commented Apr 12, 2014 at 11:20
  • This was a useful tool for me for going from an Excel XLSX table to LaTeX: ericwood.org/excel2latex
    – wwwilliam
    Commented Oct 10, 2015 at 4:04

15 Answers


The new version of writer2latex is pretty good. It works with OpenOffice, but I think its command-line utility should work without OpenOffice. You can set the quality of the converted document, from LaTeX that is as clean as possible to a version that tries to emulate the appearance of the source Word document.

Structure and basic formatting should convert well, but I am not sure about math, as Word and OpenOffice handle math very differently.

  • The command-line utility just throws all kinds of exceptions. Commented Mar 14, 2014 at 13:16
  • I wanted to mention here that while writer2latex or pandoc (see below) may do the job, the LaTeX output is not really suitable for further editing. You should only use them on final versions of documents.
    – Sameer
    Commented Sep 20, 2014 at 17:35
  • I think that running the converter as w2l -latex -ultraclean file.odt creates LaTeX that, although not perfect, is a good starting point for further editing, at least for simple documents. And most (simple) formulas come out correctly.
    – Rmano
    Commented Apr 4, 2017 at 18:46
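For the command-line route, here is a minimal Python sketch of the two steps. It assumes LibreOffice's `soffice` binary and writer2latex's `w2l` script are both on your PATH. `soffice` runs in headless mode to turn the .docx into an .odt, because writer2latex works on OpenOffice/LibreOffice Writer files. The `-latex -ultraclean` options are the ones quoted in the comment above, and the helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def w2l_commands(docx_path: str) -> list[list[str]]:
    """Build the two commands for a Word -> ODT -> LaTeX conversion.

    Step 1: LibreOffice in headless mode writes an .odt next to the input file.
    Step 2: writer2latex's w2l script converts that .odt with the
    -latex -ultraclean options quoted in the comment above.
    """
    src = Path(docx_path)
    odt = src.with_suffix(".odt")
    return [
        ["soffice", "--headless", "--convert-to", "odt",
         "--outdir", str(src.parent), str(src)],
        ["w2l", "-latex", "-ultraclean", str(odt)],
    ]


def run_conversion(docx_path: str) -> None:
    """Run both steps in order; check=True raises if either step fails."""
    for cmd in w2l_commands(docx_path):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} was not found on PATH")
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in w2l_commands("thesis.docx"):
        print(" ".join(cmd))
```

Keeping command construction separate from execution lets you inspect or tweak the exact invocation before running it on a 350-page document.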

The free, open-source word processor AbiWord has an MS Word import function and, if you install it (be sure to select it at install time or, on Linux, install the necessary plugin package), a LaTeX export function. It works decently well for simple documents.

I personally prefer it to the other options, including writer2latex.

Another tool I've tried and had some success with is rtf2latex2e, which converts RTF to LaTeX. (You can export to RTF from Word, of course.)

As has already been made clear, you can't expect perfection from any of these methods, and it'll require a lot of hand-fixing.

  • Even with a properly formatted Word document, it created only a basically plain-text LaTeX file: no chapters/sections, no enumerate/itemize environments. Commented Mar 14, 2014 at 13:09
  • WOW! I got rtf2latex2e to work for a file that pandoc would not convert. THANKS!
    – vy32
    Commented Mar 6, 2017 at 0:28

I am somewhat late to the party, as the question's author has, hopefully, graduated. But, for the sake of completeness of answers, I'd like to mention a universal (and now very popular) format converter pandoc (http://johnmacfarlane.net/pandoc), which is open source and supports an extremely wide variety of document formats, including presentation slides and e-books.
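A basic pandoc conversion can be scripted as a starting point. The flags used (`-f`, `-t`, `-s`, `--extract-media`, `-o`) are standard pandoc options. The file names, the `media` folder and the helper names are placeholders. Treat this as a sketch: expect to post-edit the resulting .tex.

```python
import subprocess
from pathlib import Path


def pandoc_command(docx_path: str, media_dir: str = "media") -> list[str]:
    """Build a pandoc command that turns a .docx into a standalone .tex file.

    -f docx -t latex : state the input and output formats explicitly
    -s               : write a complete document (preamble and body)
    --extract-media  : copy images embedded in the .docx into media_dir
    """
    src = Path(docx_path)
    return [
        "pandoc", "-f", "docx", "-t", "latex", "-s",
        f"--extract-media={media_dir}",
        "-o", str(src.with_suffix(".tex")),
        str(src),
    ]


def convert(docx_path: str) -> None:
    """Run pandoc; requires pandoc on PATH, raises CalledProcessError on failure."""
    subprocess.run(pandoc_command(docx_path), check=True)


if __name__ == "__main__":
    print(" ".join(pandoc_command("thesis.docx")))
```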

  • Thanks very much for the tip; pandoc worked very well, though with a bit of tweaking:   • Wrap your title in \title{}.   • Likewise \author{}, \date{}, \begin{abstract}.   • Manually convert Unicode, e.g., ∴ => \therefore. (I normally use xunicode and friends, but didn’t try it for this.)   • Nuke the parskip.sty block from orbit.   • I needed one \nopagebreak. Commented Jun 7, 2016 at 2:25
  • Oh, and   • \maketitle Commented Jun 7, 2016 at 3:08
  • @FlashSheridan: You're very welcome. Glad my advice was helpful. Thank you for the tweaking tips - though, for better or worse, I don't plan on doing such conversions any time soon. :-) Commented Jun 7, 2016 at 5:48
  • My steps are getting too long for comments; see tex.stackexchange.com/a/317261/3505. Commented Jul 1, 2016 at 1:51
  • @FlashSheridan: Nice addition - upvoted the answer. Commented Jul 1, 2016 at 2:32

You can't convert an MS Word document to LaTeX directly; the two formats are rather incompatible. The last time I had to do it (a 4-page paper written by my professor), I saved it as text only and re-added all formatting, math, images and tables manually. As you can guess, that was quite an effort, and it is not doable for a 350-page document, except in the unlikely case that it really is all text with minimal formatting (an arts thesis, maybe?).

Also have a look at What is the best way to make the transition from Microsoft Word to LaTeX? or at Convert TeX to non-TeX and back, but I don't think this task will be easy in any case.
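If you do go the text-only route, the first mechanical step is escaping the characters LaTeX treats specially before re-adding formatting by hand. Here is a minimal Python sketch. The helper name and mapping table are illustrative and not taken from any particular tool.

```python
# Characters that LaTeX treats specially in running text, mapped to
# replacements that print them literally.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape plain text for LaTeX in a single pass.

    Working character by character escapes each special character exactly
    once; chained replace() calls can re-escape backslashes or braces
    introduced by an earlier replacement.
    """
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(escape_latex("Profit & loss: 50% of $10_000 {net}"))
```

The single pass matters. If you handle the backslash after the other characters, or run the same replacement twice, you can easily end up with `\\&` in the output.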

  • Thanks for the info. I will read through the linked questions. JamesW's solution looks much easier! :)
    – JohnJ
    Commented Sep 7, 2011 at 7:03
  • I wouldn't say incompatible. If you use Word properly the differences in the actual document structure are quite small.
    – Joey
    Commented Sep 7, 2011 at 11:15
  • @Joey: You can't convert a Word document to a LaTeX document, or vice versa, easily and with good results. In my opinion, you wouldn't get a good LaTeX document. That is what I meant by incompatible. Sure, it is possible to get some LaTeX document somehow ... Commented Sep 7, 2011 at 11:19

This isn't technically an answer to the question you asked, but it looks from your question that you may have a misunderstanding.

LaTeX is a typesetting language, and with programs such as pdflatex you can turn it into a PDF file. It is certainly not the only way to create a PDF file. If creating a PDF from your Word file is your ultimate goal, there are much more sensible ways to do this.

You can pay for Adobe Acrobat Professional (Adobe may even have a free PDF conversion tool; I'm not sure about that). There is also a free tool called PDFCreator:

http://sourceforge.net/projects/pdfcreator/

Once installed, it becomes a printer driver on your computer. In Word, print your document and select PDFCreator as your printer. It will take you through various options and ultimately create your PDF.

You may want to check that everything has come out properly, though. I have seen objects move around and gradients come out wrong during conversion, but it may all be fine.

Hope this helps

  • No need for PDFCreator; just do File - Export - Create PDF/XPS in Word. But the question now is how to get the TeX source of the PDF for further editing.
    – Michael S.
    Commented Oct 23, 2015 at 11:48

Not exactly a 'Word2Latex' answer, but this will still achieve the end result requested:

You can bridge the gap between Word and LaTeX by using LibreOffice's LaTeX export plugin. Open a Word document in LibreOffice, optionally save as ODT, and export to LaTeX.

There will still be manual editing to do, but at least the major parts (the document skeleton, sectioning and other routine structure) will be done for you, so you won't have to hunt through a plain-text file for the chapter and section titles.

LibreOffice will export directly to PDF too and read PDF into ODT which is useful when one wants to reverse-engineer a PDF into LaTeX.
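If a PDF (or an .odt to feed into a LaTeX exporter) is all you need, LibreOffice's headless mode can do the conversion from a script. The sketch below assumes the `soffice` executable is on your PATH; on some Linux distributions it may be called `libreoffice` instead. The helper names and the one-file-per-chapter folder layout are hypothetical.

```python
import subprocess
from pathlib import Path


def soffice_command(doc_path: str, target: str = "pdf", outdir: str = "out") -> list[str]:
    """Build a headless LibreOffice command converting one Word file to `target` (e.g. pdf or odt)."""
    return ["soffice", "--headless", "--convert-to", target,
            "--outdir", outdir, doc_path]


def convert_folder(folder: str, target: str = "pdf", outdir: str = "out") -> list[Path]:
    """Convert every .docx in `folder`; returns the input files in sorted order."""
    docs = sorted(Path(folder).glob("*.docx"))
    for doc in docs:
        # check=True stops at the first file LibreOffice fails to convert.
        subprocess.run(soffice_command(str(doc), target, outdir), check=True)
    return docs


if __name__ == "__main__":
    print(" ".join(soffice_command("chapter1.docx")))
```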


If you're running an AppleScript-compatible operating system, I've written a script to do this. It has many limitations (pictures are not supported at all), but it handles the essentials: bold, italics, underscores, percent signs, dollar signs, and tables (using tabu). Note that it keeps everything in Unicode, so the fontspec package with XeLaTeX is recommended. It is a work in progress.
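For readers not on macOS, the table markup the script produces (cells separated by `&`, each row ended with `\\`) can be sketched in Python. This is an illustrative helper, not a port of the script. It uses a plain `tabular` with `\hline` rules instead of `tabu`, and it assumes the cell text has already been escaped.

```python
def table_to_latex(rows: list[list[str]]) -> str:
    """Render rows of already-escaped cell text as a centered LaTeX table.

    Cells are joined with " & " and every row ends with a LaTeX line break,
    as in the script; rules are plain hline commands instead of tabu's.
    """
    if not rows:
        raise ValueError("table has no rows")
    ncols = max(len(row) for row in rows)
    lines = [
        r"\begin{table}",
        r"\centering",
        r"\begin{tabular}{" + "l" * ncols + "}",
        r"\hline",
    ]
    for i, row in enumerate(rows):
        padded = row + [""] * (ncols - len(row))  # pad short rows with empty cells
        lines.append(" & ".join(padded) + r" \\")
        if i == 0:
            lines.append(r"\hline")  # rule under the heading row
    lines += [r"\hline", r"\end{tabular}", r"\end{table}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(table_to_latex([["Sample", "Mean"], ["A", "1.2"], ["B", "3.4"]]))
```

Padding short rows guards against the "missing cells" problem mentioned in another answer: every row gets the same number of `&` separators.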

You can find the latest version at: https://gist.github.com/macmadness86/5582426

Note that if you have TeXShop installed, you can optionally uncomment two lines:

- `--my openInTeXShop()`
- `--my closeWordDocByName(myDup, false)`

This will automatically copy the "texified" Word Document into a new TeXShop Document and close the duplicated Word Document.

For the sake of keeping stackexchange self-contained, I will post the latest version as of this post here:

(*
Notes:
Ver. 2.11
Created by macmadness86 on 29.12.2013

Author of TeX Tutorials on YouTube
http://www.youtube.com/user/XeTeXTutorials?feature=watch

StackExchange User
http://stackoverflow.com/users/1236128/macmadness86

Instructions for use:
Have Microsoft Word document open. The frontmost document will be processed. The script creates a replica before processing, in order to avoid losing data. This document remains open when the script is finished and its contents can be copied to a tex editor e.g. TeXShop and compiled.

Version Notes:
26.12.2013 version 2.0 improved the table support. Now tables are coded as centered tabu tables.
29.12.2013 version 2.1 added list support using standard bullet or simple numbering buttons on the Word GUI. Supports only 1 embedded list.

Issues:
This script depends on a paragraph before a table. Therefore, a table must not be located at paragraph 1. There is a glitch in MS Word, preventing a script from adding a paragraph before a table (as far as I know).
*)

set myDup to my duplicateDoc()
--set outputPathAL to (path to desktop folder as string) & "Temporary Saved Doc for Latex Conversion.doc"
--my saveWordDoc(outputPathAL)


tell application "Microsoft Word"
if (count of documents) is greater than or equal to 1 then
tell document 1
-- Edit sectionTags and inlineTags for
set sectionTags to {{"Title", "title"}, {"Heading 1", "section"}, {"Heading 2", "subsection"}, {"Heading 3", "subsubsection"}}
set inlineTags to {{"Paragraph", "paragraph"}}
global stylesList
set stylesList to (get name local of Word styles)
-- Automated List
set sectionStyles to {}
repeat with itemStep from 1 to count of sectionTags
set end of sectionStyles to item 1 of item itemStep of sectionTags
end repeat
set {xpath, xname, xext, xbodytext, paraCount, wordCount} to {(get default file path file path type documents path), name, (get name extension), (get contents of text object), get count of paragraphs, get count of words}
--set allStyles to Word styles
-- Takes care of Section Tags
repeat with paraStep from 1 to paraCount
set paraStyle to (get style of paragraph paraStep)
set paraContent to (get content of text object of paragraph paraStep)
repeat with itemStep from 1 to (count of sectionTags)
if (paraContent as string) does not contain "{" then
set {wordTag, texTag} to {item 1 of item itemStep of sectionTags, item 2 of item itemStep of sectionTags}
try --
--return paraStyle
if Word style paraStyle is Word style wordTag then
my texifyHeading(paraContent, wordTag, texTag, paraStep)
end if
end try
end if
end repeat
end repeat
-- Handle Bold and Italics
end tell
end if
end tell


my texifyBoldItalicsQuotes()
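-- "&" and "_" are already escaped by texifyBoldItalicsQuotes() above; escaping them again would turn \& into \\& (a line break followed by a misplaced &)
my findReplace("$", "\\$")
my findReplace("%", "\\%")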
my texifyLists()
my texifyTables()
set preBody to "
\\documentclass[10pt]{article}
\\usepackage{fontspec}
\\usepackage{tabu}
\\begin{document}
"
set postBody to "
\\end{document}"

addTextToFrontOfDoc(preBody)
addTextToEndOfDoc(postBody)
--my openInTeXShop()
--my closeWordDocByName(myDup, false)

on texifyHeading(para_Content, word_style, tex_style, para_num)
tell application "Microsoft Word"
tell active document
--set content of text object of paragraph paraNUM to "poop"
--select text object of paragraph paraNUM
--set orig_text to content of text object of paragraph paraNUM
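-- quoted form of keeps headings containing quotes, parentheses or other shell characters from breaking the command
set sedFix to do shell script "echo " & quoted form of (para_Content as string) & " | sed \"s/$(printf '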
')\\$//\""
try
if last character of (para_Content as string) is (ASCII character 13) then
set returnText to "\\" & tex_style & "{" & sedFix & "}" & "
"
else
set returnText to "\\" & tex_style & "{" & para_Content & "}"
end if
set content of text object of paragraph para_num to returnText
set style of paragraph para_num to word_style
--end if
end try
end tell
end tell
end texifyHeading

on texifyWord(word_content, tex_style, word_num)
tell application "Microsoft Word"
try
set bold_Style to make new Word style at active document with properties ¬
{name local:"Bold Tagged", style type:style type character}
set bold of font object of bold_Style to true
end try
tell active document
set wordRange to word word_num
--ASCII character 32 is space
if last character of (word_content as string) is (ASCII character 32) then
set wordOnly to set range wordRange start ((start of content of wordRange)) ¬
end ((end of content of wordRange) - 1)
set word_content to content of wordOnly
end if
set newContent to "\\" & tex_style & "{" & word_content & "}"
set style of word word_num to "Bold Tagged"
set content of word word_num to newContent --("\\" & tex_style & "{" & word_content & "}")
--set style of word word_num to
end tell
end tell
end texifyWord


on texifyBoldItalicsQuotes()
tell application "Microsoft Word"
if (count of documents) is greater than or equal to 1 then
set stylesList to (get name local of Word styles of active document)
if stylesList does not contain "Bold Tagged" then
set bold_Style to make new Word style at active document with properties ¬
{name local:"Bold Tagged", style type:style type character}
set bold of font object of bold_Style to true
end if
if stylesList does not contain "Italic Tagged" then
set italic_Style to make new Word style at active document with properties ¬
{name local:"Italic Tagged", style type:style type character}
set italic of font object of italic_Style to true
end if
--save as active document file name "Temp.doc"
set curly to false -- change to true if curly quotes desired
set wasSmartQuotes to auto format as you type replace quotes of settings
set auto format as you type replace quotes of settings to curly
set myFind to find object of selection
tell myFind
clear formatting myFind
execute find find text "&" replace with "\\&" replace replace all
execute find find text "_" replace with "\\_" replace replace all
end tell
-- mark up italics and take out of italics
clear formatting of myFind
set forward of myFind to true
set wrap of myFind to find continue
set style of myFind to "Normal"
set italic of font object of myFind to true
set content of myFind to ""
clear formatting replacement of myFind
set content of replacement of myFind to "\\emph{^&}"
set italic of font object of replacement of myFind to false
set style of replacement of myFind to "Italic Tagged"
execute find myFind replace replace all
-- mark up bold and take out of bold
clear formatting of myFind
set forward of myFind to true
set wrap of myFind to find continue
set bold of font object of myFind to true
set style of myFind to "Normal"
set content of myFind to ""
clear formatting replacement of myFind
set content of replacement of myFind to "\\textbf{^&}"
set bold of font object of replacement of myFind to false
set style of replacement of myFind to "Bold Tagged"
execute find myFind replace replace all
clear formatting of myFind
set forward of myFind to true
set wrap of myFind to find continue
set style of myFind to "quotation"
set content of myFind to ""
clear formatting replacement of myFind
set content of replacement of myFind to "\\begin{quotation}^&\\end{quotation}"
--set style of replacement of myFind to "normal"
execute find myFind replace replace all
set auto format as you type replace quotes of settings to wasSmartQuotes
end if
end tell
end texifyBoldItalicsQuotes

on findReplace(textToFind, replacementText)
tell application "Microsoft Word"
if (count of documents) is greater than or equal to 1 then
-- mark up italics and take out of italics
set myFind to find object of selection
clear formatting of myFind
set forward of myFind to true
set wrap of myFind to find continue
--set style of myFind to "Normal"
--set italic of font object of myFind to true
set content of myFind to textToFind
clear formatting replacement of myFind
set content of replacement of myFind to replacementText --\\emph{^&}
set italic of font object of replacement of myFind to false
execute find myFind replace replace all
end if
end tell
end findReplace

on texifyLists()
tell application "Microsoft Word"
--tell active document
--end tell
--set listFormatProps to properties of list format of text object of selection
--set paraStep to GetParagraph() of me
set paraCompensator to 0
repeat with paraStep from 1 to count of paragraphs of active document
set paraStep to paraStep + paraCompensator
set listFormatProps to properties of list format of text object of paragraph paraStep of active document
set styleName to name local of style of text object of paragraph paraStep of active document
if styleName is "List paragraph" then
get list type of listFormatProps
-- ## Setup Itemize Environment
if list type of listFormatProps is list bullet then
-- ## Prefix List Items with \item
if content of text object of paragraph paraStep of active document does not contain "\\item" then
insert text "\\item " at first word of paragraph paraStep of active document
end if
if list type of list format of text object of paragraph (paraStep - 1) of active document is list no numbering then
insert text return & "\\begin{itemize}" at the last word of paragraph (paraStep - 1) of active document
set paraCompensator to paraCompensator + 1
set paraStep to paraStep + 1
end if
-- ## Look for start of embedded list (Level 2 Indent)
if list type of list format of text object of paragraph (paraStep + 1) of active document is list bullet then
if list level number of list format of text object of paragraph (paraStep + 1) of active document is 2 then
insert text return & "\\begin{itemize}" at last word of paragraph (paraStep) of active document
set paraCompensator to paraCompensator + 1
set paraStep to paraStep + 1
end if
end if
-- ## Look for end of embedded list (Level 2 Indent)
if list type of list format of text object of paragraph (paraStep + 1) of active document is list bullet then
if list level number of list format of text object of paragraph (paraStep) of active document is 2 then
if list level number of list format of text object of paragraph (paraStep + 1) of active document is 1 then
insert text "\\end{itemize}" & return at first word of paragraph (paraStep + 1) of active document
set paraCompensator to paraCompensator + 1
set paraStep to paraStep + 1
end if
end if
end if
-- ## Detect end of list (Level 1 Indent) and tag with \end{itemize}
try
if list type of list format of text object of paragraph (paraStep + 1) of active document is list no numbering then
insert text "\\end{itemize}" & return at first word of paragraph (paraStep + 1) of active document
set paraCompensator to paraCompensator + 1
end if
on error
set lastItem to true
insert text return & "\\end{itemize}" at last word of paragraph (paraStep) of active document
log "Reached end of document checking for list at paragraph nr. " & paraStep
end try
end if
end if
-- ## Setup Enumerate Environment
if list type of listFormatProps is list simple numbering then
-- ## Prefix List Items with \item
if content of text object of paragraph paraStep of active document does not contain "\\item" then
insert text "\\item " at first word of paragraph paraStep of active document
end if
if list type of list format of text object of paragraph (paraStep - 1) of active document is list no numbering then
insert text return & "\\begin{enumerate}" at the last word of paragraph (paraStep - 1) of active document
set paraCompensator to paraCompensator + 1
set paraStep to paraStep + 1
end if
-- ## Look for start of embedded list (Level 2 Indent)
if list type of list format of text object of paragraph (paraStep + 1) of active document is list simple numbering then
if list level number of list format of text object of paragraph (paraStep + 1) of active document is 2 then
insert text "\\begin{enumerate}" & return at first word of paragraph (paraStep + 1) of active document
set paraCompensator to paraCompensator + 1
set paraStep to paraStep + 1
end if
end if
-- ## Look for end of embedded list (Level 2 Indent)
-- Only close the nested list when the current item is level 2 and the next one is back at level 1 (same check as the itemize branch above)
if list type of list format of text object of paragraph (paraStep + 1) of active document is list simple numbering then
if list level number of list format of text object of paragraph (paraStep) of active document is 2 then
if list level number of list format of text object of paragraph (paraStep + 1) of active document is 1 then
insert text return & "\\end{enumerate}" at last word of paragraph (paraStep) of active document
set paraCompensator to paraCompensator + 1
set paraStep to paraStep + 1
end if
end if
end if
-- ## Detect end of list and tag with \end{enumerate}
try
if list type of list format of text object of paragraph (paraStep + 1) of active document is list no numbering then
set myInsert to insert text "\\end{enumerate}" & return at first word of paragraph (paraStep + 1) of active document
set paraCompensator to paraCompensator + 1
end if
on error
set lastItem to true
insert text return & "\\end{enumerate}" at last word of paragraph (paraStep) of active document
log "Reached end of document checking for list at paragraph nr. " & paraStep
end try
end if
log "Paragraph: " & paraStep
end repeat
repeat with paraStep from 1 to count of paragraphs of active document
set listFormatProps to properties of list format of text object of paragraph paraStep of active document
set styleName to name local of style of text object of paragraph paraStep of active document
set paraContent to content of text object of paragraph paraStep of active document
if styleName is "List Paragraph" then
set myStyle to Word style "Normal" of active document -- replace "Normal" with name of your style in quotations
set content of text object of paragraph paraStep of active document to tab & paraContent
select text object of paragraph paraStep of active document
set style of paragraph format of selection to myStyle
end if
end repeat
end tell
end texifyLists

on texifyTables()
tell application "Microsoft Word"
set tableCount to (count of tables of active document)
if tableCount is greater than or equal to 1 then
repeat with tableNum from 1 to tableCount
set thisTable to table tableNum of active document
set cellCount to (count of cells of text object of thisTable)
set rowCount to (count of rows of text object of thisTable)
--set columnCount to (count of columns of text object of thisTable) --DOES NOT WORK, RESULTS IN 0
set rowCount to number of rows of thisTable
set columnCount to number of columns of thisTable
repeat with rowIncr from 1 to rowCount
repeat with columnIncr from 1 to columnCount
--set myRange to create range active document start (start of content of text object of cell) end ((end of content of text object of cell columnIncr))
end repeat
end repeat
set rowList to {}
repeat with rowIncr from 1 to rowCount
--set columnIncr to 1
repeat with columnIncr from 1 to columnCount --(get cells of row rowIncr of thisTable)
--set myRange to create range active document start (start of content of text object of column columnIncr of thisTable) end ((end of content of text object of column columnIncr of thisTable) - 1)
set cellContent to (get content of text object of (get cell from table thisTable row rowIncr column columnIncr))
-- Remove "end-of-cell marker" AKA remove ASCII character 13 from end of cell content
set cellcontentList to {}
set orig_delims to AppleScript's text item delimiters
set AppleScript's text item delimiters to ASCII character 13 -- (a carriage return)
set cellcontentList to text items of cellContent
set cellItems to items of cellcontentList
set AppleScript's text item delimiters to orig_delims
set cellContent to item 1 of cellcontentList
-- Enable Script to be re-run without adding extra "&" or "\\" to tables
if (count of characters of cellContent) is greater than 0 then
if last character of (cellContent) is not "&" then
if columnIncr = columnCount then
if last character of (cellContent) is not "\\" then
set content of text object of (get cell from table thisTable row rowIncr column columnIncr) to cellContent & " \\\\"
end if
else
set content of text object of (get cell from table thisTable row rowIncr column columnIncr) to cellContent & " &"
end if
set columnIncr to columnIncr + 1
end if
else
if columnIncr = columnCount then
--if last character of (cellContent) is not "\\" then
set content of text object of (get cell from table thisTable row rowIncr column columnIncr) to cellContent & " \\\\"
else
set content of text object of (get cell from table thisTable row rowIncr column columnIncr) to cellContent & " &"
end if
set columnIncr to columnIncr + 1
end if
end repeat
end repeat
end repeat -- table loop
-- ## loop for pre and post table code
tell active document
set selPara to GetParagraph() of me
get content of text object of paragraph selPara
end tell
if (count of tables of active document) is greater than 0 then
repeat with tableStep from 1 to count of tables of active document
--tell active document
set thisTable to table tableStep of active document
set cellCount to (count of cells of text object of thisTable)
set rowCount to (count of rows of text object of thisTable)
--set columnCount to (count of columns of text object of thisTable) --DOES NOT WORK, RESULTS IN 0
set rowCount to number of rows of thisTable
set columnCount to number of columns of thisTable
set colList to {}
repeat (columnCount) times
if (count of colList) is less than 1 then
set end of colList to "X[l]"
else
set end of colList to "X[m]"
end if
end repeat
set colString to colList as string
-- ## Deal with Pre-Table Code
select text object of (get cell from table thisTable row 1 column 1)
set preParaNum to (GetParagraph() of me)
set myRange to create range active document start (start of content of content of text object of paragraph preParaNum of active document) end ((end of content of content of text object of paragraph preParaNum of active document) - 1)
select myRange
set origContent to content of myRange
set preTableContent to "
\\begin{table}
\\centering
{\\extrarowsep=1mm
\\begin{tabu}{" & colString & "}
\\tabucline[.4mm,black]1"
set content of myRange to origContent & return & preTableContent
-- ## Deal with Post-Table Code
select text object of (get cell from table thisTable row rowCount column columnCount)
set postParaNum to (GetParagraph() of me) + 1 + 1 + 1
set myRange to create range active document start (start of content of content of text object of paragraph postParaNum of active document) end ((end of content of content of text object of paragraph postParaNum of active document))
select myRange
set origContent to content of myRange
--return origContent
set postTableContent to "\\tabucline[.4mm,black]1
\\end{tabu}}
\\end{table}"
set content of myRange to postTableContent & return & origContent
-- Add Line After Top Row (Heading Row)
select text object of (get cell from table thisTable row (2) column 1)
insert rows selection position above number of rows 1
select text object of (get cell from table thisTable row (2) column 1)
set content of selection to "\\tabucline[.1mm,black]1"
--end tell --doc
end repeat --table loop
end if --tables exist
--end tell --doc
end if -- if table
(*
set myTable to table 1 of the active document
set aRange to convert row to text (row 1 of myTable) ¬
separator separate by tabs
set style of aRange to "normal"
set rowContent to content of aRange
set rowItems to {}
set orig_delims to AppleScript's text item delimiters
set AppleScript's text item delimiters to ASCII character 13 -- (a carriage return)
set rowContent_items to text items of rowContent
set rowItems to items of rowContent_items
set AppleScript's text item delimiters to orig_delims
repeat with thisItem in rowItems
set item thisItem to item thisItem & "&"
end repeat
set content of aRange to (rowItems as string)
*)
end tell
end texifyTables

on GetParagraph()
-- NOTE: If you select a paragraph including the first character of the first word, it will count up to the previous paragraph only!
tell application "Microsoft Word"
set myDoc to active document
set myRange to create range myDoc start 0 end (start of content of text object of selection)
set paragraphNum to (count paragraphs in myRange)
return paragraphNum
end tell
end GetParagraph

on openInTeXShop()
tell application "Microsoft Word"
get properties of settings
get RTF in clipboard of settings
tell active document
set paraCount to count of paragraphs
set myRange to create range start (start of content of text object of paragraph 1) end (end of content of text object of paragraph paraCount)
select myRange
end tell
if selection type of selection is selection normal then
copy object selection
end if
end tell
tell application "TeXShop"
make new document
activate
set thisDoc to the front document
(*
set preBody to "
\\documentclass[10pt]{article}
\\usepackage{fontspec}
\\begin{document}
"
set postBody to "
\\end{document}"
*)
set content of selection of thisDoc to (the clipboard)
end tell
end openInTeXShop

on saveWordDoc(inputPathAL)
tell application "Microsoft Word"
save as active document file name inputPathAL
end tell
end saveWordDoc

on duplicateDoc()
tell application "Microsoft Word"
tell active document
set paraCount to count of paragraphs
set myRange to create range start (start of content of text object of paragraph 1) end (end of content of text object of paragraph paraCount)
select myRange
end tell
if selection type of selection is selection normal then
copy object selection
make new document
paste object selection
return name of active document
end if
end tell
end duplicateDoc

--my closeWordDocByName(myDup, false)
on closeWordDocByName(docName, savingBOOL)
if savingBOOL is true then
tell application "Microsoft Word"
close document docName saving yes
end tell
else
tell application "Microsoft Word"
close document docName saving no
end tell
end if
end closeWordDocByName


on addTextToFrontOfDoc(preBody)
tell application "Microsoft Word"
tell active document
insert text preBody & return at beginning of text object of active document
end tell -- doc
end tell --prog
end addTextToFrontOfDoc


on addTextToEndOfDoc(postBody)
tell application "Microsoft Word"
tell active document
insert text postBody at end of text object of active document
end tell -- doc
end tell --prog
end addTextToEndOfDoc
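The core idea of the script is mapping Word paragraph styles to LaTeX sectioning commands through its `sectionTags` list, and that idea does not depend on AppleScript. Below is a rough Python equivalent. It assumes you already have each paragraph as a (style name, text) pair, and the function name is hypothetical.

```python
# Word paragraph style -> LaTeX command, following the script's sectionTags list.
SECTION_TAGS = {
    "Title": "title",
    "Heading 1": "section",
    "Heading 2": "subsection",
    "Heading 3": "subsubsection",
}


def texify_paragraphs(paragraphs: list[tuple[str, str]]) -> str:
    """Turn (style name, text) pairs into LaTeX body text.

    Paragraphs whose style appears in SECTION_TAGS are wrapped in the matching
    command; all others are emitted unchanged as ordinary paragraphs.
    Text is assumed to be escaped for LaTeX already.
    """
    blocks = []
    for style, text in paragraphs:
        text = text.rstrip("\r\n")  # drop a trailing Word paragraph mark, if any
        command = SECTION_TAGS.get(style)
        if command is not None:
            blocks.append("\\" + command + "{" + text + "}")
        else:
            blocks.append(text)
    # A blank line between blocks starts a new paragraph in LaTeX.
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(texify_paragraphs([
        ("Heading 1", "Introduction\r"),
        ("Normal", "Some text."),
        ("Heading 2", "Background"),
    ]))
```

If the third-party python-docx package is installed, one way to get those pairs is `[(p.style.name, p.text) for p in docx.Document("thesis.docx").paragraphs]`. As in the script, `\title{...}` is emitted in place, so you still need a `\maketitle`.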

This will be a follow-on to Aleksandr Blekh’s extremely helpful recommendation of pandoc (http://johnmacfarlane.net/pandoc) above, collecting the comments I’ve been adding on the tweaking necessary after running it, which seem to be outgrowing the limitations of a comment.

  • Wrap your title in \title{}, and add \maketitle.
  • Likewise \author{}, \date{}, \begin{abstract}.  
  • Nuke the parskip.sty block from orbit.  
  • I needed one \nopagebreak.
  • Regex-replace (http[:\w\/\~\-\.]+[\w\/]+) with \\url\{\1\} to wrap bare URLs.
  • Search for genuine straight single quotes and replace them. \textquotesingle didn’t work for me; I ended up using \textsf{`} instead.
  • A couple of scary bits, because of TeX’s default behavior of hiding information when confused or potentially ugly:
    • Manually convert Unicode, e.g., ∴ => \therefore. (I normally use STIX with ucharclasses but didn’t try it for this.)
    • Check tables carefully for missing cells. I ended up using makecell and manual line breaks. (I learned about p{24 pt} too late.)
    • \raggedright was essential for my bibliography, which had some long URLs that TeX kept on the same line even at the cost of cutting off parts of them.
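Some of the mechanical fixes in this list can be scripted. The Python sketch below covers two of them: wrapping bare URLs with the regex above, and mapping Unicode symbols such as ∴ to LaTeX. It also adds a helper that lists any non-ASCII characters left over for the manual check. The `(?<!\{)` lookbehind is an addition to the original pattern, so that URLs already inside `\url{...}` are not wrapped twice. `\therefore` assumes `amssymb` is loaded, and the function names are hypothetical.

```python
import re

# The URL pattern from the list above, plus a (?<!\{) lookbehind so URLs that
# directly follow "{" (e.g. already inside \url{...}) are left alone.
URL_RE = re.compile(r"(?<!\{)(http[:\w/~\-.]+[\w/]+)")

# Unicode symbols to replace; extend for your own document.
# \therefore is a math-mode symbol from amssymb, hence the $...$.
UNICODE_TO_LATEX = {
    "\u2234": r"$\therefore$",  # ∴
}


def wrap_urls(tex: str) -> str:
    """Wrap bare http(s) URLs in a url command (needs the url or hyperref package)."""
    return URL_RE.sub(r"\\url{\1}", tex)


def replace_unicode(tex: str) -> str:
    """Swap mapped Unicode characters for their LaTeX equivalents."""
    for char, latex in UNICODE_TO_LATEX.items():
        tex = tex.replace(char, latex)
    return tex


def non_ascii_left(tex: str) -> set[str]:
    """Return non-ASCII characters still present, for manual review.

    Depending on engine and font, these may raise errors or silently disappear.
    """
    return {ch for ch in tex if ord(ch) > 127}


if __name__ == "__main__":
    text = "Proof \u2234 done, see http://example.com/paper.pdf."
    text = replace_unicode(wrap_urls(text))
    print(text)
    print(non_ascii_left(text))
```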
  • I had a problem: I was importing a Word document from a lawyer, and one comment started and ended with a URL and threw a weird error message. It fixed itself when I added the hyperref package and wrapped both URLs in \href. Commented Apr 6, 2017 at 10:38

This is probably a bit too late, but converting 350 pages is a lot. You could try the tools people have suggested above, such as WordtoLatex, writer2latex or rtf2latex2e, but I doubt you will be able to get through all 350 pages without any hassle, especially with tables, images and so on. It might take you a month to do this carefully, though!

If you have completed all 350 pages in Word (man, that must have taken a long time!), then I'd recommend using one of the paid services available and just getting it converted. You could try Word to Latex, Word LaTeX or something similar, although I agree they are hard to find!


You can look at "word2lyx" by Rob Oakes: Importing Word Documents Into LyX (word2lyx 0.1).


word2tex seems like a pretty decent commercial option. Unfortunately, it only runs on Windows. It adds a "save as tex" option to the "Save As" dialog box, and it also has a dialog box with a wide range of configuration options.

That said, converting a manuscript from Word to TeX is the kind of task where it might be worth opening up a virtual machine or going to a Windows laptop to complete.


This website is in beta but is constantly improving. If you follow all the guidelines, you can get pretty decent ".tex" code and a ".pdf". If you face any issues, leave them a message and they will fix them.

Give it a try https://www.docx2latex.com/

  • No way to find out how it works without registering. There is no posted privacy policy, so I won't register. Do you upload docs? Is it an application you install? A plug-in? Commented Sep 13, 2017 at 0:39
  • Previously they did not have a registration system, but now they do. Yes, there is no privacy policy, but I emailed them and they confirmed that documents are deleted from the server after download. In my experience, it is the best free online service I could find.
    – aaryan
    Commented Sep 13, 2017 at 17:12
  • No need to register and there is also a section on privacy added docx2latex.com/docx2latex_free Worked like a charm for me.
    – Flo
    Commented Nov 10, 2019 at 0:26

The following website offers free online conversion from Word to LaTeX: http://www.word2latex.net/convert-ms-word-to-latex-online-automatically/


There is http://www.wordtolatex.com/. It is the result of the Bachelor's thesis "Word-to-LaTeX convertor" by Michal Kebrt. I was one of the early testers, and it produced really good results. The free version 1.2 from 2007 is still floating around the net: http://www.essential-freebies.de/board/viewtopic.php?t=14932



In my experience, the best results are obtained with GrindEq (which is, unfortunately, shareware). The resulting TeX document still requires a lot of work, but at least MathType equations are converted correctly.
